S656 Final Project


"Orphan Works" in Copyright Law
XML Source
XHTML
PDF
PDF (fop-0.95)
Original Document

Document Selection

For the final project I wanted to encode a substantially annotated legal document. My first thought was one or more sections of Title 17, which covers copyright law, from the U.S.C.S. or U.S.C.A. Although the code itself is in the public domain, the annotations are the property of LexisNexis and Westlaw, respectively. To sidestep the difficulty of obtaining permission for use, I decided to work with a Congressional Research Service (CRS) report instead. These reports are produced for Congress by CRS, an agency within the Library of Congress.

CRS reports are requested by Members of Congress and are not directly available to the public. They are considered valuable legal research sources due to their high quality. Several law schools have developed projects to make their collections of CRS documents available on the Internet. I chose a recent report on "orphan works" from one such repository: the IP Mall of the Franklin Pierce Law Center.

Choice of DTD and Transforms

One of the most interesting aspects of this project was the encoding of legal citations. Although I considered TEI because of the narrative nature of these reports, I chose to create a custom DTD for flexibility in handling the complications of legal attribution. Creating my own DTD also seemed to be more of a challenge than using an existing XML application.
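As a rough illustration of what structured citation markup makes possible, here is a minimal sketch in Python. The element and attribute names (`cite`, `type`, `title`, `code`, `section`) are hypothetical and are not taken from the project's actual DTD; the sample text is invented for the example. The point is only that once the parts of a citation are tagged separately, they can be reassembled in any required form.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: these element and attribute names are illustrative
# assumptions, not the project's real DTD.
SAMPLE = (
    "<para>Fair use is codified at "
    '<cite type="statute"><title>17</title><code>U.S.C.</code>'
    "<section>107</section></cite>.</para>"
)

def format_statute(cite):
    """Render a statute <cite> element as 'title code § section'."""
    title = cite.findtext("title")
    code = cite.findtext("code")
    section = cite.findtext("section")
    return f"{title} {code} § {section}"

root = ET.fromstring(SAMPLE)
# Collect every statute citation found anywhere in the fragment.
citations = [
    format_statute(c) for c in root.iter("cite") if c.get("type") == "statute"
]
print(citations)
```

Other citation types (cases, reports, hearings) would need their own child elements and formatting rules, which is where a custom vocabulary offers more room than a general-purpose one.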

Apart from the required transformation to XHTML, I wanted to pick an output format that would offer a chance to learn a new skill. Our in-class overview of XSL-FO piqued my interest. I also knew I would learn more about page layout in the process of creating a style sheet for the FO document. Working with both a paged format and a web format was a useful contrast. With XHTML my aim was to create a serviceable, although non-paginated, duplicate of the original. I chose to convert the footnotes to endnotes (with links and JavaScript-created tooltips), and I marked the page breaks from the source document in the web version. My goal for the XSL-FO transformation to PDF was different. Since the source was already in PDF, the purpose of this exercise was to demonstrate the feasibility of producing these reports using XML.
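The footnote-to-endnote conversion can be sketched as follows. The project itself did this in XSLT; this Python version is only an illustration of the logic, and the element names (`report`, `para`, `footnote`), the `ref`/`note` id scheme, and the sample text are all assumptions made for the example. Each inline footnote is replaced by a numbered link, and the note text is collected into a list at the end with a link back to its reference.

```python
import xml.etree.ElementTree as ET
from html import escape

# Hypothetical source markup; the real project's element names may differ.
SAMPLE = (
    "<report><para>Some text.<footnote>First note.</footnote>"
    " More text.<footnote>Second note.</footnote></para></report>"
)

def to_xhtml(report):
    """Return an XHTML fragment with inline footnotes moved to an endnote list."""
    body, notes = [], []
    for para in report.iter("para"):
        parts = [escape(para.text or "")]
        for child in para:
            if child.tag == "footnote":
                notes.append(child.text or "")
                n = len(notes)
                # Numbered marker that links down to the matching endnote.
                parts.append(f'<sup><a id="ref{n}" href="#note{n}">{n}</a></sup>')
            # Other inline children are ignored in this sketch; the text
            # that follows each child (its tail) is always kept.
            parts.append(escape(child.tail or ""))
        body.append("<p>" + "".join(parts) + "</p>")
    items = [
        f'<li id="note{i}">{escape(text)} <a href="#ref{i}">back</a></li>'
        for i, text in enumerate(notes, 1)
    ]
    return "\n".join(body + ['<ol class="endnotes">'] + items + ["</ol>"])

xhtml = to_xhtml(ET.fromstring(SAMPLE))
print(xhtml)
```

A tooltip could be layered on top of this by giving a script access to each note's text through the `note` ids, without changing the generated structure.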

Problems and Challenges

I took the XML workshop last summer, so creating the DTD and marking up the report were not difficult. Creating the XSLT for XHTML was moderately difficult. The parts of the transform that combined complicated selections with flow control caused problems. Another difficulty was converting a hard-coded table of contents to an automatically generated version. I discovered the need for this feature while working with the dynamic nature of the XSL-FO PDF pages. These were enjoyable challenges.
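The idea behind an automatically generated table of contents can be sketched as below. The project built its version in XSLT; this Python sketch only shows the underlying recursion. It assumes a hypothetical structure in which each `section` element carries its heading in a `head` child, and the section titles are invented for the example rather than taken from the report.

```python
import xml.etree.ElementTree as ET

# Hypothetical structure and titles, used only for illustration.
SAMPLE = """<report>
  <section><head>Introduction</head></section>
  <section><head>Background</head>
    <section><head>Legislative History</head></section>
  </section>
</report>"""

def build_toc(parent, prefix=""):
    """Return (number, title) pairs for every nested section, in document order."""
    entries = []
    # findall("section") matches direct children only, so each level of
    # nesting is numbered independently.
    for i, sec in enumerate(parent.findall("section"), 1):
        number = f"{prefix}{i}"
        entries.append((number, sec.findtext("head", default="")))
        entries.extend(build_toc(sec, prefix=number + "."))
    return entries

toc = build_toc(ET.fromstring(SAMPLE))
for number, title in toc:
    print(number, title)
```

Because the entries are derived from the headings themselves, adding, removing, or reordering sections never leaves the table of contents out of date.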

What wasn't enjoyable was fiddling with the CSS to get style inheritance working as I intended. Inheritance issues turned out to be a pain while working with XSL-FO as well. Even then, a few things still didn't display properly, which I discovered was due to the FO processor's incomplete support for XSL-FO features. When I used a different, proprietary processor, the problems went away.

During the transfer of the project to the Cocoon server, I learned that the FO processor installed there is an older release of Apache FOP (0.20.5). The current release (0.95), which is also the one Oxygen used during design, conforms much more closely to the XSL-FO Recommendation. The PDF that this version produces from the FO stylesheet is considerably better.

Learning XSL-FO in a week was the most challenging part of the whole project. I appreciated learning a new tool and feature set for working with XML. Although I'm not sure where my career will take me, I think I would enjoy the opportunity to work on a DTD or schema to implement the Bluebook system of legal citation. XML is a great fit for legal documents.