More large-scale digitization projects continue to emerge as aggregators move to extend their digital archives. The National Archive Publishing Co. (NAPC; www.napubco.com) has announced a 2-year project in which it will digitize a backfile of microfiche reports in ERIC (Education Resources Information Center; www.eric.ed.gov). All documents date from 1966 to 1992, about 340,000 documents or 40 million pages. Due to a conservative interpretation of contract language used until 1993 for submitting documents to ERIC, the project will also involve chasing down copyright holders, both corporate and individual authors, for permission to offer access to the electronic documents. Though the digitization will proceed independently of the permission-seeking process, the availability of full-text PDF files of the documents (free at the ERIC Web site) will depend on securing permission.
ERIC is one of the most venerable online database producers (File 1 on Dialog). Begun in 1966, the service was designed around a network of clearinghouses located at universities and research institutes scattered across the nation. ERIC's bibliographic database and print abstract services cover a wide range of material: journal articles, books, research syntheses, conference papers, technical reports, policy papers, and other education-related materials. For the gray or report literature, it has a document-delivery service supplying microfiche copies (the ED series).
In 2004, the U.S. Department of Education restructured the service into a centralized operation under a 5-year, $34.6-million contract with the Computer Sciences Corp. (CSC). Later in 2004, ERIC began making full-text documents issued from 1993 to 2004 available free on its Web site. It currently has more than 100,000 PDF documents available. Though the ERIC system does supply full-text documents, it does not support full-text searching of those files. Searchers still use the bibliographic database (now up to more than 1.2 million bibliographic records) to identify what they want. Access routes to other content published after 2003 include links to publisher Web sites, among other sources. (For more background on ERIC, read Paula Hane's NewsBreak, "Update on ERIC," Oct. 1, 2005, http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=18029.)
NAPC's Content Solutions business unit, launched a year ago, will handle the project. In making the announcement, Jeff Moyer, executive vice president of publishing at NAPC, stated: "Our mission is to improve access to information by creating digital files with the appropriate level of metadata to suit the application. This project is the first to bring together our expertise in digitization and rights clearance at such a large scale. We believe this rare combination is valuable for building dynamic information archives." As to the size of the project, Moyer remarked on the sight of a semi delivering the fiche cabinets at their Ann Arbor, Mich., loading dock: "That's a lot of fiche."
The hardest part of the project may turn out to be getting permission from copyright holders. Lawrence Henry, CSC program manager for ERIC, explained that earlier-era release forms specified permission to convert, archive, and distribute microfiche and microfilm documents. Under instructions from the Department of Education, CSC, according to Henry, is being "very conservative, to make sure to clear every single copyright or we won't release" the document. Both Henry and Moyer indicated that their first focus in clearing copyright would involve acquiring global approvals from institutional publishers, e.g., contractors who hired researchers. Henry noted that this focus would explain the forthcoming "difference in the number of copyrights cleared versus the number of digitized documents added." Release forms after 1992 included digital rights.
On the ERIC site, a notice describing the microfiche digitization project (www.eric.ed.gov/ERICWebPortal/Home.portal?_nfpb=true&_pageLabel=Digitization) asks copyright holders who have contributed documents in the past and who want to grant ERIC the right to disseminate their documents to complete a contact form. It even asks people who know of colleagues who contributed documents in the 1966-1992 time frame to "please pass the word."
ERIC also actively solicits document submissions from authors and publishers and works with other government agencies to add education-relevant documents (www.eric.ed.gov/ERICWebPortal/resources/html/publishers/publisher.html). In the case of journal literature, ERIC not only builds links to publisher Web sites, it also, according to Henry, is "working with publishers to archive copies of their full text and to seek permission for embargoing content for release whenever their business suggests the content has diminished economic value." He added, "Our focus at ERIC is to try to provide access to full text that meets the needs of ERIC users, but doesn't encroach on the rights of the publishers."
The main burden of chasing down copyright holders is on NAPC. Moyer explained that the starting point would consist of "the information in the ERIC record. This also includes a resume document included with the microfiched original document. Additional sourcing for authors includes directories and Web searches and even Google searches." For multiple-authored pieces, "we will accept a single acceptance from one author if that author accepts on behalf of the others. Our assumption is that they contributed once and the majority will welcome having their content accessible."
What happens if the author has died? We're talking decades of documents. "We'll follow the estates of the dead," Moyer said gamely. He also said that "when it comes to the portion that we can't locate, the Department of Education is still interested in understanding the process." Many information industry professionals and content holders, from Copyright Clearance Center to The Authors Guild and the Association of American Publishers, might be interested. (I told him I would expect an article for Searcher in 2 years' time.)
Legislation currently before Congress could affect ERIC's situation. Henry admitted that an open access policy for federally funded research would supply a "great conduit for educationally focused material." Documents for which NAPC cannot find the copyright holders could fall into the "orphan works" category currently being considered in legislation to relax copyright restrictions. Henry said that the Department of Education "has no orphan work clause in its copyright processing guidelines at this point. It is all at the Department's discretion." However, digitized copies of all the 1966-1992 documents will be sent by NAPC in DVD format to the National Library of Education (NLE) at the Department of Education. If policies or legal conditions change, the department would have the documents ready to go.
Henry confirmed that ERIC only offered full-text document delivery, not full-text searching "at this time." Nor did they have any immediate plans for OCRing (the optical character recognition that would enable full-text searching). However, Moyer said that the digitized copies on their way to the NLE included "archival copies of compressed images of each page." Libraries participating in the Google Book Search project receive individual page image files accompanied by individual OCR'd text files. Step 1??