Users who have toiled over microforms to access literary works, historical documents, and old periodicals will cheer at the recent news from UMI that it will convert its vast microform collection to digital format. At the recent American Library Association Annual Conference, UMI announced its plan to create the world's largest digital archival collection of printed works by scanning the contents of its microform collection covering 500 years of information.
That collection contains hundreds of thousands of books, newspapers, periodicals, and other materials stored in three temperature-controlled vaults at the company's headquarters in Ann Arbor, Michigan. UMI is calling this massive conversion from microform to electronic format the Digital Vault Initiative.
Scanning of the collection's 5.5 billion page images began in May and will continue for several years. This project is in addition to the 37 million images of contemporary information that UMI adds to its existing digital collection each year.
According to Jeff Moyer, director of the Digital Vault Initiative, UMI is using five state-of-the-art digital scanners and is working 24 hours a day with three employee shifts. Quality control editors will check each page image and will separately index each illustration contained in the documents. The scanners create page images only; they do not perform OCR on the text itself. The scanned works will be linked to full bibliographic information in MARC records, which will be fully searchable.
The first phase of the Digital Vault Initiative will focus on UMI's collection of early English literature, including nearly every English-language book published from the beginning of English printing around 1475 through 1700. This collection, begun in 1938 as UMI's first microfilm project, includes such works as Chaucer's The Canterbury Tales, Culpeper's The English Physician, and Shakespeare's renowned First Folio edition of 1623.
This collection comprises 9,600 individual titles with 22 million page images. UMI estimates that the collection represents about 75 percent of all books published in English during those years. A pre-release of this phase should be available by the end of this year, with the full contents of this collection available by June 1999. These literary and historical works will be searchable on the Web as a separate database initially, and eventually will be searchable in the ProQuest Direct online service.
The company will work concurrently on scanning the archives of its top 50 periodical titles, as ranked by microform sales. These will include key titles such as the full run of Time magazine, dating from the first issue in 1923. The periodical scanning should also be completed by June 1999. Next to be digitized will be full runs of important newspapers such as The New York Times, The Wall Street Journal, and the Chicago Tribune. The digitized periodicals and newspapers will be integrated into the ProQuest Direct service.
Moyer said that by the time the newspaper project was well underway, there would be enough digitized material available in the marketplace for UMI to gauge interest and response. Further choices of works for scanning would then become market-driven to meet the needs of customers. UMI has been digitizing dissertations since 1997, and will decide later, in consultation with academic institutions, whether to scan earlier works.
When asked about the possibility of searching full text, Moyer noted that UMI is currently in discussions with several academic institutions about rekeying (or using existing text-encoded material) to create ASCII text and then marking up that text to produce searchable SGML files. A long-term goal is for UMI to serve as a clearinghouse, with works available both as graphic files and as SGML files.
Through the Digital Vault Initiative, library patrons for the first time will be able to log on to UMI's ProQuest Direct and search the company's entire collection—from 15th-century literature to 19th-century newspapers to the current week's business publications.
"Our goal is to become more than just purveyors of information," said Joseph Reynolds, president and CEO of UMI. "The Digital Vault Initiative will allow UMI to take the content of our enormous vault of information and place it directly into the context of an individual's research or studies via ProQuest Direct."
The depth of the collection gives users the opportunity, perhaps for the first time, to easily access entire documents within the context of the day by viewing accompanying illustrations and photographs and by reading surrounding stories and advertisements.
UMI will continue to market ProQuest Direct to university, college, public, and government libraries, and to elementary and secondary schools on a subscription basis. Current subscribers to UMI microforms will pay only an incremental fee to access the new collections electronically. Nonsubscribers will pay content and access charges.
Web access to material formerly locked in user-unfriendly microform will be a momentous change for users. Librarians who saw the live demo at ALA were excited at the prospect of expanded resources. Kudos to the management of UMI and to UMI's parent, the Bell & Howell Company, for standing behind this important initiative.
(Editor's note: UMI has posted information about the Digital Vault Initiative on its Web site at http://www.umi.com/hp/Features/DVault/.)