The U.S. National Endowment for the Humanities (NEH; www.neh.gov) and the Library of Congress (LC; www.loc.gov) have inaugurated a new online service for America's historic newspapers called Chronicling America (www.loc.gov/chroniclingamerica). The core of the service is a directory of all newspapers published in the U.S. dating from 1690 and incorporating library holding records from OCLC. The directory content stems from a 2-decade-plus project supported by the NEH called the National Newspaper Program (NNP). A selection of digitized newspapers from the NEH's new National Digital Newspaper Program (NDNP) carries articles published from 1900 to 1910 in newspapers from six states and the District of Columbia. Future plans for the growth of the service should result in public access to 30 million digitized pages, but completion of the project is expected to take up to 20 years. At this time, there are no plans to open up the content to Web search engines such as Google and its News Archive or the Open Content Alliance.
In 1982, the NEH began funding the NNP through grants to a network of 64 institutions in all 50 states, plus the District of Columbia, Puerto Rico, and the U.S. Virgin Islands, as well as eight national repositories, including the Library of Congress itself. Generally, the NNP grantee was the institution already serving as the largest repository for newspapers in the state. NNP project staff have worked to identify and catalog all newspapers ever published in each state and to inventory holdings in public libraries, county courthouses, newspaper offices, historical museums, college and university libraries, archives, and historical societies. Catalog records are maintained as a national database by OCLC, which matches entries to its own holding records. NNP grantees have also microfilmed newspaper holdings for preservation. By 2008, the last state engaged in the NNP will have completed its work, and the NNP will end.
In March 2005, the NEH began implementing the NDNP, a digital sequel to the NNP, as part of We the People and Digital Humanities initiatives to employ digital technology to encourage the teaching and study of American history and culture. This project will digitize content of historically significant newspapers published between 1836 and 1922. Helen Aguera, senior program officer at NEH and coordinator of NDNP, explained that, as they came to the end of the National Newspaper Program, the NEH "saw we needed more access to newspapers already preserved. So we looked to the possibilities of digitizing and how to repurpose information from the other program." At this point, Aguera said, "We have about 70 million pages of microfilm. We do not anticipate digitizing that many pages, but we have thought of digitizing 20-30 million pages over a 20-year period."
NDNP will follow the same structure as NNP, with one organization in each state or territory awarded a contract, most probably the same institution that conducted the NNP. For example, among the first six recipients of NDNP grants, only one differed from the NNP grantee. Though the NNP's bibliographic directory covers newspapers published from 1690 to the present, full-text, full-image articles from the NDNP's digitization efforts are restricted to 1836 through 1922. Tom Lindsay, NEH deputy chairman, stated that the six 2-year awards given since 2005 totaled $1.9 million and that the NEH anticipated making another 8 to 10 awards this year, totaling around $3 million to $3.5 million. Mark Sweeney, chief of the Serial and Government Publications Division of the Library of Congress, told me that recent announcements of budget cutbacks for the National Digital Information Infrastructure and Preservation Program (NDIIPP) would not affect the Chronicling America service.
The Chronicling America: Historic American Newspapers service contains not only the 138,000-plus-title bibliographic directory generated by the NNP, but also more than 226,000 pages of full-text, full-image content from the first phase of the NDNP. The searchable, digitized images from this first phase come from 23 newspapers published in California, Florida, Kentucky, New York, Utah, Virginia, and the District of Columbia. With LC leading off in contributions (more than 90,000 pages), it is no surprise that 13 of the 23 titles cover the Washington, D.C., area.
There are two search and browse boxes available on the home page for the service—one leads to views of the selected full-text, full-image newspapers; the other to content from the Newspaper Union List directory of 138,000-plus newspaper titles and 900,000 separate library holdings records. Users can search the text of specific newspapers and limit searches to specific states, newspapers, and years or months of publication. The FAQs for the service warn about the pitfalls of misspellings when OCR (optical character recognition) is applied to content—particularly old, blurry content. When it comes to misspellings in the Newspaper Directory, however, LC asks users to notify a CONSER member (www.loc.gov/acq/conser/conmembs.html) with corrections.
After finding an article of interest, users can display it online using Adobe Flash 8.0 or above, as a high-resolution image (JPEG2000), or as a PDF image file, as well as in text-only Adobe downloadable pages. According to Sweeney, whatever you find is "so old it's now in the public domain. There are no rights restrictions that we know of. You can make copies and re-purpose them." However, a legal disclaimer at www.loc.gov/chroniclingamerica/about.html warns, under a Rights and Reproductions section, that "responsibility for making an independent legal assessment of an item and securing any necessary permissions ultimately rests with the persons desiring to use the item."
For directory searches, here are a few quick searching tips. Capitalize any Boolean operator. Enter phrases in their most likely order and with quotation marks around them. Though the system will retrieve all the words in the phrase, a better order will affect display in the relevance-ranked results. Sweeney provided another tip: To find newspaper Web site listings, try using "http" as a search term. By the way, when I checked, that retrieved 560 entries with some dead ends. For example, a listing for a specific closed date span of The Journal of Commerce led to a URL that no longer worked, though cutting back to the root of the URL (http://www.joc.com) would take you to the still very active service. Also, every time you click on one of those "Web site" links, you go to a rather intimidating disclaimer from LC's legal department (www.loc.gov/global/disclaimer.php?url=http://www.neh.gov/projects/ndnp.html). Ignore it. Just click on the "external link" option and sail right through.
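For anyone who assembles directory searches in a script before pasting them into the search box, the first two tips above (capitalized Boolean operators, quoted phrases) can be sketched in a few lines of Python. This is only an illustration: the function name build_directory_query is hypothetical, nothing here contacts the Chronicling America site, and it assumes the directory search box accepts the resulting string exactly as a user would type it.

```python
# Hypothetical helper; it only formats a string and does not call any LC service.
BOOLEAN_OPERATORS = {"AND", "OR"}


def build_directory_query(terms, operator="AND"):
    """Join search terms into one query string, following the tips above."""
    op = operator.upper()  # tip: capitalize any Boolean operator
    if op not in BOOLEAN_OPERATORS:
        raise ValueError(f"unsupported operator: {operator!r}")

    formatted = []
    for term in terms:
        term = term.strip()
        if not term:
            continue  # skip empty entries
        # tip: put quotation marks around multiword phrases
        formatted.append(f'"{term}"' if " " in term else term)

    return f" {op} ".join(formatted)


if __name__ == "__main__":
    print(build_directory_query(["Journal of Commerce", "http"]))
```

The word order inside each phrase is left exactly as entered, since the tip about putting phrase words in their most likely order is a judgment call for the searcher.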
One more caution for both full-text and directory searches: Times change and so does terminology. At the dawn of the 20th century, horseless carriages outnumbered automobiles—taxonomically speaking. Location names may have also changed. For more tips and background, check out the Help section (www.loc.gov/chroniclingamerica/help.html).
Bruce Cole, chair of the National Endowment for the Humanities, and James Billington, Librarian of Congress, both strongly support the use of digital technologies to deliver information to all Americans. Billington testified before Congress this month that "the Library's basic mission of acquiring, preserving, and making accessible the world's knowledge and the nation's creativity is not changing. … We are proud that the Library is yielding profoundly valuable information and educational resources for the nation. We are bringing together both the historical digitized materials and the born-digital content that together provide a strategic and unique resource for the nation."
The NDNP and Chronicling America products exemplify this goal. In a 2004 article appearing in the OAH Newsletter of the Organization of American Historians (OAH), Cole saluted the NDNP, saying: "The Endowment's first newspaper program, the USNP, ensured (in a predigital era) that this widely scattered and highly vulnerable corpus was organized and then cataloged and preserved on microfilm to consistent national standards. Now the NDNP will complete the process of making these materials fully accessible, by digitizing microfilmed newspaper titles from every state so that they will be available for use in academic offices, classrooms and homes across the nation."
But how will the word get out about this new service as it grows over the years? (Beyond, of course, the vast readership of InfoToday NewsBreaks.) I asked Sweeney whether they planned to open the service up to Web search engines such as Google, Yahoo!, MSN.com, Ask.com, etc. And specifically, I asked if they had any plans to reach out to Google News Archive. (For background on the Google News Archive, check out "Traditional Information Industry Opens Premium Content to Google News Archive," http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=18226 and "Who? What? How Much?: Google News Archive Premium Content Suppliers" http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=18227.) Sweeney said that the content was "not open for harvesting. We are building the program incrementally. It is only accessible to users through the user interface." He added that the institutions contributing content to the program "can keep and use it for their own purposes. For example, the University of California could repurpose its content." When asked about promotional plans, an NEH representative indicated that there were plans to demonstrate the new service. The chairman would discuss it in local and regional travels. The Library of Congress and the NDNP partners in different states would promote it too.
None of this seems quite as promising for harvesting eyeballs as exposure through the major Web search engines would be. However, NEH's Lindsay indicated that they would be open to other sources of digitization support in the future. As to Google or the Open Content Alliance specifically, Lindsay said that it was "very tentative now, but we do welcome contributions. Public-private collaboration is something we are pursuing in the NEH agenda."