Many libraries—academic, public, and private—and other “memory organizations,” such as museums, historical societies, commercial publishers, and the like, have started taking responsibility for the digital archiving of newspapers and born-digital news sources.
Commercial vendors such as ProQuest have also taken leadership roles in this area. But what needs doing still needs more doers. The challenges are extreme, but losing any part of the record of human history shames the information professional community.
Acting under a National Endowment for the Humanities (NEH) grant titled Chronicles in Preservation, Educopia Institute and its hosted project, the MetaArchive Cooperative, recently issued the Guidelines for Digital Newspaper Preservation Readiness. The draft was released for comments on July 22, 2013, and the public review period will extend to Sept. 20.
A Commitment to Digital Archiving
The document focuses primarily on the technical issues of a commitment to digital preservation, though it recognizes the difficulty of the legal and bureaucratic challenges. It wisely assumes a dual audience for its advice: large institutions with strong resources and existing projects in the area and small institutions with limited resources and experience. The goal of the guidelines is to identify the activities, expertise, and knowledge needed for readiness at both a basic and an optimal level. The tacit assumption behind the document seems to be that information professionals need to commit to digital archiving.
The Chronicles in Preservation project, led by the Educopia Institute, has support from many libraries, including those of the University of North Texas, Penn State, Virginia Tech, University of Utah, Georgia Tech, Boston College, and Clemson University, as well as the San Diego Supercomputer Center. The guidelines document is the first major deliverable from this 3-year project (2011–2014), which the NEH has funded to research and document a series of preservation readiness steps for digital newspaper curators. Katherine Skinner, Ph.D., MetaArchive program director and executive director of the Educopia Institute, says comments will be recorded to improve the document for its final edited version.
“We don’t want just university and public libraries,” says Skinner. “We need other groups, including vendors, so that libraries can maintain products that vendors can use. We also want to expand outside the United States with international groups.”
Two more deliverables are scheduled for the project. The second is “a set of curation tools mainly developed on other projects but designed specifically to help with newspapers and carrying improved documentation,” according to Skinner. “The third document will evaluate different types of depository infrastructures and the pathways for the different types. It will be a guide. Work on both documents is underway as referenced in the Guidelines.” Readers can check the comments now being received, section by section.
The guidelines document emphasizes the standards in use for digital preservation, though Skinner stresses that many preservation projects are legacy systems that make limited use of the best standards. She considers increasing the use of the available standards to be very important, but she admits that adopting the best of them might involve expensive conversions. Clearly, people just getting into the field should set their course correctly before starting.
The guidelines divide the core issues into six areas:
- Format management
- Metadata packaging
- Checksum management
- Organizing the digital newspapers
Within each section covering these issues, advice is organized into the following categories:
- Background (rationale and sound practices)
- Tools (software, standards, services, etc.)
- The readiness spectrum (different approaches for different organization types)
- Essential readiness (minimalist approach)
- Optimal readiness (full speed)
Supplementary content includes case studies and a Roadmap Checklist. The last section, Additional Considerations, touches on some of the difficult challenges that are not ironed out yet, including copyright, ownership controversies, partners and permissions, change management, and monitoring. It also distinguishes between preservation for distribution and preservation for backup.
The Educopia Institute currently has three main programs: Educopia Consulting, the Library Publishing Coalition, and the MetaArchive Cooperative. Educopia Consulting helps academic, research, and memory institutions plan, implement, and assess their growing digital infrastructures and collections in specialty areas such as digital preservation and curation, digital scholarship (particularly in the humanities), and collaborative network-building.
The Library Publishing Coalition is a 2-year project (2013 through 2014) with more than 50 academic libraries to engage library publishing practitioners in designing and building a collaborative network to address and support an evolving, distributed, and diverse range of library production and publishing practices.
The MetaArchive Cooperative, which produced the current guidelines, is a distributed digital preservation network launched in 2002 when six libraries in the southeastern U.S. banded together to develop a digital preservation solution for their special collections materials. The outcome of that collaboration was MetaArchive, a community-owned, community-led initiative comprising libraries, archives, and other digital memory organizations, working cooperatively with the Library of Congress (LC) through the NDIIPP (National Digital Information Infrastructure and Preservation Program). Educopia was founded in 2006 to host MetaArchive and other collaborative programs.
“Over the last five to six years, our members expressed their worries,” says Skinner, when explaining the origin of the project. “We didn’t feel even the good standards, especially those from NEH and the LC program, were being applied. Educopia strives to promote multi-partnerships. Most libraries these days are acquiring born-digital news and many have no standards for file formats or metadata.”
Skinner clearly expresses the missionary view that libraries and other “memory organizations” must act, but she admits that few organizations seem to have the preservation mandate. As for the challenges that could end up in court—usually challenges that arise from and around access more than archiving—she admits they continue to inhibit the process.
“We’ve been working for a decade and still have only around 60 members,” says Skinner. “Why so few?” To get started, she advocates that those interested in preservation start creating “dark archives” (i.e., systems that don’t connect to access for now). But she still demands that information professionals step up, especially as born-digital news content, from individual items on newspaper websites to news blogs and social networks, starts vanishing.