IMLS (Institute of Museum and Library Services) announced recently that it is funding a $2 million grant for three partners (Digital Public Library of America, Stanford University, and DuraSpace) to develop repository solutions, hosting strategies, and methods for the exchange of information for digital library collections. The project will give libraries, museums, and archives of all types and sizes an out-of-the-box open source solution for managing their digital content and exposing what they have to the world. It will build on the work many academic libraries have done to create institutional repositories for their own organizations.
DuraSpace is providing the infrastructure for the project. “It aligns perfectly with our mission to steward the scholarly and cultural heritage records and make them accessible for current and future generations,” says Debra Hanken Kurtz, DuraSpace’s CEO. Stanford University will develop core components for the project, and the Digital Public Library of America (DPLA) will focus on the connective tissue between repositories. The entire project is nicknamed “Hydra-in-a-Box” because it builds on Hydra, an existing effort in repository development.
A Many-Headed Solution
Hydra is a collaboration that has developed a repository solution currently being used by institutions worldwide to manage and share digital content. In a 2013 interview, Tom Cramer, Stanford University Libraries’ chief technology strategist and associate director of digital library systems and services, said, “Hydra’s goals are to combine the power of a repository for enterprise-scale digital asset management and preservation, with tailored interfaces.” He went on to describe those interfaces as including “workflows and access systems specific to different content types and streams—e.g., articles vs. images vs. time-based media vs. books vs. data.”
The principal platforms in Hydra are the Fedora repository software, Solr (an open source search platform), Ruby on Rails (a web development framework designed to make programming easier), and Blacklight (an open source code base for discovery services). The project takes its name from the many-headed serpent in Greek mythology because the software can support multiple applications and workflows (i.e., heads). Institutions currently deploying Hydra (and its components) often have developers with expertise in each of the underlying platforms. Hydra-in-a-Box will allow many more institutions to use those solutions and exchange information without similar investments in development.
The effort is expected to take less than 3 years. The partners are planning engagements with libraries, museums, and archives nationwide to identify their requirements for a robust-but-turnkey solution for managing digital collections. The resulting implementations will offer institutions of all types an easy way to integrate their collections into discovery solutions and digital libraries such as the DPLA. With the federal government investing in such a fundamental challenge for libraries, museums, and archives, federal agencies will hopefully adopt the solution as well.
Depositing, Managing, and Delivering Content
Many federal agencies have substantial collections of digital content on the internet, but not many are managed using repository solutions. Large systems such as PubMed Central (PMC) and the U.S. Government Publishing Office’s Federal Digital System (FDsys) have substantial holdings and extensive metadata. Interchange with other systems, however, is not optimized for services such as the DPLA.
Repository solutions are important for federal agencies because they are designed to manage the life cycle of digital content, not just to host individual files that can be found and downloaded. Solutions such as Hydra are designed using the Open Archival Information System (OAIS) reference model, which breaks the handling of digital content into three distinct areas: deposit, management, and delivery. By separating these three areas, libraries can accept a variety of content types, add the information necessary to manage the content, and produce multiple output formats to address the needs of different users. Digital repositories that don’t manage their collections using a methodology similar to that of OAIS cannot support the types of content use their customers may need in the future.
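For readers who think in code, the three-way separation described above (deposit, management, delivery) can be sketched in a few lines of Python. This is only an illustration of the idea, not Hydra's or Fedora's actual API: the class names `Deposit` and `ManagedObject`, the functions `ingest` and `deliver`, and the `demo:` identifier prefix are all hypothetical.

```python
from dataclasses import dataclass, field
import hashlib
import json


@dataclass
class Deposit:
    """Deposit: what a contributor hands over (hypothetical structure)."""
    filename: str
    content: bytes
    title: str


@dataclass
class ManagedObject:
    """Management: the content plus the metadata the repository adds."""
    identifier: str
    filename: str
    content: bytes
    metadata: dict = field(default_factory=dict)


def ingest(deposit: Deposit) -> ManagedObject:
    """Assign an identifier and record a checksum so the file can be verified later."""
    checksum = hashlib.sha256(deposit.content).hexdigest()
    return ManagedObject(
        identifier=f"demo:{checksum[:8]}",  # made-up identifier scheme
        filename=deposit.filename,
        content=deposit.content,
        metadata={
            "title": deposit.title,
            "sha256": checksum,
            "size_bytes": len(deposit.content),
        },
    )


def deliver(obj: ManagedObject, fmt: str) -> str:
    """Delivery: render the same managed object in different output formats."""
    if fmt == "json":
        return json.dumps({"id": obj.identifier, **obj.metadata}, sort_keys=True)
    if fmt == "citation":
        return f'{obj.metadata["title"]} ({obj.identifier})'
    raise ValueError(f"unsupported delivery format: {fmt}")
```

Because delivery reads only from the managed object, a new output format can be added later without asking contributors to deposit their files again, which is the practical payoff of keeping the three areas separate.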
Hydra is specifically designed to use rules-driven workflows that make it easy to update information across a system. It also integrates time-based media content types, so it can manage not only documents but also audio and video files, along with information such as transcripts and markers at specific time stamps. Hydra supports annotation and other tools that instructors can access when using digital content in class, and it can integrate with other systems, including unified search, archival collection management, and identity and access management.
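To make the two ideas in this paragraph concrete, here is a minimal Python sketch of time-stamped markers on a media item and of a rule applied across a whole collection. It does not reflect how Hydra itself is implemented: `Marker`, `MediaItem`, `marker_at`, and `apply_rules` are hypothetical names, and the sample titles in the usage note are invented.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Set, Tuple


@dataclass
class Marker:
    """A note or transcript cue attached to a point in the recording."""
    seconds: float
    text: str


@dataclass
class MediaItem:
    title: str
    media_type: str  # e.g. "audio", "video", or "document"
    markers: List[Marker] = field(default_factory=list)
    tags: Set[str] = field(default_factory=set)

    def marker_at(self, seconds: float) -> Optional[Marker]:
        """Return the latest marker at or before the given time, if any."""
        earlier = [m for m in self.markers if m.seconds <= seconds]
        return max(earlier, key=lambda m: m.seconds) if earlier else None


# A rule pairs a condition with an action to run on every matching item.
Rule = Tuple[Callable[[MediaItem], bool], Callable[[MediaItem], None]]


def apply_rules(items: List[MediaItem], rules: List[Rule]) -> int:
    """Apply every rule to every item; return how many actions ran."""
    applied = 0
    for item in items:
        for condition, action in rules:
            if condition(item):
                action(item)
                applied += 1
    return applied
```

As a usage example, a single rule such as `(lambda i: i.media_type in ("audio", "video"), lambda i: i.tags.add("time-based"))` passed to `apply_rules` would tag every recording in a collection in one pass while leaving documents untouched. Likewise, calling `marker_at(100)` on a made-up "Sample lecture" with markers at 0 and 95.5 seconds would return the marker at 95.5 seconds.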
Federal Repository Systems
The adoption of repository solutions is growing in the federal government, and this development will help those systems contribute to important efforts similar to the DPLA. Some federal agencies are currently releasing plans for providing free public access to federally funded research. Those plans vary, but they include building repository solutions within individual agencies and using an existing system such as PMC or the Centers for Disease Control and Prevention’s (CDC) CDC Stacks solution.
CDC Stacks is functional today and manages the public health publications funded by the CDC, which uses the National Institutes of Health Manuscript Submission (NIHMS) system to facilitate the submission of manuscripts to PMC and CDC Stacks. CDC Stacks is built with Fedora but does not currently use many of the components that will be available in Hydra-in-a-Box, such as Ruby on Rails and Blacklight.
The National Technical Information Service (NTIS) also offers a service, based on Fedora, for designing and hosting repository systems for federal agencies. NOAA (National Oceanic and Atmospheric Administration) has used NTIS to establish a repository for documents relating to the Deepwater Horizon oil spill. NOAA also reviewed the functionality of CDC Stacks and selected that system because it met NOAA’s requirements for making publicly funded research readily available. CDC will be providing the Stacks solution to NOAA under an interagency agreement.
The National Library of Medicine (NLM), the world’s largest medical library, has a digital repository, Digital Collections, that complements PMC and allows searching, browsing, and retrieval of monographs and films from NLM’s History of Medicine Division. “The new Digital Collections repository will allow NLM to provide permanent, robust access to an even broader range of biomedical information,” according to Betsy Humphreys, acting director of NLM. The first release of Digital Collections included a newly expanded set of monographs on cholera. NLM’s solution was built using several open source components, with Fedora providing the foundation. The browse and search interface was adapted from the Muradora front end for Fedora. The book viewer is a component of Northwestern University’s Book Workflow Interface, and the video player was adapted from a research project by NLM’s Office of Computer and Communications Systems.
A community of Fedora users has been active in the Washington, D.C., area for several years. Its participants include Thorny Staples, a previous director of the Fedora Project who is currently working with the Smithsonian. Many other federal agencies will be exploring repository solutions in the years to come, and Hydra-in-a-Box should be a welcome alternative to the development effort that the variety of solutions now in use across the federal government has required.