HathiTrust recently launched Zephir, its new system for managing bibliographic metadata. The University of California’s CDL (California Digital Library), one of the founding institutions of HathiTrust, developed Zephir to store, manage, and export the bibliographic records that accompany digital items in HathiTrust’s repository of about 11 million volumes.
All digital materials submitted to HathiTrust’s repository must be accompanied by bibliographic metadata. Previously, the University of Michigan had managed the metadata in its local ILS. Now, institutions contributing to HathiTrust send the CDL their metadata. Zephir is HathiTrust’s “first distributed development of a major repository component outside the University of Michigan,” according to the press release.
How It Works
University of Michigan and CDL staff members worked together for several years to ensure a smooth transition for bibliographic processing. With help from both the CDL and the University of Michigan, Zephir operates as follows:
- HathiTrust’s partners submit their MARC 21-formatted bibliographic records to the CDL for inclusion in HathiTrust’s repository. The records include title, author, publisher, subject heading, and other information usually found in library records.
- The CDL processes the records in Zephir, which stores all versions of records submitted and selects the most complete one for use in HathiTrust. (According to Zephir’s website, the system “uses a scoring algorithm to weight the presence or absence of MARC fields and field values to determine base record selection.”)
- The CDL exports the records that will be used in HathiTrust’s public access catalog to the University of Michigan.
- The University of Michigan loads the records into HathiTrust’s catalog, data feeds, and APIs.
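The base record selection described in the second step can be illustrated with a short sketch. It is not Zephir's actual algorithm: the weights, the chosen MARC tags, the `Record` class, and the function names are all hypothetical, and each record is reduced to a plain mapping from MARC tag to field values. The sketch only shows the general idea of weighting the presence or absence of fields and keeping the highest-scoring version.

```python
from __future__ import annotations

from dataclasses import dataclass, field

# Hypothetical weights, for illustration only; Zephir's real weights
# and the fields it inspects are not given in this article.
FIELD_WEIGHTS = {
    "245": 3.0,  # title statement
    "100": 2.0,  # main entry, personal name (author)
    "260": 2.0,  # publication information
    "650": 1.0,  # topical subject heading
}


@dataclass
class Record:
    """One submitted version of a bibliographic record (simplified)."""

    source: str  # label for the contributing institution (assumed)
    fields: dict[str, list[str]] = field(default_factory=dict)


def score(record: Record) -> float:
    """Add up the weights of the tags that carry at least one non-empty value."""
    return sum(
        weight
        for tag, weight in FIELD_WEIGHTS.items()
        if any(value.strip() for value in record.fields.get(tag, []))
    )


def select_base_record(versions: list[Record]) -> Record:
    """Return the highest-scoring version; on a tie, the earliest one wins."""
    return max(versions, key=score)


versions = [
    Record("library_a", {"245": ["Histoire de Babar"], "100": [""]}),
    Record(
        "library_b",
        {
            "245": ["Histoire de Babar"],
            "100": ["Brunhoff, Jean de"],
            "650": ["Elephants"],
        },
    ),
]
print(select_base_record(versions).source)  # library_b
```

In this sketch every version stays in the `versions` list, mirroring the article's point that Zephir stores all submitted versions and only selects one of them for use in HathiTrust.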
“It’s taken a lot to … work through the details and how to do something in a distributed way like that, but California’s done a great job,” says Jeremy York, assistant director of HathiTrust. “We did extensive testing beforehand to be sure that everything would work properly and it has.”
Fans of Babar the elephant will recognize the name Zephir: The CDL chose it because “hathi” is the Hindi word for elephant, and Zephir the monkey is Babar’s friend in the Babar book series.
The Zephir system can certainly be considered a friend of HathiTrust. A few years ago, the CDL was “looking for ways that they could contribute to HathiTrust, that they could deepen the relationship,” says York. The University of Michigan had identified bibliographic management as a process that could be distributed to other institutions, he says. The CDL manages a system similar to Zephir for its parent library system at the University of California: “[T]hey’ve been doing this for a long time and they said they’d like to build a new system” for HathiTrust, says York. “So it was seen as a way to broaden involvement in HathiTrust by a partner in a very significant way.”
“Bibliographic metadata is critical to users’ ability to find and use materials, but managing this metadata is challenging. We knew we could make a strong contribution that would enhance the user experience and maximize the potential for use and enhancement of HathiTrust bibliographic records,” says Laine Farley, the CDL’s executive director.
Zephir demonstrates to other partner institutions that collaboration on such a large scale is possible, says York. “This is definitely a special case in terms of scale. … [T]he model that we have is geared toward fulfilling the needs of the partner institutions. So if a partner has something it would like to do, and we can leverage it at scale for many partners, then it’s something that we pick up,” he notes. HathiTrust currently has 89 members from the U.S., Canada, and Spain.
Spirit of Collaboration
The HathiTrust model encourages partner institutions to work on projects that are important to them. York describes a few examples of collaborative projects similar to Zephir that were designed by partners to benefit the HathiTrust community.
IMLS (Institute of Museum and Library Services) awarded the University of Michigan a grant “to do manual copyright review of volumes that … may be in the public domain because they didn’t comply with certain copyright formalities,” he says. The Copyright Review Management System had an impact on hundreds of thousands of volumes in HathiTrust “that were presumed to be in copyright [and] have been opened in this way because they didn’t have a copyright notice or the copyright was not renewed. And that was something that many partners were interested in participating in because it benefits everyone in HathiTrust.”
The CDL wanted to preserve its content digitized in the Internet Archive in HathiTrust, so it developed the specifications needed to make that happen. “They worked with staff at Michigan to figure out the workflows and so forth and then we were able to take those specifications and that workflow and apply it to any institution that wanted to contribute Internet Archive digitized content,” says York.
The HathiTrust Research Center was a project co-developed by Indiana University and the University of Illinois to create software tools and cyberinfrastructure that help researchers access HathiTrust’s massive amounts of digital text. “[T]hey felt so strongly about this … that they built the infrastructure for the research center and they are managing it. So that’s something they wanted to do, they felt they could do it, and they’ve been working on that now for a couple of years,” says York.
Just as with these projects, Zephir “embodies the deep collaboration that is at the heart of HathiTrust,” says Brian Schottlaender, chair of HathiTrust’s board of governors. “It is a tremendous example of the ways we are able to leverage the expertise of a broad range of institutions to achieve a whole that is greater than the sum of its parts.”