HathiTrust: A Digital Repository for Libraries, by Libraries
Posted On October 23, 2008
For those who thought Google Book Search (http://books.google.com) and its Library Partners program represented the death knell for libraries, take a deep breath and check out the latest digital elephant in the room, HathiTrust (www.hathitrust.org). HathiTrust is a shared digital repository aimed at bringing the vast collections of print books and journals currently housed in libraries into the digital world for the purposes of access, discovery, and preservation. The project began as a partnership of the 13 university libraries of the Committee on Institutional Cooperation (CIC; www.cic.net), the 10 libraries of the University of California system (www.universityofcalifornia.edu/cultural/libraries.html), and the California Digital Library (CDL; www.cdlib.org). The University of Virginia Library (www.lib.virginia.edu) also officially joined the partnership on Oct. 13, 2008, the same day the repository itself was announced.
If the connection between these libraries and large-scale book digitization seems familiar, it’s because they are all members of the aforementioned Google Book Search Library Partners program. While it might seem like the HathiTrust repository would be in direct competition with projects like Google Book Search and the Open Content Alliance (OCA; www.opencontentalliance.org), HathiTrust’s leadership would disagree. Laine Farley, interim executive director of the CDL, sees the projects as complementary and believes that HathiTrust can fill a special academic niche: "We have become convinced that there are some approaches to using this content, from an academic standpoint, that Google may not address."
One of the areas in which the projects diverge is the importance placed on long-term preservation. John Wilkin, University of Michigan’s associate university librarian and the executive director of HathiTrust, explained: "[Long-term preservation] is something we feel libraries need, and I think it has been one of the concerns about Google as a digitization partner. These resources need to be, in the long term, managed by libraries. This is something Google understood from the beginning in their partnership with us."
With the high profiles of Google Book Search and the OCA, it is easy to miss the fact that many of the HathiTrust partners themselves have been developing the technological infrastructure necessary for creating large-scale searching, rights management, and backup capabilities for this amount of information. When it comes to developing the architecture for digital collections, Farley recognizes that "Michigan has been doing that as long as anybody else," particularly in the area of rights management, which represents a significant challenge for any digitization project. When asked about the rights management system behind HathiTrust, Wilkin described a complex database-driven system that automatically assigns a rights status (in-copyright, public domain, etc.) based on the metadata for the item (place of publication, date of publication, etc.). Wilkin also noted that the system permits manual overrides of the automated assertions to cover a wide variety of cases, including occasions where rightsholders allow open access to their work.
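To make the idea concrete, here is a minimal sketch in Python of a metadata-driven rights check with a manual override, along the lines Wilkin describes. It is not HathiTrust's actual system: the `Item` fields, the status labels, the `rights_status` function, and the single rule used (U.S. publications dated before 1923 are treated as public domain) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass
from typing import Optional

# Status labels are illustrative assumptions, not HathiTrust's vocabulary.
PUBLIC_DOMAIN = "public-domain"
IN_COPYRIGHT = "in-copyright"
UNDETERMINED = "undetermined"

# Assumed rule for this sketch only: U.S. works published before this year
# are treated as public domain.
US_PD_CUTOFF_YEAR = 1923


@dataclass
class Item:
    """Hypothetical record holding the metadata the rule needs."""
    item_id: str
    place_of_publication: str            # e.g. "US"
    year_of_publication: Optional[int]   # None when the date is unknown
    manual_override: Optional[str] = None  # set by staff, e.g. "open-access"


def automated_status(item: Item) -> str:
    """Assign a status from metadata alone."""
    if item.year_of_publication is None:
        return UNDETERMINED
    if (item.place_of_publication == "US"
            and item.year_of_publication < US_PD_CUTOFF_YEAR):
        return PUBLIC_DOMAIN
    # Anything the rule cannot clear is conservatively left in copyright.
    return IN_COPYRIGHT


def rights_status(item: Item) -> str:
    """A manual override, when present, wins over the automated assertion."""
    if item.manual_override is not None:
        return item.manual_override
    return automated_status(item)
```

In this sketch, a 1950 U.S. title would come back as in-copyright unless staff record an override, which is how a rightsholder's grant of open access could be reflected without changing the automated rule.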
Speaking of being open, one of the stated goals on the HathiTrust website is an open technical framework that invites development from outside the central organization (www.hathitrust.org/mission_goals). "We don’t want [HathiTrust] to be a service agency outside of libraries providing a service to libraries. We want to create a service agency of the libraries, who are owners and members, so that we can collectively define the architecture," Wilkin explained. Farley points out that this kind of open framework can also make a project nimble enough to deal with the unexpected. "We recognize that you can’t anticipate what anyone might want to do with the technology, so the more open you can make it, the better off everybody is in letting creativity come about." Farley also noted that the open framework will encourage partner institutions to develop side projects that serve their interests as well as draw upon their institutional expertise. To fuel this development, the HathiTrust partners announced in their September 2008 update that they are creating a sandbox for shared development (www.hathitrust.org/updates_september2008). The partners have already developed one API that allows local library catalogs to get URLs and rights information from the HathiTrust site (www.hathitrust.org/bibliographic_data_distribution).
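As a rough illustration of what a local catalog might do with that kind of URL and rights information, the following Python sketch builds a display link for a catalog record. Everything here is hypothetical: the `HoldingInfo` layout, the record identifiers, the example.org URLs, and the link labels are invented for illustration and do not reflect the actual HathiTrust API or data format.

```python
from typing import Dict, NamedTuple, Optional


class HoldingInfo(NamedTuple):
    """Hypothetical pairing of a repository URL with a rights status."""
    url: str
    rights: str


# Invented sample data standing in for whatever the repository returns.
REPOSITORY_DATA: Dict[str, HoldingInfo] = {
    "local-0001": HoldingInfo("https://repository.example.org/item/0001",
                              "public-domain"),
    "local-0002": HoldingInfo("https://repository.example.org/item/0002",
                              "in-copyright"),
}


def link_for_catalog(record_id: str,
                     data: Dict[str, HoldingInfo] = REPOSITORY_DATA
                     ) -> Optional[str]:
    """Return a labeled link for a catalog record, or None if not held."""
    info = data.get(record_id)
    if info is None:
        return None
    # Assumed display policy: only public domain items are offered as full text.
    label = "Full text" if info.rights == "public-domain" else "Search only"
    return f"{label}: {info.url}"
```

The point of the sketch is the division of labor: the repository supplies the URL and the rights status, and the local catalog decides how to present them to its own users.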
The current partners have funded the HathiTrust for "an initial 5-year period beginning January 2008, with a planned process of review and renewal" (www.hathitrust.org/governance). Future partners "will be charged a one-time start-up fee based on the number of volumes added to the repository, in addition to an annual fee for the curation of those volumes," according to the HathiTrust FAQ (www.hathitrust.org/faq). With the very recent addition of the University of Virginia, it would not be surprising to see other libraries with strong existing digitization programs joining the partnership as well. "I would hope that the library community would see this as [an] opportunity for academic libraries to link arms and do something powerful together," Farley noted. "When we can align our interests and put our resources together, not just the content but our expertise, and make something bigger, I think it benefits everybody."
Wilkin also sees this project as a situation where the whole is greater than the sum of its parts: "It has been great to see the institutions coming together around the possibility of doing something in a deeply collaborative way. We talk a lot about interoperability to get things done but that assumes everybody doing something on their own. I think there has been a lot of enthusiasm and interest about participating in HathiTrust because of the way that we can leverage all of our capabilities."
This spirit of collaboration does not mean ignoring one’s previous partners either. "I hope that people don’t perceive this as our abandoning other things like the Open Content Alliance. We are still very much a part of that project," Farley points out. "Anything that we have done is non-exclusive. We don’t see only one way that this content could be used. We have to try out these different models and see where they take us."
A public discovery system beta is in the works. Wilkin commented that open source tools, such as Villanova University’s VuFind (www.vufind.org), are currently being examined as possibilities for the public interface. We can expect to see this public interface and a full-text search of public domain materials in early 2009. Until then, curious users can access and examine records for all of the HathiTrust content in the University of Michigan’s online catalog, Mirlyn (http://mirlyn.lib.umich.edu), or look for public domain materials in the University of Chicago’s discovery system, Lens (http://lens.lib.uchicago.edu). Library guru Roy Tennant has also gotten in on the act by creating the HathiTrust Search prototype (http://roytennant.com/proto/hathi) with the HathiTrust metadata, which is available for download at the HathiTrust website (www.hathitrust.org/bibliographic_data_distribution).
HathiTrust: Who’s On Board?
- California Digital Library
- Indiana University
- Michigan State University
- Northwestern University
- The Ohio State University
- Penn State University
- Purdue University
- University of California–Berkeley
- University of California–Davis
- University of California–Irvine
- University of California–Los Angeles
- University of California–Merced
- University of California–Riverside
- University of California–San Diego
- University of California–San Francisco
- University of California–Santa Barbara
- University of California–Santa Cruz
- The University of Chicago
- University of Illinois
- University of Illinois–Chicago
- The University of Iowa
- University of Michigan
- University of Minnesota
- University of Wisconsin–Madison
- University of Virginia