HathiTrust, a massive digital repository of books and serials digitized from major research libraries, will open its collection to the popular Summon discovery service from ProQuest’s Serials Solutions. In 2010, HathiTrust introduced full-text searching of its collection, which now numbers more than 8.4 million items and includes both public domain and in-copyright material. Sometime this summer, Summon will begin offering full-text search of HathiTrust through its discovery service, with content matched to individual library holdings. At the same time, the 2.25 million public domain items in the HathiTrust digital library will become available through Summon for access and/or downloading. In effect, the more than 200 libraries currently using Summon will gain full-text search across large portions of their existing print collections and will significantly expand their ebook collections with millions of public domain items. Later this summer, HathiTrust plans to enhance its own full-text search with new features and may also offer access to other discovery services.
The HathiTrust digital library collection began, in a sense, as a byproduct of the Google Books project. Under their arrangements with Google, the initial participating libraries received digital copies of the books they had allowed Google to digitize. Some of these libraries, particularly the University of Michigan, had allowed Google to digitize in-copyright content as well as public domain items. HathiTrust merges the digital contributions of its members, many of which are also Google Books library partners. HathiTrust’s 52 members include the Committee on Institutional Cooperation (CIC) and libraries at the University of Michigan, Indiana University, Columbia, Princeton, Yale, Harvard, Duke, Johns Hopkins, Purdue, Stanford, and the University of California.
According to John Wilkin, executive director of HathiTrust and associate university librarian at the University of Michigan, the bulk of the HathiTrust collection still comes from Google Books, a source that continues to grow as Google keeps digitizing the collections of HathiTrust’s member libraries. Wilkin said the non-Google portion amounts to several hundred thousand items, among them materials digitized by member libraries with the assistance of the Internet Archive and some contributions from university research presses. Access to in-copyright content, according to Wilkin, would be limited to legally authorized (“Section 108”) uses, such as replacing “Oliver Twist orphans” (items that are “damaged, deteriorated, lost or stolen and unavailable at any reasonable price”) or providing copies for people whose disabilities make print copies difficult to use.
Wilkin also explained the origin of HathiTrust’s name: “Hathi is the Hindi word for elephant, like Colonel Hathi in Kipling’s Jungle Book. The elephant is a symbol of memory and strength and wisdom and longevity.” HathiTrust’s two primary goals are preservation (not limited to digital) and access, which Wilkin describes as necessary companions. “We believe the library community needs strong platforms that make the discovery of quality content in libraries’ collections as easy and compelling as commercial web alternatives. In making the HathiTrust searchable from the Summon discovery service, we are enabling users to easily and efficiently search the full text of the entire HathiTrust collection concurrent with their exploration of a library’s other collections. We see this significant step forward in discoverability aiding HathiTrust in ensuring the accessibility and long-term preservation of this vast record of cultural heritage and collected knowledge.”
John Law, vice president of discovery services at Serials Solutions, described the partnership’s goals and his company’s role. “HathiTrust has two key missions—preservation and access. Working together, we can accelerate access for HathiTrust, because Summon is the front door for so many libraries and our adoption rate is ramping up quickly. With HathiTrust included in that central search, it should increase their access immensely. Libraries can control what they include. One of the exciting aspects for libraries and users is that the library can set everything in HathiTrust as the default search. Or they can designate all the books on their own library shelves that are already digitized and included in HathiTrust. If the content is out of copyright, then users can click through to the HathiTrust collection.”
The more than 8.4 million items in the HathiTrust digital library include more than 4.6 million book titles and more than 200,000 journal titles, plus many government documents, totaling nearly 3 billion pages. Wilkin described ambitious plans under way to expand HathiTrust’s collection and services. HathiTrust hopes to bring several publishers’ lists into open access and is building a journal publishing service that should be out later this year. HathiTrust was recently certified as a trustworthy digital repository by the Center for Research Libraries (http://www.crl.org) under its prestigious and rigorous Trustworthy Repositories Audit and Certification (TRAC) assessment program (see http://www.hathitrust.org/trac).
Wilkin clearly believes in universal digital library service. He mentioned HathiTrust’s participation in the emerging Digital Public Library of America (DPLA), while acknowledging that the DPLA is still at the discussion stage. (See the DPLA planning initiative wiki at http://cyber.law.harvard.edu/dpla/Main_Page.) Serials Solutions should be glad to hear that Wilkin considers HathiTrust’s collection to overlap with the collections of many types of libraries. He puts the overlap with the collections of Association of Research Libraries (ARL) members at around 36% and the overlap with a good college library at close to 50%.
“We’re just at the beginning of our efforts,” said Wilkin. “We’ll be expanding our access. In what we now provide to vendors, we incorporate our search into their search. We do all the indexing. We manage the schema. We push the content to their indexing routines. That’s all on our end. We provide the machine readable indexes, not the full-text content.” Wilkin expects that Summon, after processing the full HathiTrust content, will scope it to each library’s local holdings, as other vendors to whom HathiTrust has sent samples might also do; the full HathiTrust collection, however, will always remain available directly on HathiTrust’s own website. The improvements to HathiTrust’s own full-text search planned for this summer include faceting and better integration of bibliographic records. HathiTrust will also soon offer a mobile app for reading its ebooks and plans to expand its non-U.S. membership. Having added the University of Madrid this year, it hopes to announce a new Canadian member soon. It also hopes to launch a new revenue model.