OCLC and Open Access: Riding to the Rescue or Rustling the Herd?
Posted On February 5, 2009
In the midst of a firestorm about its proposed new WorldCat records policy (Policy for Use and Transfer of WorldCat Records, www.oclc.org/worldcat/catalog/policy/recordusepolicy.pdf), OCLC (www.oclc.org) has announced a partnership that would ultimately transfer an open access icon, the University of Michigan Library’s OAIster service (www.oaister.org), to OCLC. While some concern has already been expressed about how OCLC’s revenue generation and content control issues might affect OAIster’s future, I have absolute, almost vehement, assurances from Chip Nilges, vice president of business development at OCLC, and John Wilkin, associate university librarian at the University of Michigan, that OAIster will remain a permanently free, open access service. Until the transfer is completed sometime in 2009, the OAIster.org site will remain active. Once the transfer is complete, the service will move into OCLC’s free, open website, WorldCat.org. It will also become a "no extra charge" addition to OCLC’s subscription FirstSearch service. OCLC has also announced an arrangement to assist the new HathiTrust (www.hathitrust.org) in developing comprehensive bibliographic metadata for the digitized documents of member libraries.
Begun in 2002 under a grant from the Andrew W. Mellon Foundation, OAIster was originally designed as a portal and a search engine reaching open repositories using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). Seven years later, it accesses close to 20 million records, mostly scholarly sources, from nearly 1,100 organizations. The records of digital resources harvested in OAIster cover deep web content extending from digitized books and articles, born-digital texts, audio files, images, and movies to data sets. Currently, users can search content by title, author/creator, subject, language, or an entire record. They can limit searches by resource type and sort results by title, author, date, hit frequency, and data contributor. These access features will remain in place while OCLC works out the issues of how to handle OAIster content and how to integrate it with other OCLC services.
So why now? Why did the University of Michigan decide to ask OCLC to take over the OAIster service? There seemed to be some minor disagreement among OAIster management as to what drove the decision. Kat Hagedorn, OAIster metadata harvesting librarian and senior associate librarian for the Digital Library Production Service, considered it "untenable for us to run something this big," while her boss, Wilkin, thought it "no problem to keep on doing what we’re doing, to just crawl and search." But both agreed that advancing the service, providing the improvements needed to make the data more uniform (e.g., reconciling alternative data formats), and creating a better user experience would require the commitment of serious development resources. So they turned to OCLC. As Wilkin put it, "It makes sense for someone in the business of global search to do this." Hagedorn thought that OCLC might even try to make the service more comprehensive by expanding its reach beyond OAI-PMH to other digital formats. Wilkin still holds to the grand dream with which he began the OAIster project. "I want to see more digital content on the web in OAIster. If we could have done it more neatly, we would even have added a search of Google."
But why was OCLC interested enough to take over OAIster operations? According to Wilkin, the university had approached OCLC 5 years ago about working with OAIster but found that OCLC was not interested. The announcement of the new arrangement pointed to OCLC’s recognition that open access collections have become vital to scholarship. Nilges stated, "Adding records for open archive collections is a natural complement to WorldCat and will drive discovery and access of these collections for a broader community of scholars." Content should expand. Nilges pointed out, "We already have some digital repositories in WorldCat that could supplement OAIster. We absolutely see the need for development. We currently aggregate metadata for many ebooks, digitized content, and archival finding aids and now digital archives. This is strategic for OCLC. We’re interested in helping build and discover archival collections."
OCLC already has Collection Gateway software, according to Nilges, "designed to support harvesting. At some point we will use that software, which supports multiple formats." This all needs to be worked out, along with overall econtent synchronization programs at OCLC.
One thing, however, remains clear. Free and open access to the OAIster data will continue permanently. Nilges states, "We are absolutely committed to free and public access. We will run parallel tracks through 2009, while integrating OAIster into WorldCat.org [OCLC’s free service]." Wilkin confirmed that commitment. In fact, the issue of maintaining free open access is included in a clause in OCLC’s contract with the University of Michigan.
The HathiTrust is a new player in the open access arena, but it’s a major one with more than 2.6 million documents. (For background information, read Beth Ashmore’s Oct. 23, 2008, NewsBreak, "HathiTrust: A Digital Repository for Libraries, by Libraries," http://newsbreaks.infotoday.com/nbReader.asp?ArticleId=51225.) Participants currently include 24 major research libraries. Many of the libraries have conducted their own digitization projects to create special collections. Some have also worked with the Open Content Alliance. But admittedly, the vast majority of HathiTrust’s digital repository comes from a source not mentioned once in the press release: Google Book Search. All the current members of HathiTrust are Google Book Search Library partners. Most of them were among the early joiners in that partnership, when Google was still using what John Wilkin calls "the firehose" approach and digitizing every book a library would allow it to digitize. Google has grown much more selective with later library partners, according to Wilkin.
Under the new agreement with OCLC, the millions of books and archived documents hosted in a single repository by HathiTrust and made available for reading online will become more visible and accessible with the creation of WorldCat records for that content. OCLC will also link to the collections in its Open Web WorldCat.org service as well as its WorldCat Local service. As executive director of the HathiTrust, Wilkin sees "the connection between HathiTrust and WorldCat as a natural. WorldCat and HathiTrust are both built by and for libraries, and their pursuit of comprehensiveness will aid our community in pursuit of more effective collection management, as well as integration of services across our institutions."
Wilkin admits that HathiTrust content comes "overwhelmingly" from Google Book Search. Under early license arrangements, Google agreed to supply its library partners with digital copies of whatever they contributed to Google Book Search. But Wilkin pointed out that the repository also holds the university libraries’ own preservation digitization work and special collection digitization. "We are working on adding Open Content Alliance material now in an arrangement with the University of California," says Wilkin. "We’re focusing at the outset on monograph and serial literature." Wilkin expects that HathiTrust will take a different slant on the content it handles, aiming to help libraries in making acquisition and retention decisions and building tools for the scholarly community. But to do this, HathiTrust needs good cataloging information. Initially, Wilkin said, "OCLC will adapt WorldCat Local to HathiTrust. When the content moves to WorldCat.org, users will be able to search just HathiTrust content."
Nilges explained OCLC’s role as "making sure the repository’s content is represented in WorldCat and, secondly, working with HathiTrust to build a discovery environment for that collection. WorldCat Local is adapted as a discovery environment, and we’re using the project to understand better what sort of discovery environment would suit this collection." He expects the work to lead to ways to "handle any number of digital collections, to co-locate various versions, and then distinguish versions. We will let users comment and build tools in WorldCat.org and WorldCat Local. It’s an evolving model."
What drives these changes in OCLC policies? Nilges explains, "We need to represent the ‘Collective Collection.’ Special collections have become more important as they are digitized. We have the opportunity to represent those collections with metadata as the demand for access is growing. We’re trying to support HathiTrust in its near- to medium-term needs for discovery by whatever audience. It fits well with who we are."
Nilges and Wilkin both assured me that the controversial records policy was completely separate from this work with the HathiTrust. OCLC has an overall project to catalog or blend catalog information for Google Book Search entries into WorldCat and to supply the "Find in a Library" information to Google and other online book operations. Much of the HathiTrust work will represent a subset of that existing cataloging work.