

OCLC Ingests OAIster: Pearls to Follow
Posted On November 12, 2009

At the beginning of this year, OCLC agreed to take over the running of the venerable open access service OAIster from the University of Michigan, which had run it since it began in 2002. (For details on the transfer, read the Feb. 5, 2009, NewsBreak.) At the end of October, OCLC announced that it had completed the transition. The WorldCat.org service now includes all the OAIster records. Users of the old site will now be automatically shifted over to an OCLC-based site. This is just the beginning, however. OCLC has also merged the content of two other open access files, ArchiveGrid and CAMIO, into WorldCat.org. In January, OCLC will launch a separate OAIster file, allowing users to reach just this repository content guide. As with WorldCat.org, the new OAIster-only file will be accessible for free. The experience gained from handling OAIster has led to improvements in the flexibility of WorldCat.org's infrastructure itself. Further improvements to OAIster are expected as other OCLC features are applied to it.

OAIster taps into repositories using the Open Archives Initiative Protocol for Metadata Harvesting (OAI-PMH). It now has more than 23 million records from more than 1,100 organizations worldwide, including digitized books and journal articles, digital text, audio and video files, photographic images, data sets, and theses and research papers. Under OCLC management, records are available merged with WorldCat.org's masses of library holdings, as well as through OCLC's FirstSearch subscription service. FirstSearch maintains the individual OAIster database, however, as part of a paid subscription, the FirstSearch Base Package. Contributors to OAIster can access the database through FirstSearch at no charge once they complete an OAIster Database Contributors Information form. (Details on how to become a contributor are available online.) OAIster records are also available in OCLC's WorldCat Local and WorldCat Local "quick start" service.
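
For readers unfamiliar with the protocol, the following is a minimal sketch in Python of the kind of request an OAI-PMH harvester makes. It is not OAIster's or OCLC's actual harvester. The repository URL, the function names, and the sample response are hypothetical, chosen only for illustration. The sketch builds a ListRecords request, then reads record identifiers, Dublin Core titles, and the resumption token (the protocol's paging mechanism) from a response.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

# XML namespaces defined by OAI-PMH 2.0 and Dublin Core.
OAI_NS = "{http://www.openarchives.org/OAI/2.0/}"
DC_NS = "{http://purl.org/dc/elements/1.1/}"


def build_list_records_url(base_url, metadata_prefix="oai_dc", resumption_token=None):
    """Build a ListRecords request URL (function name is hypothetical)."""
    if resumption_token:
        # When resuming, the token is sent instead of metadataPrefix.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    return base_url + "?" + urlencode(params)


def parse_list_records(xml_text):
    """Return (records, resumption_token) from a ListRecords response."""
    root = ET.fromstring(xml_text)
    records = []
    for record in root.iter(OAI_NS + "record"):
        identifier = record.findtext(f"{OAI_NS}header/{OAI_NS}identifier")
        # Take the first dc:title, if the record has one.
        title = next((el.text for el in record.iter(DC_NS + "title")), None)
        records.append({"identifier": identifier, "title": title})
    token = root.findtext(f"{OAI_NS}ListRecords/{OAI_NS}resumptionToken")
    # An empty or missing token means there are no further pages.
    return records, ((token or "").strip() or None)


# Hypothetical response from a hypothetical repository.
SAMPLE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Sample thesis</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
    <resumptionToken>page-2</resumptionToken>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(build_list_records_url("https://repository.example.org/oai"))
    print(parse_list_records(SAMPLE))
```

A real harvester would fetch each URL over HTTP and keep requesting pages until no resumption token comes back. The sketch parses a canned response so that it runs without network access.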

In a letter to contributors issued alongside the press release announcing the status of the OAIster project, OCLC made a specific point: contributors have the right to remove their metadata records whenever they wish. All it would take is a request by email. According to Chip Nilges, vice president of business development at OCLC, OAIster has had an increase in the number of contributors since OCLC took over the service.

When it comes to joining the service, contributors do not have to meet any specific "terms and conditions." Once OCLC has the contributed metadata records, it even allows Google to crawl the files. According to Kat Hagedorn, formerly OAIster/metadata harvesting librarian at the University of Michigan and now the HathiTrust special initiatives coordinator there, no barrier was erected in the past to keep Google, Yahoo!, or other search engines out, but, "though we made the data available, it was behind a CGI script and that made it [difficult] for harvesting. We didn't have an API interface. It was cumbersome. When we started talking to OCLC, we made it known to them as to who wanted such a big batch of data and who we had given it to. We told them to put it on their list of things to do, that is, making it possible for others to come and get the data with an API." At present, OCLC is looking at both API and rsync solutions.

Regarding the openness of the system, I asked both Nilges and Hagedorn whether they felt the system might be vulnerable to spamdexing or the infiltration of junk records. Both assured me that, although the system was open-minded as befit the open access community, there was still a human factor in judging and evaluating new contributors.

Other OCLC open access products included in WorldCat.org stem from the acquisition of Research Libraries Group (RLG) some years ago. ArchiveGrid helps locate historical documents, personal papers, and family histories held in archives across the world. CAMIO (Catalog of Art Museum Images Online) identifies high-quality art images contributed and described by leading museums worldwide with all rights cleared for educational use. This year OCLC also enhanced its CONTENTdm Digital Collection Management Software to allow uploading of CONTENTdm metadata to WorldCat through the self-service WorldCat Digital Collection Gateway. This also allows users to download records to their local systems. Now FirstSearch Base Package subscribers can use an entry-scale, hosted version of the CONTENTdm Quick Start software with secure systems support, three Project Clients for collection building, a 3,000 item limit, and 10 gigabytes of storage, all for no additional charge.

In implementing these changes, Nilges says OCLC's strategy followed "at least three directions. The first piece was the metadata component, how to support metadata creation and the content in institutional repositories and what kind of support was needed. The second piece was discovery, how OAIster would integrate with WorldCat content and what end users would want to find from OAIster as a distinct, evolving resource. Third was the intersection with library digitized content in general."

When it comes to integration and discovery, I asked Nilges whether OCLC saw future services as solving the "versioning" problem, for example, by clustering alternative options for reaching strongly similar, or maybe even identical, presentations of the same content. He responded, "Not in the first iteration, obviously, but WorldCat.org and WorldCat Local have integrated with journal article metadata from all the FirstSearch databases. They can search OAIster at the same time with combined result sets. There are format facets on the left of the search interfaces that can let you reach just articles or just internet resources." Major improvements planned for OAIster should take place in the next 3-6 months, according to Nilges.

Two items still concern Hagedorn. First, OCLC's current plans for update frequency seem to hover around quarterly, maybe monthly. Hagedorn admitted that frequency of harvesting and updating varied under the University of Michigan's management, dependent on a number of factors. Stabilizing the process would help users. Second, subject headings seem to have disappeared in some of the current OAIster searches. Nilges indicated that the FirstSearch OAIster service would show subject headings, although WorldCat.org searches might not. But he said the January 2010 version of OAIster would certainly have the subject headings and, in time, WorldCat.org probably would as well.

Barbara Quint was senior editor of Online Searcher, co-editor of The Information Advisor’s Guide to Internet Research, and a columnist for Information Today.
