Yahoo! Search Joins OCLC Open WorldCat Project
Posted On July 6, 2004
OCLC (http://www.oclc.org) has expanded its online library locator service for books to Yahoo! Search. Last October, I reported on a pilot project between OCLC and Google that opened library holdings information for just under 2 million items in the WorldCat union catalog, a subset of WorldCat's 55 million items, which carry over 900 million recorded holdings (see http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=16592). In January 2004, Yahoo! approached OCLC to arrange access to Open WorldCat records under Yahoo!'s new Content Acquisition Program. (For a description of that program, see http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=16499.) While Google took months to spider all the OCLC data, Yahoo! moved very quickly. The agreement was signed May 21, content first appeared on Yahoo! Search May 28, and full crawling and loading of the 1,993,073-record set was completed June 6. Overall, OCLC seems to consider the Open WorldCat project a resounding success and plans to expand it.
Commenting on how quickly Yahoo! loaded the records, Chip Nilges, director of WorldCat Services, said it was "amazing, just amazing." Yahoo! uses a batch load method to "ingest" data. According to David Mandelbrot, Yahoo!'s vice president of search content, Yahoo! has concentrated on speeding data flow. "We have combined strong crawling with good feed technology to integrate data. We receive the feed of metadata (titles, descriptors, URLs, and such), review them to verify and then quickly include them in our information. It is much more effective than just crawling. Our strong feed infrastructure enables us to refresh data more quickly and provide greater currency."
Mandelbrot also saluted OCLC for having done "an admirable job of making it easier to find content in an offline world. That's a big part of why we have this relationship. We want to make sure that our users get really good results and sometimes the best content is not online."
By OCLC standards, usage statistics had been good during the Google phase, but the addition of Yahoo! has made them much better. In May, OCLC reported some 725,000 click-throughs from Google and the four online booksellers that also access the data. (Together, the four booksellers average 50,000 clicks a month.) In June 2004, however, usage jumped to 2 million click-throughs. The Yahoo! listing also reaches Microsoft MSN users, since Microsoft has replaced Google with Yahoo! Search on its consumer service.
Nilges attributed much of the increase to the addition of Yahoo! as a partner, but also to growing usage from Google. He said the program may also drive more traffic to individual library Web sites. "How many times people see 'Find in a Library' in results, we can't count, but we can count the click-throughs and conversion rates. We do know how often people go from our OCLC information to local library Web sites and OPACs. It's running around 8 percent, which is about the same as we see on our own FirstSearch service."
One might expect jumps in usage of such magnitude to put a strain on OCLC's system, but Nilges assured me that "2 million is a comfortable increase. We wanted the pilot to check this sort of thing out and we've found it very manageable." Currently, searches from Open WorldCat constitute over 25 percent of all WorldCat "discovery traffic," with a third of those users going on to view library holdings and, of that third, one quarter linking directly to local library resources.
Problems persist in the program. OCLC records often fall far from the first page of search results, and rigid metadata can also reduce their visibility. For example, when I searched Google for The Da Vinci Code, a title OCLC recommends as a test case on its own Web site, the Open WorldCat record came in 29th, near the bottom of the third page of results. Worse, when the title was spelled "DaVinci Code," the record still had not turned up by the 24th page. And when users do click through to OCLC to find a local library, the list of libraries is arranged alphabetically rather than by distance from, or location within, the user's designated ZIP code. Users accustomed to Yahoo!'s Yellow Pages service may expect to see results sorted by distance, not A-B-C. Gary Price, guru of the Invisible Web, has been monitoring Open WorldCat on Google and now on Yahoo! for a future article. His advice to users remains the same: always check the target library's OPAC directly.
Questions also arise as to how often Open WorldCat is updated. Nilges said that OCLC scans the main WorldCat file monthly to identify records that have reached the 100-library holdings minimum. He thought that Google then crawled the system quarterly. However, once a Yahoo! or Google user locates a title in Open WorldCat and clicks over to the OCLC server to retrieve holdings information, that information will be the latest available, since it is updated daily. As Nilges explained, "the metadata that is being indexed in the partner sites represents the items in WorldCat that are held by 100 or more libraries. This subset doesn't change frequently; once an item reaches 100 holdings, it tends to stay there. Also, it takes a bit of time for a new item to reach 100 holdings."
After checking The Da Vinci Code in Open WorldCat and not finding my local library (Santa Monica Public Library) among the holders, I went to SMPL's own OPAC and found the book listed. Nancy O'Neill, principal librarian for reference services at SMPL, said the library had subscribed to the program and had complained to OCLC several times when it did not see its name listed in search results. However, she also reported that the library had begun getting some referrals from the service and she was pleased. Only 1.6 percent of the 12,000 public, academic, and school member libraries whose holdings OCLC tapped for Open WorldCat have chosen to withdraw their collections.
Nilges commented on some of the welcome side effects of the new project. "Besides the exposure, there has been a lot of extra value. As we have talked with both Yahoo! and Google, we have had some influence on how they think about and handle this type of content. That's part of our mission, to try to represent the interests of the library community, and, in doing that, we have also talked on how to promote library content."
The new connections have also influenced OCLC's traditional partners. Nilges admitted that OCLC's holdings lists did not fully reflect its member libraries' holdings, but, thanks to the Open WorldCat project, "we have already heard from libraries that want to add more holdings. It's reinforcing the value of participating in a consortium and that's a great outcome."
The new connections have also begun to bring in new content. "We're starting to hear from a lot of new people, such as our work with the DSpace folk and other digitizing libraries. It has generated renewed interest in library holdings in WorldCat, often from nontraditional publishers." OCLC offers a digital content service called CONTENTdm, which includes an option for sharing metadata. OCLC has recently received requests to include digital objects in the Open WorldCat project.
For the future, Nilges outlined three goals as OCLC moves toward a full production version of Open WorldCat. "First, we want to expand our list of partners; second, we want to expand the content available to our partners; and third, we want to add services for libraries, e.g., moving from the authentication by IP address now to show users the different options libraries offer, such as link resolvers, interlibrary loan, and the statistics important to libraries." Nilges admitted that the issue of ranking, or how high the OCLC records appear in search results, "continues to be the big issue we're contending with. We're competing with commercial entities. However I'm seeing more cites pointing to our bibliographic content these days and that helps our ranking."
As for Yahoo!'s long-term plans, I asked Mandelbrot how he felt the Content Acquisition Program was going and, in particular, if the company had any plans to give this new "library-style" content special treatment. He said: "Down the road we are focused on personalized search. We have a team of engineers working on discovering user intentions when they do a search. We're looking at folders and clustering and other personalization techniques. Relevancy and freshness are huge priorities for us and our technology does a much better job in those areas as well as in comprehensiveness."
Just last week, Yahoo! launched a redesign of its search engine results page. An "also try" link on the page suggests related query terms, another step toward pinning down what searchers intend. As for new content for Yahoo! Search, Mandelbrot said to expect some important content announcements within a couple of months.