Google Scholar Focuses on Research-Quality Content
Posted On November 22, 2004
Despite all the warnings from experienced information professionals, many scholars, researchers, and students continue to make Google their first stop for locating research information. Google has now introduced a beta service called Google Scholar (http://scholar.google.com) that segregates research-quality sources and provides special search features and result displays to accommodate scholars' information needs. While not removing any sites from the main Google service, Google Scholar enables specific searches of scholarly literature, including peer-reviewed papers, theses, books, pre-prints, abstracts, and technical reports. Content includes a range of publishers and aggregators with whom Google already has standing arrangements, e.g., the Association for Computing Machinery, IEEE, and OCLC's Open WorldCat library locator service. Result displays will show different version clusters, citation analysis, and library location (currently books only). Although Google claims coverage "from all broad areas of research," early evaluation seems to show a clear emphasis on science and technology, rather than the arts, humanities, or social sciences.
Anurag Acharya, principal engineer for Google Scholar, stated that the goal of the service was to "make it easier to find content, open access or not. The first step in any research is to find the information you need to learn and then build on that. Not being able to find information hinders scholarly endeavor."
While tapping the open Web and harvesting institutional Web sites whenever possible, the service will present data, e.g., abstracted bibliographic citations, which the user will have to locate. Acharya said: "Once scholars know something's there, they can find a way to try and get it. In some cases we will try to help you, e.g., the OCLC Open WorldCat, which is 2 million records at this point." Actually, OCLC has expanded harvesting opportunities in Open WorldCat to the full 57-million-plus records of the total WorldCat database. At present, Google Scholar only taps a narrow selection of scholarly books, but, when I pointed out that the full WorldCat included several million serial holdings records, Acharya was very interested. An OCLC representative indicated that it was already in discussions with Google to expand Open WorldCat coverage (http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=16592).
Content in Google Scholar comes from a wide range of academic publishers, professional societies, pre-print repositories, universities, and scholarly articles across the Web. Yahoo! Search already has arrangements to access some similar material, including the millions of articles reached through the Open Archives Initiative material collected by OAIster. When asked about OAIster specifically, Acharya regretted that OAIster "doesn't give us all the information we need. Its metadata is often incomplete. It's not OAIster's fault. They do a wonderful job, but the data providers do not always provide full information. We are not able to distinguish the adequate records at this point, but it remains a data source under consideration."
Overall, publishers and scholarly societies that want their content accessible through Google Scholar need only contact Google to arrange for Google's spiders to crawl their sites. However, Google does insist that, to participate in Google Scholar, sites must provide access for non-subscribers to bibliographic citations and abstracts.
For improved searching, the system can even accommodate multiple versions of authors' names (e.g., initials or full names), an issue that designers had to devote a fair amount of effort to resolving, according to Acharya. For restricting a search term to an author name, the syntax is "author:lastname", to which searchers can append other title or subject terms if desired. For locating specific articles, Google Search documentation recommends the last name of the first author, plus title terms or a title phrase in parentheses. Some citations within Google Scholar come from the footnote and endnote citations within documents, even if such documents are offline and only available in print.
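For searchers who script their queries, the author syntax described above can be sketched in a few lines of Python. This is only an illustration: the function name build_author_query and the query parameter name "q" are assumptions made for the example, not anything documented by Google.

```python
from urllib.parse import urlencode


def build_author_query(last_name, extra_terms=()):
    """Combine the author: operator with optional title or subject terms.

    Hypothetical helper for illustration; only the "author:lastname"
    syntax itself comes from the article.
    """
    parts = [f"author:{last_name}"]
    parts.extend(extra_terms)
    return " ".join(parts)


query = build_author_query("acharya", ["web", "search"])
print(query)                    # author:acharya web search
# "q" is an assumed parameter name, used here only to show URL encoding.
print(urlencode({"q": query}))  # q=author%3Aacharya+web+search
```

The helper only assembles the query text; which search box or URL it is sent to is left to the reader.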
Relevancy ranking of results in Google Scholar expands beyond the usual criteria. It takes into account the full text of each article as well as the author's articles, the publisher, the publication's prestige, and citation frequency. Some of the sources tapped by Google, e.g., the National Library of Medicine's PubMed Central, give excellent abstracts and indexing metadata. However, citation frequency measures will only analyze citations within the Google Scholar corpus. Book citations usually come with a Library Search option for OCLC records and a Web Search option to locate online bookstores. A "Cited By" icon on records can lead directly to articles citing the article record displayed.
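The corpus limitation on citation counts is easy to see in a toy example. The sketch below is not Google's method, which the company has not described; it simply assumes a mapping from each citing document to the works it cites and counts only citations made by documents inside the corpus. All names and data are hypothetical.

```python
from collections import Counter


def count_in_corpus_citations(references, corpus_ids):
    """Count citations, ignoring any citing document outside the corpus.

    references: dict mapping a citing document ID to a list of cited IDs.
    corpus_ids: set of document IDs the (hypothetical) index contains.
    """
    counts = Counter()
    for citing, cited_list in references.items():
        if citing not in corpus_ids:
            continue  # citations from unindexed documents are invisible
        for cited in set(cited_list):  # count each citing document once
            counts[cited] += 1
    return counts


references = {"A": ["C"], "B": ["C", "D"], "X": ["C"]}
corpus = {"A", "B", "C"}
counts = count_in_corpus_citations(references, corpus)
print(counts["C"], counts["D"])  # 2 1
```

Document X also cites C, but because X is outside the corpus that citation never registers, which is the kind of undercount the limitation implies.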
I asked whether Google thought it would raise any controversy by insisting on some sort of quality filtering, whether through publishers or scholarly societies, before it would post articles from authors whose work didn't already appear in Google Scholar. Acharya was shocked that I thought Google would censor scholarship in any way. Nonetheless, the language of an About Google Scholar FAQ (http://scholar.google.com/scholar/about.html#about) seemed to indicate this policy. (See the response to the question: "I'm an author and my articles don't appear in Google Scholar. How do I remedy that?") Another Google representative indicated that the language might have been misleading and simply reflected a practical concern: for Google, dealing with groups of scholars, represented by publishers or societies, is more efficient than dealing directly with individual authors.
However, Acharya assured me that Google did not censor anything. "We pick up content from publishers and the open Web." When I wondered whether some form of quality filtering might not serve users better, Acharya admitted that this was "a tricky issue in general, but quality will reflect naturally in the way things are ranked. We do not decide what is scholarship and what is not scholarship, but if something is not as important, it will not get as good a ranking on the corpus. Good material will gradually bubble up."
Clearly Google Scholar will have to wrestle with the "versioning" problem of cites to research results released in different formats through different venues at different stages of the research process. Acharya said: "We try to identify multiple versions and cluster them when we give results. We will give the alternative sites and the number of versions." This in particular represented an area in which he felt they needed more user feedback.
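How Google identifies versions has not been disclosed. As a rough illustration of the clustering idea only, the sketch below groups records that share a normalized title and first-author surname; the matching rule, the field names, and the example.org URLs are all assumptions invented for the example.

```python
import re
from collections import defaultdict


def normalize(title):
    """Lowercase a title and collapse punctuation and extra spaces."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


def cluster_versions(records):
    """Group record URLs by (normalized title, first author surname)."""
    clusters = defaultdict(list)
    for rec in records:
        key = (normalize(rec["title"]), rec["first_author"].lower())
        clusters[key].append(rec["url"])
    return dict(clusters)


records = [
    {"title": "Deep Web Crawling: A Study", "first_author": "Smith",
     "url": "http://example.org/preprint"},
    {"title": "deep web crawling - a study", "first_author": "smith",
     "url": "http://example.org/journal"},
    {"title": "Another Paper", "first_author": "Jones",
     "url": "http://example.org/other"},
]
clusters = cluster_versions(records)
print(len(clusters))  # 2
```

A rule this crude would merge distinct papers with identical titles and split versions whose titles changed between pre-print and publication, which suggests why Acharya wanted user feedback in this area.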
I also asked if Google planned to provide sorting by availability (immediate open Web vs. controlled access). Not at this time, though he thought it an interesting idea. Acharya said this was another tricky area. For example, he pointed out that "it was funny. So many people do not know that they have access through institutional subscriptions."
In announcing the new service, Google requested all users of the beta to "Please let us know if you have suggestions, questions, or comments about Google Scholar. We recognize the debt we owe to all those in academia whose work has made Google itself a reality, and we hope to make Google Scholar as useful to this community as possible. We believe everyone should have a chance to stand on the shoulders of giants."
Google even promised to amend any mistaken article descriptions when informed by "appropriately outraged" authors. As for the financing of Google Scholar, at this time at least, Google has no plans to introduce advertisements. Nor does it get a cut of any new subscriptions. For those interested in sending or forwarding reactions to Google Scholar, a contact link resides in the About Google Scholar FAQs, or you can send e-mail to email@example.com.
Within hours of the announcement of Google Scholar, blogs, listservs, e-zines, and other quick reaction services were abuzz. The open access movement sources were particularly gleeful; some spouted advocacy of an availability filtering option for displaying results that would favor OA material significantly. Stevan Harnad, exponent of self-archiving in institutional repositories as the best form of open access, stated: "An extremely valuable and welcome new service (and about time!)." BioMed Central issued an elated press release saluting the new service.
On the librarians' side, Gary Price's ResourceShelf coverage saluted the effort, but wryly wondered if this could be another nail in librarians' coffins as Google again stepped up to tasks that librarians and vendors should have taken on. He also worried that Google's proprietary restraint in discussing how its algorithms define scholarly content is "not an insignificant omission," particularly to information professionals.
Laura Felter, the new The Better Mousetrap columnist for Searcher, worried: "Considering the difficulty one often has of directing researchers away from their 'googling' to more scholarly search engines such as scirus.com, it is easy to predict that Google, Inc. will succeed with Google Scholar simply by shunting sci/tech researchers already in the Google domain toward a subset of authoritative content, regardless of the quality of its content and publication sources…. What will be needed from Scholar is a clear layout of its sources, just as Scirus provides, in addition to some strong search tips for the serious user whether they be information pros, experts in their field, or the insanely curious."
Such reservations should appeal to information industry vendors who may find the new service downright depressing. Consulting firm Outsell, Inc. called the timing of the announcement "a marketer's nightmare for Elsevier's Scopus." Elsevier has just launched its Scopus service and still promotes Scirus, a free, specialized sci-tech search engine accessing "over 167 million science-specific pages" with an advanced interface. Outsell's observations ("A Quick Take on Google's New Google Scholar," Nov. 18, 2004) continued: "Elsevier made a strong pitch that Scopus, with its ease of use and Google-like simplicity, would draw ... users away from Google back to library resources. Well, here comes Google with Google-like simplicity."
Jill O'Neill, director of planning and communications for the National Federation of Abstracting and Information Services (NFAIS), found Google Scholar "an interesting step by Google, but there are a number of caveats in this beta-version that need to be addressed before scholars will feel that Google Scholar is on a par with a fee-based service from a scholarly society or one as comprehensive as the Web of Science or Scopus." Nonetheless, she said that reactions from NFAIS members with whom she had spoken tended to fall into two camps—worried and "Where do I sign up?" One researcher busily testing the product commented, "I think a significant number of library-world databases have just become marginal niche products."
Most likely the competitor Google will keep its eyes on is Yahoo! Search. Observers expect that the launch of Google Scholar may lead Yahoo! to upgrade searching and presentation of results from its own collection of material from publishers, societies, libraries, and library vendors acquired through its active Content Acquisition Program.
The recent OCLC/Yahoo! toolbar ("OCLC and Yahoo! Offer Joint Toolbar," http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=16328) also segregates library-style material for special treatment but, at present, only has listings from OCLC's Open WorldCat (though apparently Yahoo!, unlike Google, is harvesting the entire WorldCat collection). Expanding the toolbar's coverage or altering the Yahoo! Search site to include an icon for library material would seem fitting responses for the next round of search engine wars.
In any case, mindshare marketing strategies (counting eyeballs, not dollars) make for happy users, even among information professionals.