Libraries, archives, and museums have vast numbers of resources within their four walls “that the Web can not see or use,” according to the press release introducing the Libhub Initiative. This project, which had its official launch at the American Library Association’s (ALA) annual conference this summer, aims to raise the web visibility of libraries’ resources by allowing search engines to see inside a library to the item level.
Libraries “need to speak in a way the Web can see and represent consistently. Our users live on the web and rely on the Web to deliver information resources, yet the lack of access to harvestable library data and a consistent way to understand that information has removed libraries from view of Web users,” the Libhub site’s FAQ section states. This effort will allow the Googles of the world to harvest data from library catalogs and make sense of their contents.
Another major hurdle is that the current system prevents connections between resources and collections. An item in one catalog may be related to items in another collection, for example, but cross-linking resources from one collection to another is not possible today. “The Libhub Initiative uniquely prioritizes the linking of these newly exposed library resources to each other and to other resources across the Web, a critical requirement of increased Web visibility,” because search engine algorithms rely on links to decide how to rank the results they return. Weaving library content into the Web will make it easier for people to discover these resources on the open Web: clicking an item returned by a simple search will take them straight to a library’s catalog.
The Semantic Web (aka the Web of Linked Data)
There has been much talk about Web 3.0 relying on open, linked data. Applied to libraries, this would mean that catalogs are no longer bound up in proprietary, closed ecosystems. The Libhub Initiative provides a neutral mechanism that allows for connections across systems. Beyond simply exposing and linking collections, the Libhub Initiative is the first in a series of steps that will enable search engines to understand the concept of a library through shared vocabularies.
“By marking up information in standardized, highly structured formats like Resource Description Framework (RDF), we can allow computers to better ‘understand’ the meaning of the content, rather than simply matching on strings of text. This would allow web search engines to function more like relational databases, providing much more accurate search results—the ability to distinguish between a book that is written about a person, as opposed to a book that is written by a person, for example,” D-Lib Magazine notes. A controlled vocabulary and mapping of resources will contribute to knowledge creation that bridges disciplines.
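To make the D-Lib example concrete, here is a minimal Python sketch of the idea behind RDF: each fact is stored as a subject, predicate, object triple, and the relationship lives in the predicate rather than in free text. That is what lets a query tell a book by a person apart from a book about that person, even though both mention the same name. The sketch uses plain tuples instead of an RDF library; the example.org URIs, titles, and person are hypothetical, while "author" and "about" are genuine Schema.org property names.

```python
from typing import NamedTuple

SCHEMA = "https://schema.org/"  # real Schema.org namespace
EX = "http://example.org/"      # hypothetical namespace for illustration only


class Triple(NamedTuple):
    subject: str
    predicate: str
    obj: str


# Hypothetical catalog data: one book BY Jane Doe, one book ABOUT her.
triples = [
    Triple(EX + "book/1", SCHEMA + "author", EX + "person/jane-doe"),
    Triple(EX + "book/2", SCHEMA + "about", EX + "person/jane-doe"),
    Triple(EX + "book/1", SCHEMA + "name", "Collected Essays"),
    Triple(EX + "book/2", SCHEMA + "name", "The Life of Jane Doe"),
]


def subjects_with(predicate: str, obj: str) -> list[str]:
    """Return the subject of every triple matching the predicate and object."""
    return [t.subject for t in triples if t.predicate == predicate and t.obj == obj]


# The same person URI, queried through two different relationships.
books_by = subjects_with(SCHEMA + "author", EX + "person/jane-doe")
books_about = subjects_with(SCHEMA + "about", EX + "person/jane-doe")
print(books_by)     # ['http://example.org/book/1']
print(books_about)  # ['http://example.org/book/2']
```

In a real deployment the triples would be published in a standard RDF serialization such as Turtle or JSON-LD and queried with SPARQL; the list comprehension here only stands in for that kind of pattern match.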
While RDF underpins the Semantic Web, other technologies and vocabularies, such as the Web Ontology Language (OWL), are also contributing to its feasibility. Schema.org, for example, is being used to create a new data interchange standard to help libraries share their data: “This site provides a collection of schemas that webmasters can use to markup HTML pages in ways recognized by major search providers, and that can also be used for structured data interoperability. …”
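Schema.org descriptions are commonly embedded in a web page as JSON-LD inside a script element, which search engines can read alongside the visible HTML. The Python sketch below builds such a block for a single catalog item. The choice of properties, the sample title, author, and ISBN, and the catalog URL are all hypothetical; this is an illustration of Schema.org markup in general, not a format defined by Libhub.

```python
import json


def book_jsonld(title: str, author: str, isbn: str, record_url: str) -> dict:
    """Describe a catalog item as a Schema.org Book (illustrative field choices)."""
    return {
        "@context": "https://schema.org",
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "isbn": isbn,
        # Hypothetical: link back to the library's own catalog record
        "url": record_url,
    }


def to_script_tag(data: dict) -> str:
    """Serialize the description as a JSON-LD block for embedding in an HTML page."""
    return (
        '<script type="application/ld+json">\n'
        + json.dumps(data, indent=2)
        + "\n</script>"
    )


# Hypothetical sample record; the ISBN and URL are placeholders.
record = book_jsonld(
    title="The Life of Jane Doe",
    author="John Smith",
    isbn="9780000000000",
    record_url="https://catalog.example.org/record/12345",
)
print(to_script_tag(record))
```

Pointing "url" at the library's own record is one way an item surfaced by a search engine could lead a user back into the catalog.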
History of the Effort
In 2011, the Library of Congress (LC) sought a way to move library data to the Web and recognized that the traditional MARC record was not sufficient. LC selected Zepheira, a consulting firm with expertise in data and standards, especially in the library space, to convert its records to a format that could expose MARC data to the Web. This new model, BIBFRAME, “provides a foundation for the future of bibliographic description, both on the web, and in the broader networked world. … In addition to being a replacement for MARC, BIBFRAME serves as a general model for expressing and connecting bibliographic data. A major focus of the initiative will be to determine a transition path for the MARC 21 formats while preserving a robust data exchange that has supported resource sharing and cataloging cost savings in recent decades.”