When I first worked in a cataloging department at a large public library, I was handed a guidebook to MARC records. It took me an hour or so to realize that MARC was just a clever scheme for organizing all of the material on a catalog card and serving it up in a computer format. Indeed, at the time, my library had just joined OCLC and used those records to verify our ownership of the books so OCLC could print out a set of catalog cards. Over the next 2 decades, OCLC printed fewer and fewer catalog cards and, eventually, stopped altogether as MARC records became the chief data source for online catalogs.
OCLC had been busy expanding since its founding in the 1960s as a cooperative established to build a computerized union catalog for college and university libraries in Ohio. By the time I came on the scene, OCLC had about 25 million individual records for, seemingly, every book ever published. Its roster of contributing libraries by then reached every state in the U.S.—and beyond. Along the way, it established important partnerships, including with Blackboard, Elsevier, and Wikipedia. Look at any title in Google Books, and you will find a link to an OCLC WorldCat record with complete bibliographic data and a list of libraries near you that own the book.
After I got my M.L.S. and moved to the East Coast to work as an automation librarian, I worked on OCLC matters in every college that employed me. The database doubled and tripled in size as many more libraries signed on. By the 1990s, OCLC was developing a number of library automation products, but the core database of bibliographic records, now feeding online catalogs, was basically the same. Some enhancements, notably the 856 field, allowed MARC records to provide seamless links to ebooks or webpages, but the structure was still rooted in the days of catalog cards.
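To make the 856 field concrete, here is a minimal sketch of how such a field can be represented and displayed. The tag, indicators, and subfield codes ($u for the URI, $z for a public note) follow the MARC 21 standard; the record data and URL are hypothetical examples, not drawn from any actual catalog.

```python
def format_marc_field(tag, ind1, ind2, subfields):
    """Render a MARC field in the conventional $-delimited display form."""
    body = " ".join(f"${code}{value}" for code, value in subfields)
    return f"{tag} {ind1}{ind2} {body}"

# An 856 (Electronic Location and Access) field: first indicator 4 = HTTP,
# second indicator 0 = the link is to the resource itself.
field_856 = format_marc_field(
    "856", "4", "0",
    [("u", "http://example.org/ebook/12345"),   # $u: hypothetical URL
     ("z", "Full text available online")],      # $z: public note
)
print(field_856)
# 856 40 $uhttp://example.org/ebook/12345 $zFull text available online
```

An online catalog reads subfield $u out of fields like this one to build the clickable link a patron sees.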
As an eager online-catalog watcher, I have long felt that library automation was due for a total paradigm shift. Sure, the discovery platforms have made catalog searching more intuitive, but they are simply a refinement and overlay of what came before. It turns out that the shift is here, and the catalysts for this change include OCLC and Tim Berners-Lee.
Linked Data Arrives
In 2006, Berners-Lee, whose credits include the invention of the World Wide Web, coined the term linked data as an offshoot of the Semantic Web project. He was addressing the problem that the web contains information that is basically flat. An online catalog search of “East of Eden” will return results that do not differentiate the film, the book, and references to British politician Anthony Eden, who was a Middle East expert.
Using principles of the Semantic Web, names can be further identified as people, books, movies, or brand names. This is taking techniques first found in SGML (standard generalized markup language) and XML and transferring them to the web.
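The typing idea above can be sketched with subject-predicate-object triples, the basic statement form of the Semantic Web's RDF model. The type and property names come from the real schema.org vocabulary, but the entity URIs and the toy in-memory store are hypothetical illustrations, not how any particular catalog stores its data.

```python
# Each triple asserts one fact: (subject, predicate, object).
# Typing the entities is what lets software tell the novel, the film,
# and the politician apart.
triples = [
    ("ex:EastOfEden_Novel", "rdf:type",       "schema:Book"),
    ("ex:EastOfEden_Novel", "schema:author",  "ex:JohnSteinbeck"),
    ("ex:EastOfEden_Film",  "rdf:type",       "schema:Movie"),
    ("ex:EastOfEden_Film",  "schema:actor",   "ex:JamesDean"),
    ("ex:AnthonyEden",      "rdf:type",       "schema:Person"),
]

def entities_of_type(rdf_type):
    """Return every subject declared to be of the given type."""
    return [s for s, p, o in triples if p == "rdf:type" and o == rdf_type]

print(entities_of_type("schema:Book"))   # ['ex:EastOfEden_Novel']
print(entities_of_type("schema:Movie"))  # ['ex:EastOfEden_Film']
```

A search interface built on data like this can offer "the book," "the film," and "the person" as distinct results instead of one undifferentiated list of text matches.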
In February 2009, Berners-Lee gave a TED talk urging users to add data to the web. He said that every time you put up a photo on Facebook or Twitter, you are adding to the web’s structured data. Unfortunately, because these are proprietary systems, the data is walled off, so it cannot create the kind of information symmetry he thinks is important. He said that a tremendous amount of useful data is locked up in the hard drives of researchers and needs to be let out.
OCLC Gets Involved
I spoke with Andrew K. Pace, executive director for technical research at OCLC. He told me that OCLC is looking for ways to expand useful access to its immense database of bibliographic information. This includes use of search protocols developed for Wikipedia, as well as an enthusiastic embrace of linked data.
Pace told me that the first use of linked data is imminent and that you may start seeing it in WorldCat in fall 2018. See the following image from an OCLC page that uses linked data:
This example is taken from the VIAF (Virtual International Authority File), a project created in 2003 by the Library of Congress, the German National Library, and OCLC. It is a collaboration with more than 50 libraries, including more than 2 dozen national libraries. In 2012, OCLC adopted the VIAF as one of its library services.
Considering that OCLC owns a massive database with billions of entities, Pace told me that as he and his colleagues investigated linked data, they were especially concerned with its scalability; it has passed that test just fine. Now that they are ready to roll out the functionality, he said, it is important for all of OCLC's divisions to be included in the plan so that there is no organizational dysfunction.
What Linked Data Looks Like in Action
I asked Pace if he could give me an example of a fully formed working database using linked data. He suggested Linked Jazz, which is a detailed map of the jazz world that allows you to click on any name to show that person’s relationship with other jazz artists. This visually (and functionally) reminded me of Yewno, an important new research platform, but none of Yewno’s documentation has identified it as an example of linked data.
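The kind of artist-relationship map Linked Jazz presents can be sketched as a simple graph: names are nodes, and "clicking" a name returns its neighbors. The handful of relationships below are well-documented collaborations chosen for illustration; they are not taken from the Linked Jazz dataset itself.

```python
# A tiny adjacency-list graph of jazz-artist relationships.
relationships = {
    "Miles Davis":    ["John Coltrane", "Bill Evans"],
    "John Coltrane":  ["Miles Davis", "Thelonious Monk"],
    "Duke Ellington": ["Billy Strayhorn"],
}

def connections(artist):
    """Names linked to the given artist -- what 'clicking' a node shows."""
    return relationships.get(artist, [])

print(connections("Miles Davis"))  # ['John Coltrane', 'Bill Evans']
```

In a real linked-data deployment, each node would be a URI in an authority file (such as the VIAF) rather than a bare string, so the same person can be recognized across datasets.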
After many years of building the world’s largest database of bibliographic data, OCLC seems poised to leverage this resource to make it even more useful and powerful. As someone who has spent the last 52 years observing library automation from card catalogs to discovery platforms, I see this as the herald of explosive growth in the field.