Google Opens Public Domain Books for Downloading, Michigan Launches MBooks
Posted On September 5, 2006
Google has changed its policy and will now allow users to download full-image files of public domain books in its Google Book Search (http://books.google.com) collection. Until now, Google had insisted that readers remain connected to Google while they read any public domain books online. Why the change in policy? According to Adam M. Smith, product manager for Google Book Search, "It stemmed from listening to users and our library partners." Competition may have had some influence, however, both from the downloading policies of the Open Content Alliance and, now, from Google's own library partners. For example, the University of Michigan, one of Google Book Search's most generous and activist library partners, has begun releasing MBooks to the open Web, as well as to its campus users. The MBooks collection currently includes hundreds of thousands of books Google has digitized from the University of Michigan's library collection. MBooks offers different features from the versions Google Book Search supplies directly. The collection also includes in-copyright books, though only to produce indexing for individual books.
The Google Books Library Project digitizes content from six major research libraries—the University of Michigan, Harvard, Stanford University, Oxford University, the New York Public Library, and the University of California. (See the NewsBreak "Google Book Search Adds Big, Brave Partner: The University of California" at http://newsbreaks.infotoday.com/nbreader.asp?ArticleID=17375.) Some of these libraries, e.g., the New York Public Library, limit their contributions to public domain items only. Though Google's announcements hype the arrival of Dante and Victor Hugo on every computer, lesser known and minor works constitute the bulk of the content. Classics continue to be available from Project Gutenberg and other established public domain services, and the Open Content Alliance continues to work on the major digitization of public domain books with its own library partners. These sources provide text-searchable downloads.
To reach the downloadable books at Google Book Search, select the "Full View Books" button on a Google Book Search page, click on whichever title(s) interest you from a list of search results, and then choose the download option on the right of the screen. Not all the books returned will have a downloadable version. Some publishers contributing to Google Book Search under the publisher partnership grant permission for users to read the entire book, but—for now, at least—those books must be read in "Full View" mode while connected to Google. There is no apparent way to limit a search to downloadable-only results. The downloaded PDF image files can be read offline, saved, or printed out at will. Google recommends Adobe Acrobat Reader 7 for most reliable results.
Some initial press reports on Google's release of public domain books for downloading contained inaccuracies. The PDF files that Google Book Search supplies are image files with no keyword search capability. To move through a book using keywords as guides would require reading the book while connected to Google Book Search. Also, Google does not include any ads on books contributed by its library partners, a policy which, of course, covers all the downloadable books. However, the publisher-contributed and in-copyright books in "Full View" do carry sponsored links.
Some other minor points turned up in the course of researching this story. For example, not all public domain material available in Google Book Search will be downloadable. Reprints contributed through the publisher partner side of Google Book Search are handled the same as in-copyright material, with bibliographic and publisher-contributed background information and some KWIC (keyword in context) snippets of text. When asked whether the new, more generous policy might distress some of their publisher partners, Smith pointed out that many public domain books were already available through free e-book sites like Project Gutenberg. "Publishers thrive already through adding value, through annotations, introductions, etc., to public domain material in print. They will still continue to do so." As Google finds and digitizes older (pre-1923 usually) versions of the material, the older versions would become available for downloading.
In one interesting discovery, it appears that not all the books in Google Book Search are books. If library partners have chosen to bind old periodicals and file them with the books, Google scoops them up and sends them hurtling through its digitization process. However, such serials are not treated as serials, with appropriate serial-style bibliographic citations. Nonetheless, interesting journal results may turn up. For example, a search for Dante's Inferno pulled up a 1920s commentary in the Romantic Review. The article started on page 223, but only paging back through the "book" would take you to the specific article citation or even allow you to confirm the year, since the bound periodical set might include multiple years.
With the arrival of full-text books for downloading, librarians, eager bloggers, and active Web site managers may find it handy to have Google Book Search in easy reach. No problem. You can add a link to Google Book Search to your Web site by following the instructions at https://services.google.com/inquiry/books_email?hl=en. However, all is not beer and skittles. Google asks you not to use the files for commercial purposes or to remove its "watermark." It also warns you that copyright laws vary from country to country. Readers ride at their own risk. As for its own release policies, Smith said that Google "treats content conservatively. Sometimes we put constraints on viewing public domain on Google. For example, we will block a book if it is not in public domain for that country."
Google has also added a "Library Catalog Search" feature to its Google Book Search entries. In addition to the "Find in a Library" feature that connects to a selection of entries from OCLC's Open WorldCat, the "Library Catalog Search" also reaches more than 15 union catalogs with holdings from more than 30 countries. In addition to Open WorldCat (which itself reaches many countries), union catalogs include Gegnir (Iceland), MOKKA (Hungary), Israel Union List, Porbase (Portugal), LIBRIS (Sweden), RERO (Switzerland), the National Library of Australia, bibliotek.dk (Denmark), ChinaCat (Chinese Academy of Sciences), the National Library of the Czech Republic, COBISS.SI (Slovenia), the Austrian Union Catalog, NBINet (Taiwan), LIBIS (Lithuania), and Talis Source (U.K.). A radio button on the Advanced Search page allows searchers to conduct searches on these library catalogs for material not necessarily in Google Book Search.
Michigan Says "Go"
As stipulated in the contracts with its library partners, Google continues to deliver back to each library digital copies of the items that library has contributed. Last week saw the launch of MBooks, integrating the digital works resulting from the University of Michigan/Google Digitization Partnership into the Mirlyn online library catalog (http://mirlyn.lib.umich.edu). Michigan has been one of the most generous partners in the program and, according to John Wilkin, co-interim University Librarian, now hundreds of thousands of books, both out-of-copyright and in-copyright works, link through the online catalog. In compliance with copyright laws, the catalog makes the text of a work available only if the work is in the public domain or Michigan has approval from the copyright holder. The catalog also carries links to the Google versions of public domain content.
The MBooks have different features than Google's downloadable books. Basically, Michigan has taken the page-by-page OCR text that Google supplies to its library partners and strung it together in one giant sequence, tagging the text to the appropriate page. While you can view and download the entire text file, Michigan warns you that it could run 1,000 pages in some cases and might crash your browser. Brave hearts who have found the pages they particularly want to view can switch to those pages as images and download them page-by-page as separate PDF files. In cases where Google Book Search has gotten its books from Michigan, tricky searchers might even use the text-searching capabilities of MBooks to find the pages they want to view in a book downloaded from Google Book Search. Michigan has also updated the bibliographic information, created persistent URLs to ensure proper citation, and provided the ability to change resolution and angle (magnify or rotate). Currently users can only text search one book at a time, but Michigan hopes to offer multiple book searches, as well as Advanced Search features and usage statistics, in the future.
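For readers curious how page-by-page text can be strung into one sequence while staying tagged to its pages, here is a minimal Python sketch. It is only an illustration of the general idea, not Michigan's actual implementation; the function names `join_pages` and `page_for_offset` and the sample text are hypothetical.

```python
def join_pages(pages):
    """Concatenate per-page OCR text and record each page's character span.

    `pages` is a list of (page_number, text) pairs (a hypothetical layout).
    """
    parts = []
    spans = []  # (page_number, start_offset, end_offset) in the joined text
    offset = 0
    for number, text in pages:
        parts.append(text)
        spans.append((number, offset, offset + len(text)))
        offset += len(text) + 1  # +1 for the newline that separates pages
    return "\n".join(parts), spans


def page_for_offset(spans, position):
    """Return the page number whose span covers `position`, or None."""
    for number, start, end in spans:
        if start <= position < end:
            return number
    return None


pages = [(1, "In the middle"), (2, "of the journey")]
full_text, spans = join_pages(pages)
print(page_for_offset(spans, full_text.index("journey")))  # prints 2
```

Keeping the spans alongside the joined text is what lets a hit found anywhere in the long file be traced back to a single page image.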
The MBook collection might include books for downloading that Google Book Search would not. For one thing, Google has not completed its addition of the downloading function to all the appropriate books, though the company is working on it "as quickly as possible." For another thing, Michigan seems to have more assertive policies and procedures in this area. For example, while Google seems somewhat reluctant to make all scanned federal documents available, regardless of date, Michigan considers works of the U.S. government as "uncopyrightable works" and includes them. It also includes works for which it has permission from the copyright holder. It has also begun a program, admittedly a "(huge) task," of investigating what is in the public domain and whether rights-holders would grant permission to display full text.
In a very interesting development, Michigan is using the underlying OCR text analysis provided on a page-by-page basis by Google to index even the in-copyright books. While in the Mirlyn catalog, users can retrieve a list of the pages containing their search term and how often the terms appear on each page for any Google book. With this information in hand, interlibrary loan requests targeting specific sections of books could increase. Many research librarians are reluctant to mail entire books, but they may feel copying pages is not a problem. Even for the public domain material, the MBook versions carry underlying OCR translations that enable the user to use the PDF file as a text source, e.g., to copy and paste sections.
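The page-level lookup described above can be pictured with a short Python sketch. This is an assumption-laden illustration, not the Mirlyn code: the function name `page_hits`, the simple word-splitting rule, and the sample pages are all hypothetical.

```python
import re


def page_hits(pages, term):
    """Return (page_number, count) pairs for pages that contain `term`.

    `pages` is a list of (page_number, ocr_text) pairs. Only page numbers
    and counts are returned, never the text itself.
    """
    needle = term.lower()
    hits = []
    for number, text in pages:
        # Split the page into lowercase words (a deliberately simple rule).
        words = re.findall(r"[a-z0-9']+", text.lower())
        count = words.count(needle)
        if count:
            hits.append((number, count))
    return hits


pages = [
    (221, "The poet enters the wood."),
    (223, "Dante meets Virgil; Dante follows."),
    (224, "No mention here."),
]
print(page_hits(pages, "Dante"))  # prints [(223, 2)]
```

Because the result carries only page numbers and counts, an index of this kind can point a reader to the right pages of an in-copyright book without displaying any of its text.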
To reach the MBooks collection, go to Mirlyn, conduct an advanced search, and restrict the format to "Electronic Resources." (For more information on MBooks, visit http://mdp.lib.umich.edu/m/mdp/mdp-faq.htm.)
Whither and Whence?
As for future plans with Google Book Search, Smith could not comment on specifics, but the company had already announced a program in the spring for working with publishers to create an online reading environment. "We are talking with publishers about turning on the books so we can sell access in a Google Book Search 'find from' partner program," said Smith. "That is one option. Users could purchase access for some payment. I can't speculate on future models but we are working with publisher partners to find opportunities users might find of interest." Smith also thinks that users might create their own new material. Smith chuckled when I suggested that Google Book Search might become the home of new Google mashups.
The impact of Google Book Search's release of readable, downloadable texts is yet to be seen. Karen Coyle, consultant on digital libraries, somewhat ruefully commented: "Google themselves are the only ones with a clear idea of what they are doing and they state it. They are creating an index to books that exist in hard copy, not trying to create books for reading online. That's not their primary goal." Coyle raised the issue of quality, pointing out that there is a "real difference between producing an e-book and scanning to OCR for keyword searches." Quality is a real issue. The FAQ for Michigan's MBooks asks users to report bad pages and states that Michigan plans to go back and fix them. On the other hand, Smith said that Google had no plans to redo books already done.
Nonetheless, as Coyle pointed out, both libraries and publishers have a chance to learn a lot from the Google project, including the readiness of users to accept digital formats. "I'm glad Google is doing the scanning. It's a big expensive experiment and one that libraries wouldn't have been able to do on their own, because the experiment may or may not succeed. But for Google, it's pocket change. The real value is that they have so much money that they can experiment with things that may not succeed."