Stop the Trash Trucks: A Tasini Case Damage-Control Proposal
Posted On July 16, 2001
"If you can keep your head/when all about you men are losing theirs/and blaming it on you…." So begins Kipling's immortal advice to us all in "If—An Inspirational Poem." And now's the time to take that advice. Let's not panic here. Let's not take rigid, fixed battle positions and wait for victory or death. And above all, let's not get boxed into defining victory as unconditional surrender.
The Supreme Court's New York Times Co., Inc., et al. v. Tasini et al. case decision has tossed a bomb into the world of online content providers with a dead-center hit on the database aggregators and search services that carry full text. The decision clearly grants copyright ownership of the electronic reproduction of a freelance author's work to the author, not the publisher of the original print work, unless a contract exists between the author and publisher that clearly spells out the transfer of those electronic rights. This means that masses of online full-text articles licensed to database aggregators and, through them, to online host services and Webmasters, have become illegitimate, so to speak. The aggregators and search services and the publishers have been selling what they don't own, or at least don't have a right to sell.
When it comes to past liability, aggregators and search services have a parachute to protect their tender hides from the abyss yawning before them. For some years, contracts and licenses with publishers have contained a clause that, in one form or another, indemnifies the aggregator or search service by declaring that the publisher promises that it has the rights to license whatever it is licensing. Worst-case scenario: If the authors start hitting up the aggregators and search services for big damages, the online vendors could cross-sue the publishers for having gotten them into this trouble. Of course, at that point, the publishers would probably already be paying damages to freelance authors directly.
Within hours of the Supreme Court's June 25th Tasini decision, publishers began to make good on their threats and issued orders for search services to start stripping files of suspect material. By early this month, notices had begun appearing on DIALOG newspaper files, for example, warning of deleted records. In a Help News record, The San Jose Mercury News (File 634) and The Contra Costa Papers (File 645) noted: "Records not authored by staff members of The San Jose Mercury News have been removed. This deletion is universal in that all online providers … are removing the same records from their respective databases." (Of course, such publication-specific warning notices would probably be hard to find on services like LexisNexis and Factiva that usually lump all their data into massive chunks.)
The New York Times Co., one of the defendants in the Tasini case, announced it was ordering the removal of 115,000 records from LexisNexis and other sources and shutting down access to all New York Times book reviews on its own http://www.nytimes.com Web site.
The decision clearly gave freelance authors electronic reproduction rights, provided they had not signed written contracts conceding those rights. But I suspect that not all the authors who now have clear title to their work would want to exercise it by removing it from online full-text collections, with or without compensation. The New York Times even asked authors to contact them if they did not want their material removed. Of course, then the National Writers Union, plaintiff in the Tasini case, took out an ad urging writers not to take the Times up on that offer unless they got compensated.
I, for one, am caught in a quandary. When I checked last year, I had over 600 articles in full-text on commercial services, a life's oeuvre. After eight hard-disk crashes on computers I have owned and discarded over the years, the only reliable backup system I have is the commercial online full-text services. Maybe I should sue if they try to remove my works due to Tasini. Or should I hold out for a lawsuit as a retirement bonus, an Acapulco Fun Fund? AAARRRGGGHHH!
Preserve the Searchability of Full-Text Archives—Please!
However all this works out in the end, commercial services and publishers should follow two ironclad principles in establishing policies and procedures post-Tasini. One, don't make matters any worse than they have to be. Do no further harm. Two, protect the interests of users. In fact, all parties have an abiding interest in protecting the interests of customers. Whether writer, publisher, aggregator, or search service, anyone who works in and lives off the traditional structure of the information industry should always remember that the real competition remains the Web itself. That and Ignorance.
If people come to find commercial sources unreliable, if addicts of the free and open Web find that services to which they pay top dollar cannot deliver what they have promised, if uncertain users seeking the comfortable security of an established brand-name publication find that publication cannot even maintain a reliable inventory of its own archives.… Well, how long before disenchanted, disillusioned (or should we say re-illusioned) users decide that they could do just as well floating across the Web picking up information as they go? If The New York Times' own Web site has lost all its book reviews, then why not use the ones you find on Amazon.com written by "real people," Amazon's customers? If the full-text collections of DIALOG and LexisNexis and Factiva and ProQuest and Gale Group's Infotrac keep getting smaller and smaller and the reliability of retrieval spottier and spottier, then why pay high rates? In fact, why sign up for subscription contracts at all? For "hit-or-miss" service, why not pay hit-or-miss usage-based charges?
Stop! Even though publishers and aggregators and search services no longer have the clear right to provide full-text articles as documents, that's no reason to strip the files completely of such unauthorized material. Let's consider an alternative that would protect the ultimate consumer as well as the future interests of all the creators and handlers of the material.
Specifically, online commercial services and searchable database archives on publisher Web sites should continue to maintain the inverted file index terms and tags that identify material barred from full-text delivery by the Tasini decision. The inverted file indexes belong to the host services, regardless of the fact that all the terms were generated from text produced by authors, freelance or otherwise. If inverted file indexes remain complete and comprehensive, they could continue to identify relevant articles, at least by the information from those articles that's clearly not copyrightable—namely the bibliographic citations. They might even offer abstracts.
To those citations, the online hosts could append a notice indicating that this material has been blocked or removed due to Tasini restrictions. One would hope that the online notices would also recommend alternative routes to the documents. Such recommendations could extend over a wide range, from a relatively crude approach ("Call a librarian.") to the more profitable offer of document-delivery service ("We'll call a librarian for you.").
Just a reminder, in case anyone out there has forgotten how searchable archive databases come to be. A set of documents is submitted and logged into a linear file (i.e., a file that retrieves items as documents, often by an accession number). Once the linear file is created, search engine software processes the text to generate an inverted file index. In the case of full-text databases, that usually means taking every word in the text and tagging it by field (or segment) and position (for phrase searching) and linking it back to the full linear file document record. When users search, they only use the inverted file index until they create a set of search results. When they display all or part of the results, the system uses the list of identification links when it goes back to the original linear file to gather the documents.
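For readers who like to see the plumbing, here is a minimal sketch of that process in Python. It is only an illustration under simplifying assumptions, not how any particular host's software works: the record layout, the accession numbers, and the function names (build_inverted_index, search_phrase, display) are all hypothetical, and words are split on whitespace with no stemming or stop words.

```python
from collections import defaultdict

# Hypothetical linear file: accession number -> record, stored by field.
linear_file = {
    1001: {"title": "Court rules for freelancers",
           "text": "The court sided with freelance authors"},
    1002: {"title": "Publishers weigh options",
           "text": "Publishers may remove freelance articles"},
}

def build_inverted_index(records):
    """Tag every word by field and position, linked back to its record."""
    index = defaultdict(list)
    for accession, record in records.items():
        for field, value in record.items():
            for position, word in enumerate(value.lower().split()):
                index[word].append((accession, field, position))
    return index

def search_phrase(index, phrase):
    """Use only the index: find records where the words sit side by side."""
    words = phrase.lower().split()
    if not words:
        return []
    hits = set()
    for accession, field, position in index.get(words[0], []):
        # Each later word must appear in the same field at the next position.
        if all((accession, field, position + offset) in index.get(word, [])
               for offset, word in enumerate(words[1:], start=1)):
            hits.add(accession)
    return sorted(hits)

def display(records, accessions):
    """Only at display time does the system go back to the linear file."""
    return [records[accession] for accession in accessions]

index = build_inverted_index(linear_file)
result_set = search_phrase(index, "freelance authors")
documents = display(linear_file, result_set)
```

Note that search_phrase never touches linear_file. The search runs entirely against the index, and the documents themselves come into play only when display follows the accession numbers back.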
In this proposal, the underlying inverted file index upon which the searching process rests would continue to retain all the index terms generated in the past by processing the full-text articles. And since no one could ever re-create a whole document from the inverted file indexing, that indexing constitutes a new creation, one copyrighted to the online service. In fact, when documents get "withdrawn" from databases on online search services, usually that just means that someone has shut down the links between the inverted file index and the linear file containing the documents as documents. In most cases, services will only really eliminate discarded references when they conduct a major reload of the file.
Since bibliographic citations are not copyrightable, it should be comparatively easy for online hosts to pull up the references as part of a normal search. This way the searcher could at least scan the titles and dates and author names—or whatever—and make an educated guess whether the item might contain the exact material they need. Instead of getting a "document not available" message attached to an empty result document, the hosts would post "unavailable due to Tasini case restrictions" or something like that. But at least the searchers would know that their searching strategy touched all the records, even if they're not allowed to see all of the material in the retrieved records.
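A rough sketch of how such a display step might behave, again in Python and again under loud assumptions: the Record fields, the blocked flag, the notice wording, and the sample records are all invented for illustration. The point is only that the index still covers every record, while the display step swaps a notice in for barred text.

```python
from dataclasses import dataclass

@dataclass
class Record:
    accession: int
    title: str
    author: str
    date: str
    text: str
    blocked: bool = False  # hypothetical flag: full text barred from delivery

TASINI_NOTICE = "[Full text unavailable due to Tasini case restrictions]"

def build_index(records):
    """Index title and full text of every record, blocked or not."""
    index = {}
    for rec in records:
        for word in set(f"{rec.title} {rec.text}".lower().split()):
            index.setdefault(word, set()).add(rec.accession)
    return index

def search(index, query):
    """Return accession numbers of records containing every query word."""
    sets = [index.get(word, set()) for word in query.lower().split()]
    return sorted(set.intersection(*sets)) if sets else []

def display(records_by_id, accessions):
    """Always show the citation; show a notice instead of barred text."""
    output = []
    for accession in accessions:
        rec = records_by_id[accession]
        citation = f'{rec.author}. "{rec.title}." {rec.date}.'
        body = TASINI_NOTICE if rec.blocked else rec.text
        output.append(f"{citation}\n{body}")
    return output

# Invented sample data for illustration only.
records = [
    Record(1, "Chip makers cut jobs", "A. Staffer", "1999-03-02",
           "Staff report on layoffs at chip makers"),
    Record(2, "Layoffs hit the valley", "B. Freelancer", "1999-03-05",
           "Freelance feature on layoffs and chip plants", blocked=True),
]
index = build_index(records)
hits = search(index, "layoffs chip")
results = display({rec.accession: rec for rec in records}, hits)
```

Both records turn up in hits, so the searcher knows the strategy touched everything. Only the second one comes back with the notice in place of its text.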
Some have proposed just leaving the citations in the database archives, but this would do very little good since the indexing would only retrieve from the tiny amount of data in citations. To do any good, the searchers would have to know so much about the target material that they'd probably have enough information to find it without a search.
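A toy comparison makes the difference concrete. This is a sketch with an invented record and hypothetical helper names (index_terms, matches), assuming simple whitespace tokenizing:

```python
def index_terms(text):
    """Reduce a piece of text to its set of lowercase search terms."""
    return set(text.lower().split())

# Invented record for illustration only.
record = {
    "citation": "Smith, J. Plant closing announced. Daily Example, 1999",
    "text": "The widget maker will shut its Fresno factory and lay off 300 workers",
}

# Index built from the citation alone versus from citation plus full text.
citation_only_terms = index_terms(record["citation"])
full_text_terms = citation_only_terms | index_terms(record["text"])

def matches(terms, query):
    """True when every word of the query appears among the index terms."""
    return all(word in terms for word in query.lower().split())

query = "fresno factory"
found_by_citation_index = matches(citation_only_terms, query)
found_by_full_text_index = matches(full_text_terms, query)
```

The searcher who knows only the subject ("fresno factory") gets a hit from the full-text index and nothing from the citation-only index. To find the record through the citation alone, the searcher would already need the title, the author, or the source.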
Un-Gored Oxen and Un-Gory Customers
What's the advantage to this alternative approach? First, last, and in between, it protects the interests of the users. And those interests will be badly damaged if full processing of the Tasini decision goes through, more than some publishers and services seem to realize. For example, when The New York Times Co. announced the removal of some 115,000 articles from its electronic archives, including those found on LexisNexis, a company spokesperson attempted to reassure users by pointing out that this amounted to only 3 percent of its archive. Well, maybe that's true of a major publisher like the Times, but think about all those small trade press publishers, the ones with the inside track on industry and product developments. Do you really think all of them will have legalities neatly tied up and databases pristinely exempt from controversy? What good will an online search do a company executive, if all it retrieves are articles from major publisher sources?
Regardless of the specifics or even the quantities of omissions, that's not the point. When searchers use a database archive, they expect it to be as complete as possible. We all know that no full-text archive is really complete—no ads, usually no graphics, usually no letters to the editor, often no short news items, sometimes no columns. Nevertheless, we do rely on the service to provide a consistent level of coverage.
Say a client wants an article he or she knows appeared in a source. Well, clients often—did I say often?—I meant often get the source wrong. If you know that the type of material described doesn't fall into any of the categories for material left off-line, then you can argue with the client that you've done a comprehensive check for the relevant article and something's wrong. Either the client got the source wrong (most likely) or the material was never archived (e.g., it didn't appear in a newspaper's "issue of record" edition). Whichever, when searchers start arguing with a client, they put their professional skills and the competence of the tools supplied by the vendors they've selected on the line. And as for end-users accessing files connected through an intranet, they won't even know they're wounded until the undertaker starts inserting the embalming fluid.
Searchers pay online commercial services top dollar not just for information, but for peace of mind about information. Now, not only will every search become a painful "I wonder what I'm missing" experience, but searchers will never even know for sure when they are not missing anything, or when a certain search strategy on a particular database did retrieve all the relevant material. I would hate to be the next salesperson from a commercial host to walk through the door of a client's office after that client has just spent an hour and a half paging through print, motoring through microfilm, or inching through indexes only to find that the online search he or she did in the first 5 minutes had been comprehensive after all—for once.
Here and now, I promise all commercial hosts and publisher Web sites that I, for one, will beat my drums as loud as I can beat them to tell all users everywhere—particularly those large intranet-based subscribers—to refuse to pay the same rates to full-text services that do not protect their interests as well as they can. Then, when contract renewal time comes around, it should be a whole new ballgame. I call on searchers everywhere who agree with this approach to send me e-mail messages (firstname.lastname@example.org) of support. I promise to forward them on to the relevant executives at the online services.
And dear consumer readers, I also urge you to copy this article and forward it to all the representatives you know at all the full-text services you use with notes of support attached. Do it now before the trashing of online data has gone too far, before the databases you rely upon are damaged beyond repair. And if you see a database notification indicating withdrawals, contact the publisher and argue for reinstatement. If they tell you it's too late, question that opinion. Ask to talk with the techies or senior management.
Why Wouldn't They?
Technically, this approach should be as doable as removing masses of information. The concerns might be political. Clearly, publishers and the information industry are heading toward the U.S. Congress for remedy. They will want the Congress to pass laws that overturn the Tasini decision. At the same time, authors also are not slow to approach Congress.
Nonetheless, as far as I can see, the damage-control approach suggested would pose neither an advantage nor a disadvantage to either side. If publishers and online hosts want to gain the advocacy of users by making them "feel the pain" of the Tasini case decision, then seeing Tasini-barred references pop up in search after search should help that goal. If authors want to prove to publishers and hosts that they are cutting off their noses to spite their faces by refusing to work out clearinghouse arrangements for electronic rights, then the statistics on how many times all parties lost sales due to Tasini-barred material should also help.
But more than anything, all parties involved in the provision of expensive published literature should remember that their greatest competitor is the open Web and their only hope of survival is a trusting, friendly customer base.
[Oh, by the way, at this year's Internet Librarian conference in Pasadena, California, the Southern California Online Users Group (SCOUG) will sponsor an evening session on November 6 entitled: "The Tasini Decision: Is This the End of Full Text as We Know It?" On the panel will be top executives from Dialog, Factiva, LexisNexis, Gale Group, and ProQuest, as well as Jonathan Tasini of the National Writers Union. More important, in the audience, the panel will find users, users, and more users. I hope you can attend the whole conference, but if you cannot, join SCOUG (http://www.scougweb.org) and get a pass to the session plus a day pass to the exhibit hall. Did I mention that SCOUG doesn't charge membership dues? Well, now I have.]