On September 23, Google released its new News service (http://news.google.com). Since that was the day I was doing a presentation on Google at WebSearch University, I paid particular attention to the many changes to the site, which I would summarize as transformations in content, layout, and functionality. I wasn't the only one watching. So were many journalists, who are deeply disturbed by the notion of a completely automated news-gathering process. Their concerns, however, are likely to diverge somewhat from those of the research community.
According to its inventor, senior research scientist Krishna Bharat, Google News was a personal project that stemmed from his frustration at jumping from source to source to find recent news. He showed it as a demonstration project within Google, and several months later it went public. Bharat sounds both proud and a bit bemused that Google News has received so much attention.
Although the site is still in beta mode, a tab above Google's main search box has been added to guide users to News. How long will it stay in beta? According to a Google spokesperson, there's no hurry about taking it out of beta. "We like to tweak things," he explained.
From its former hundred or so sources, Google News has expanded to 4,000 English-language news sources that it crawls continuously. Bharat points out that these aren't exactly new. Google has been routinely crawling the sites for its Web search; this is just the first time they've been added to Google News. Since the news is gathered entirely by crawler and analyzed by automated search algorithms, stories are frequently duplicates. As events change, so do the wire stories about them. Plus, when newspapers pick up the wire stories, they frequently change them to fit local interests or size limitations in their print issues. This adds to the duplication.
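To make the duplication problem concrete, here is a minimal sketch of one common way to spot near-duplicate stories: break each text into overlapping word triples ("shingles") and compare the sets. This is an illustration only. Google has not described how it handles duplicates, and the function names, sample sentences, and the 0.5 threshold are all assumptions of mine.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles in text (lowercased, punctuation dropped)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a, b):
    """Jaccard similarity of two sets: size of intersection over size of union."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)

def is_near_duplicate(text_a, text_b, threshold=0.5):
    """Hypothetical rule: treat two stories as near-duplicates above the threshold."""
    return jaccard(shingles(text_a), shingles(text_b)) >= threshold

# Made-up sample texts: a wire story, a lightly edited local pickup, an unrelated story.
wire = "Voters went to the polls on Sunday in a closely watched presidential election."
local = ("Voters went to the polls on Sunday in a closely watched "
         "presidential election, officials said.")
other = "The central bank left interest rates unchanged after its monthly meeting."

print(is_near_duplicate(wire, local))  # the local edit still overlaps heavily
print(is_near_duplicate(wire, other))  # no shared word triples
```

A lower threshold groups more aggressively and catches heavily rewritten pickups, at the cost of lumping together stories that merely share boilerplate.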
One unique content type is television transcripts. If the transcripts are online, Google can find them. As with wire services, a transcript created by, say, Fox News, will also show up, again sometimes with alterations to reflect local concerns, at Fox affiliates' Web sites. The duplicative nature of news is well-known to professional researchers, who have long encountered it when using subscription databases on Nexis, Factiva, and Dialog. Keep in mind that one researcher's repetitive story is another's new twist on a breaking news topic. To its detriment, the automated grouping of similar stories may suggest a relationship that doesn't exist.
Mis-categorization is another criticism journalists have lobbed at Google News. Nick Denton, founder of Moreover Technologies and a veteran of computer-gathered news, laughs about a story on the Serbian elections being grouped with stories on Milosevic's genocide trial. He points to the algorithms' reliance on "ubiquity, recency, and source reputation" for article placement as being contrary to human logic. He has a point. On the other hand, if you have a general interest in Serbia, the juxtaposition of the stories is advantageous.
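Denton's three factors can be pictured as a weighted score. The sketch below is purely hypothetical: Google has not published its formula, and the weights, the six-hour half-life, the 50-source cap, and the sample stories are all invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Story:
    headline: str
    source_count: int   # ubiquity: how many sources carry the story
    age_hours: float    # recency: hours since the newest version appeared
    reputation: float   # source reputation on a made-up 0..1 scale

def score(story, half_life_hours=6.0):
    """Hypothetical placement score combining the three factors."""
    ubiquity = min(story.source_count / 50.0, 1.0)           # capped at 50 sources
    recency = 0.5 ** (story.age_hours / half_life_hours)     # halves every 6 hours
    return 0.4 * ubiquity + 0.4 * recency + 0.2 * story.reputation

# Invented sample data.
stories = [
    Story("Serbian election results", 40, 2.0, 0.8),
    Story("Local council meeting", 2, 1.0, 0.5),
    Story("Last week's trial testimony", 45, 72.0, 0.9),
]

ranked = sorted(stories, key=score, reverse=True)
for s in ranked:
    print(f"{score(s):.3f}  {s.headline}")
```

Notice that nothing in such a score looks at what a story is about. That is consistent with Denton's complaint: a purely statistical ranking can place or group stories in ways a human editor would not.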
In a more serious vein, think about the concentration on English-language sources. If you want to know today's news on an event in Germany, wouldn't a German-language source make sense? At the moment, you get Deutsche Welle (transcripts from the German television station, designed for foreign consumption) in English, but not Die Zeit, Der Spiegel, Frankfurter Allgemeine, or any other German newspaper or news magazine. Breaking news in Denmark? Better hope that the Copenhagen Post covered it because that's the only English-language Danish source. In light of this, the link above the Google News search box to Preferences (this shows up after you've done your initial search), which lets you restrict your results to sites in a specific language, is totally inappropriate.
Is there a list of sources indexed in Google News? Yes, but according to Google it won't become public. You can click on a source list at the bottom of the page at News Resources, but it's a very abbreviated list that isn't any different from the one associated with the previous Google News. What Google will admit is that a number of publishers have asked to be included in Google News. Fans of publications have also requested that their favorite news source be added.
Because Google News spiders news sources already on the Web, it prides itself on being extraordinarily up-to-date. This gives it a distinct edge over the aggregated news databases from Dialog, Factiva, and LexisNexis. The entire site is auto-generated every 15 minutes. That doesn't mean every story is updated every 15 minutes, and Google is very good about telling you how old the story is. Keep in mind, however, that a story "updated 15 minutes ago" may not be 15 minutes old. There is a lag time inherent in the updating process and in crawler activity.
In a departure from its usual minimalist format, Google News is now formatted in double columns with breaking news accompanied by thumbnail images of news photographs. The thumbnails don't necessarily line up with the stories profiled to their left, since they come from different news stories. News is broadly divided into Top Stories, World, U.S., Business, Sci/Tech, Sports, Entertainment, Health, and More Top Stories. Three stories, with multiple sources, are on the front page. (The most recent are hyperlinked; for the full range of stories click on "and ### related.") For more news, click on the top-level topics. The News front page is colorful and formatted to spotlight what the Google search algorithms have determined to be important news stories. But it doesn't really resemble a newspaper's front page. "Google News isn't directly comparable to a newspaper," says Bharat. "It is objectively created, not opinionated. It reflects the view of many editors who create the content on the sites Google crawls."
Once you click on a topic, you're looking at something more typical of a Google search. Gone is the double column. Instead, it's headlines scrolling across the entire screen. You can reorder the headlines by clicking on Sort by Date.
With its layout determined by computer algorithm rather than human editors, Google News demonstrates a new approach to presenting breaking news. According to Bharat, most automated news gatherers simply scrape the headlines but pay no attention to actual content. His idea was to fetch the full article so that its full text could be searched, thus establishing where an article fits in context with other news stories. With fully indexed content, Google News strives to be more comprehensive than other news services.
The colorful page design is not the only departure from Google's customary look and feel. There is a search box, but no advanced search capabilities. Still, if you know Google search syntax, most of it works in Google News. You can use OR to broaden a concept and the minus sign (-) to NOT out a concept. The field limitations of intitle: and allintitle: work, as does inurl:. However, limiting to allinurl: does not work. When I tested it, limiting to inurl: retrieved URLs mentioned within an article, not the source URL. Some of the other limitations are irrelevant in the News context. Limiting a news source to a file type, for example, is futile.
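For readers less familiar with the syntax, the toy matcher below shows what OR, the minus sign, and intitle: mean when applied to a single article. It is my own simplified sketch, not Google's implementation. It assumes plain words are ANDed together by default, and the function name and sample article are made up.

```python
import re

def matches(query, title, body):
    """Toy matcher: implicit AND between terms, plus OR, -term, and intitle:term."""
    title_words = set(re.findall(r"\w+", title.lower()))
    all_words = title_words | set(re.findall(r"\w+", body.lower()))

    def hit(token):
        # intitle: restricts the term to the headline; otherwise search everywhere.
        if token.startswith("intitle:"):
            return token[len("intitle:"):].lower() in title_words
        return token.lower() in all_words

    tokens = query.split()
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok.startswith("-") and len(tok) > 1:
            if hit(tok[1:]):
                return False          # an excluded term is present
            i += 1
            continue
        # Collect alternatives joined by OR, e.g. "a OR b OR c".
        group = [tok]
        while i + 2 < len(tokens) and tokens[i + 1] == "OR":
            group.append(tokens[i + 2])
            i += 2
        if not any(hit(t) for t in group):
            return False              # none of the alternatives is present
        i += 1
    return True

# Invented sample article.
title = "Serbia holds presidential election"
body = "Voters in Belgrade went to the polls on Sunday."

print(matches("serbia OR croatia polls", title, body))   # OR broadens the first concept
print(matches("serbia -belgrade", title, body))          # minus excludes the article
print(matches("intitle:belgrade", title, body))          # word is in the body, not the title
```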
In a departure from normal Google searching, results can be sorted by relevance or by date. Relevance is the default. Sorting by date lets you track a breaking news story, often a very valuable exercise. It also points up the relevancy ranking that governs placement on the opening Google News page.
Two features are missing from Google News: cached and related sites. Do a regular Google search and results appear with hyperlinks to the cached site (the site as Google spidered it) and to a More Like This, which shows similar Web pages. No such hyperlinks exist at Google News.
There is a big difference between the News results and the News Search results. If you enter a search statement in the search box, you quickly encounter both the joys and the frustrations of the 30-day archive. The worst frustration is the number of times you run into sites that are no longer available and get the "404 Error" page. This is particularly true for newspaper articles, which are usually freely available for only a few days before disappearing behind a barrier that requires you to subscribe or register. Bharat is working on a procedure to drop these non-accessible pages. This will probably require a re-crawling of the pages to determine which are still available and dropping those that are not.
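The re-crawl I am speculating about could be as simple as the sketch below: request each archived URL again and keep only those that still answer with HTTP status 200. This is a guess at the general shape, not Bharat's procedure. The function names are mine, and the example substitutes a stub for the network check so it runs offline.

```python
import urllib.error
import urllib.request

def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails outright."""
    req = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        return err.code               # e.g. 404 for a page that has been removed
    except (urllib.error.URLError, OSError):
        return None                   # DNS failure, timeout, refused connection

def prune_dead_links(urls, check=http_status):
    """Keep only the URLs whose status check still returns 200."""
    return [u for u in urls if check(u) == 200]

# Offline example: a stub lookup stands in for real network requests.
fake_statuses = {
    "http://example.com/still-free": 200,
    "http://example.com/removed": 404,
    "http://example.com/unreachable": None,
}
print(prune_dead_links(list(fake_statuses), check=fake_statuses.get))
```

A status check alone would not catch every case. A site that redirects to a registration or subscription page may still answer 200, so a real procedure would probably also need to compare the returned page with the one originally crawled.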
Will Google News change online news? It's possible. Bharat notes that there are two types of News usage. There are the news junkies who are constantly on the news pages. He speculates they are searching for the same thing over and over, possibly their own institution, to see what's new. Then there are those who skim the headlines and go away. This sounds like a typical behavior split when it comes to news. Companies already in this space, such as Moreover and Yahoo!, will attest to that. For background on the breaking stories displayed in Google News, it will still be necessary to go elsewhere, either to a Web search engine or the more traditional fee-based online services.