New Search Engine Promises Relevance, Now
Posted On August 19, 2010
The new search engine NowRelevant.com says that it will find “everything about your subject for the past two weeks.” The name promises relevance, but what does that even mean? Information science researchers have been trying to understand relevance for over fifty years, and it’s become clear that it is in the eye of the beholder. Some beholders may love NowRelevant.com; others may not.
In the very early days of electronic search, the best result was the perfect set of 10 (or 30) documents out of several thousand. To get these, search experts interviewed their clients extensively, canvassed the available data sources, and, for each one, wrote complex search statements defining exactly what they were looking for. Around 1988, Professor Marcia Bates of UCLA developed her theory of information seeking as “berrypicking,” or evolving search: when the original result set doesn’t answer their questions, researchers use the content they have found to adjust their query and head off in somewhat different directions. Even on web search engines millions of times larger than Bates’ examples, we see the same evolving process, as researchers use “information foraging” to sample results and choose the most fruitful path.
Efforts to define and test result relevance continued in the 1990s. The U.S. National Institute of Standards and Technology (NIST) set up formal evaluation systems in the Text Retrieval Conference (TREC), comparing algorithms and finding the limits of text-based information retrieval for many kinds of queries. These were still research questions, in areas such as chemicals, patents, marketing, and law, and the queries themselves were complex, whether expressed with Boolean and other operators or as complete sentences.
The rise of the Web and web search engines changed the whole field. Short queries became dominant, many of them trivial rather than significant. Most of the very first web search engines were simple, matching search terms with words in the web pages. The page with the most word matches was considered most relevant, and each page with fewer matches less relevant. The unprecedented size of the web and variety of vocabulary made it very expensive to perform concept extraction and other text analytics.
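The match-count ranking described above can be sketched in a few lines. This is a minimal illustration, assuming pages are plain strings keyed by an id and that a "match" is any occurrence of a query word; real early engines differed in how they tokenized and weighted terms, and the helper names (`tokenize`, `match_score`, `rank_pages`) and sample pages are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_score(query, page_text):
    """Count every occurrence of any query word in the page (illustrative scoring only)."""
    page_counts = Counter(tokenize(page_text))
    return sum(page_counts[term] for term in set(tokenize(query)))


def rank_pages(query, pages):
    """Return page ids ordered from most to fewest word matches.

    sorted() is stable, so pages with equal scores keep their input order.
    """
    return sorted(pages, key=lambda pid: match_score(query, pages[pid]), reverse=True)


# Hypothetical example pages.
pages = {
    "zoo": "The zoo welcomed a new tiger cub. Tiger keepers were thrilled.",
    "golf": "Tiger Woods finished the round two over par.",
    "cars": "New car models arrive this fall.",
}
print(rank_pages("tiger", pages))  # ['zoo', 'golf', 'cars']
```

The zoo page wins only because it repeats "tiger," which shows why pure word counting was easy to game and said little about a page's quality.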
Then along came Google, which extended the concept of relevance ranking beyond the individual page by treating each page as a node in a network, with incoming links that indicate value. Following on work by Ted Nelson and Jon Kleinberg, Google’s PageRank relevance adjustment successfully identified good pages and bumped them to the top of the results listings. With a simple, uncluttered interface and other relevance and UI improvements, it soon had the majority of search traffic. Ten years later, Google has evolved its algorithms to create a standardized schema for many results pages, which includes a company website, a Wikipedia entry, other authoritative pages, Image, News, and Twitter search results, and text advertising. Because a huge percentage of searches are short and ambiguous, this variety of results is more likely to be relevant to more people, in one way or another.
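The link-as-vote idea behind PageRank can be illustrated with the textbook, normalized form of the formula, computed by power iteration with the commonly used damping factor of 0.85. This is a sketch, not Google’s production ranking, which combines many other signals; spreading the score of pages with no outgoing links evenly over all pages is one common convention, assumed here, and the four-page link graph is made up.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Simplified PageRank by power iteration.

    links maps each page to the list of pages it links to. Each round, every
    page keeps a small baseline share, (1 - damping) / n, and passes the rest
    of its current score evenly to the pages it links to. Pages with no
    outgoing links (an assumed convention) spread their score over all pages.
    """
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with a uniform score

    for _ in range(iterations):
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                share = damping * rank[page] / n
                for target in pages:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical link graph: three pages link to "c", nobody links to "d".
links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
ranks = pagerank(links)
for page, score in sorted(ranks.items(), key=lambda kv: kv[1], reverse=True):
    print(page, round(score, 3))  # "c" is printed first
```

Page "c" comes out on top because it collects links from three pages, and "a" ranks second because its single incoming link comes from the highly ranked "c"; "d", with no incoming links, keeps only the baseline share.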
For NowRelevant.com, ‘relevance’ means fresh, original, and unspammed, with results limited to items posted in the last 2 weeks (5 days by default). The company’s original service, TheInternetTimeMachine.com (ITM), analyzes high-volume search terms and their results to find unsaturated markets. In compiling a research corpus of search terms and the results linked to them, ITM scrubbed out spam and trivial pages generated by content factories. The original ITM service generated structured charts and reports of keyword calculations, but that approach has inherent limits, as anyone who has fought with a database knows. Because the collected data is text, the company could create a searchable index and thus provide an interface for interactive research on interesting topics. From there, it was no huge leap to create a public (beta) interface to the search engine. The company also removed older pages, because these, even company sites and Wikipedia pages, don’t anticipate trends. In a sample search for “tiger,” the content covers zoo stories, tiger masks, and Tiger Woods, while the ad is for a tattoo site (presumably with tiger tattoos).
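NowRelevant.com doesn’t describe its implementation, but the freshness rule itself is simple to state in code. The sketch below assumes a hypothetical `Item` record with a `posted` timestamp and keeps only items inside the window, newest first, using the figures quoted above: 5 days by default and never more than 14. The spam and content-factory scrubbing is a separate step and is not modeled here.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

MAX_WINDOW_DAYS = 14     # "the last 2 weeks", per the article
DEFAULT_WINDOW_DAYS = 5  # default window, per the article


@dataclass
class Item:
    """Hypothetical record for one indexed page."""
    title: str
    posted: datetime


def fresh_items(items, now, days=DEFAULT_WINDOW_DAYS):
    """Keep items posted within the last `days` days, newest first.

    The requested window is clamped to the range 1..MAX_WINDOW_DAYS.
    """
    days = max(1, min(days, MAX_WINDOW_DAYS))
    cutoff = now - timedelta(days=days)
    recent = [item for item in items if cutoff <= item.posted <= now]
    return sorted(recent, key=lambda item: item.posted, reverse=True)


# Hypothetical sample data, dated relative to this post.
now = datetime(2010, 8, 19)
items = [
    Item("Zoo welcomes tiger cub", datetime(2010, 8, 17)),
    Item("Tiger mask sale", datetime(2010, 8, 10)),
    Item("Old company page", datetime(2009, 1, 1)),
]
print([i.title for i in fresh_items(items, now)])           # ['Zoo welcomes tiger cub']
print([i.title for i in fresh_items(items, now, days=14)])  # ['Zoo welcomes tiger cub', 'Tiger mask sale']
```

Clamping the window means a request for 30 days quietly returns the same two-week slice, which mirrors the stated 2-week limit; the old company page never appears, whatever window is chosen.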
Comparing NowRelevant.com to Google’s search shows significant differences. For one thing, the NR interface is so stark and simple that it reminds me of the very first Google page. The results are simple and straightforward, with a timestamp, page title, and text extract. In some cases, it shows pay-per-click advertising, with the option of embedded video that plays on the search results page; advertisers are charged only when a user clicks through to the advertising site. The results are all original and timely, but for a broad search they range widely across topics (see screen).
The Google result for the same search, with the new limit-by-date feature set to “Past 2 weeks,” is very different. It has text ads for several different kinds of tigers, including one for the TigerDirect company site, as well as news (mostly sports), images of tigers, a link to a video of tigers, and results about Tiger Woods.
It’s not clear which results are more relevant. Despite NowRelevant’s claims of less clutter and more value, it all depends on who is doing the searching. A marketer might skim the results and decide to buy ads for t-shirts with funny sayings about Tiger Woods. But a typical end user would find more choices at Google (or Bing, Yahoo!, Ask, etc.), despite the textual clutter. As most of us are not marketers, our beholders’ eyes need to see the options in order to decide what is relevant.
Bates, Marcia J. “The Design of Browsing and Berrypicking Techniques for the Online Search Interface.” Online Review 13 (October 1989): 407-424. http://www.gseis.ucla.edu/faculty/bates/berrypicking.html