Quality is one of the most elusive elements that search engines must deliver. Web search engines in particular seek new methods for separating the best documents from the so-so and the misleading ones. Last week, Northern Light announced that it was adding link popularity to the list of factors it considers in producing a ranked list of results. Much like Google or Clever, Northern Light will note the links going to a site as an indicator of research value. This is similar to using a citation search as a method for determining which authors are most influential in a field.
Marc Krellenstein, the chief technology officer for Northern Light, was quick to point out that, unlike several search engines that appear to rely exclusively on link popularity to produce their top ranked hits, Northern Light is simply adding this factor as an additional indicator of value. Northern Light also uses the following measures (taken from the company's press release):
- Statistical measures such as query term frequency, inverse document frequency of the term, and length of the document
- Document date
- Word context information such as whether or not the word or phrase queried occurs in the document title
- Document classification based on Northern Light's patented technology for automatically classifying the Web and organizing results into Custom Search Folders
- Natural language analysis of the query, which analyzes the specific syntax and semantics of the user's query
Search engines add a weight for each of these sorts of measures when calculating the relevance ranking of each document. Documents rank higher when they contain more occurrences of the query terms, when those terms appear close together or in the title, and when the document is more recent. An overall classification of the document by Northern Light as being about the query subject boosts the rating still higher, and so does the semantic and syntactic context of the query and the document.
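To make the idea of weighting several measures concrete, here is a minimal Python sketch. It is not Northern Light's algorithm, which the company has not published in this detail. The weight values, the field names (`title`, `body`, `year`, `topic_match`), and the recency formula are all assumptions chosen for illustration, and term proximity and natural language analysis are left out for brevity.

```python
import math

# Hypothetical weights: the press release names the measures but not their values.
WEIGHTS = {"tf_idf": 1.0, "title": 2.0, "recency": 0.5, "topic": 1.5}


def tf_idf(term, doc_tokens, doc_freq, n_docs):
    """Length-normalized term frequency times a smoothed inverse document frequency."""
    tf = doc_tokens.count(term) / max(len(doc_tokens), 1)
    idf = math.log((1 + n_docs) / (1 + doc_freq.get(term, 0))) + 1
    return tf * idf


def score(query_terms, doc, doc_freq, n_docs, current_year):
    """Combine several measures into one relevance score as a weighted sum."""
    tokens = doc["body"].lower().split()
    title = doc["title"].lower().split()
    features = {
        # Statistical measure: term frequency, inverse document frequency, length.
        "tf_idf": sum(tf_idf(t, tokens, doc_freq, n_docs) for t in query_terms),
        # Word context: how many query terms occur in the title.
        "title": sum(1 for t in query_terms if t in title),
        # Document date: newer documents score closer to 1 (assumed formula).
        "recency": 1.0 / (1 + max(current_year - doc["year"], 0)),
        # Classification: 1 if a (hypothetical) classifier tagged the doc on-topic.
        "topic": 1.0 if doc.get("topic_match") else 0.0,
    }
    return sum(WEIGHTS[name] * value for name, value in features.items())
```

A document that mentions the query term often, carries it in the title, is recent, and is classified as on-topic will outscore one that mentions the term once in passing; changing the weights changes how much each measure matters.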
By adding link popularity, Northern Light has mined the implicit reviewing mechanism of other Web authors, who link to sites they consider to be outstanding. This should be a good indicator of quality. However, since it is not the only factor used to determine how relevant a document is, it strengthens the other measures instead of replacing them.
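As a rough illustration of link popularity used as one factor among several, the Python sketch below counts the distinct pages linking to each target and blends that count into an existing relevance score. The function names, the `link_weight` value, and the logarithmic damping are assumptions made for this example, not details from Northern Light.

```python
import math
from collections import defaultdict


def inbound_link_counts(links):
    """Count distinct linking pages per target from (source, target) pairs."""
    sources = defaultdict(set)
    for source, target in links:
        if source != target:  # a page linking to itself is not an endorsement
            sources[target].add(source)
    return {target: len(srcs) for target, srcs in sources.items()}


def blended_score(base_score, inbound, link_weight=0.3):
    """Add a damped link-popularity bonus to a score built from other measures.

    log1p keeps heavily linked pages from swamping the other factors;
    link_weight is a hypothetical tuning value.
    """
    return base_score + link_weight * math.log1p(inbound)
```

Because the bonus is added to the base score instead of replacing it, a heavily linked page that barely matches the query can still rank below a well-matched page with few links.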
Northern Light uses this measure in particular to improve its ability to move the official home page of a company or organization to the top of the ranked list of results. A recent search for Arnold Information Technologies (AIT) suggested that the approach is effective: AIT turned up at the top of the list, though it had not appeared in any of the results from other search engines. Of course, this may also reflect what each search engine has in its database.
According to Krellenstein: "These new algorithms are intended to mimic the decisions that a person makes when evaluating a results list—balancing various attributes, some of which can even conflict, and coming to a final evaluation about which documents are best. No one methodology alone will ever work for all types of queries."