Last week, Microsoft announced a major upgrade to the new search engine it has been testing since March. It has moved its Windows Live Search and Live.com out of beta status and said that Live Search will power the search capability on MSN, the company's news and entertainment portal. A new feature is the Related Search function, which is designed to help users refine a query by simply clicking on a list of related terms. The unusually low-key and minimalist press announcement generated little excitement. After some poking around, Information Today, Inc. learned from search expert Stephen E. Arnold that Microsoft has even more potent technology ready to deploy.
Unlike the upgrade to Live.com, which, according to a Microsoft spokesperson, just uses algorithms that mine previously submitted queries to the engine, the new and unannounced search system brings faceted search to a Microsoft application. Try it yourself at http://rwsm.directtaps.net. The Microsoft project, called Search Results Clustering (SRC), currently offers a search beta and downloadable toolbar.
What Microsoft is doing is called text mining. This is jargon for discovering people, places, things, and other facts from text. These facts are then organized so a user can point and click on a category and see the related information. The approach is the secret sauce for such companies as Exalead in Paris and Endeca in Boston.
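To make the point-and-click idea concrete, here is a minimal sketch of how extracted facts might be organized into clickable categories. It assumes the extraction step has already run; the document IDs, facet names, and the functions `build_facets` and `refine` are hypothetical illustrations, not part of any Microsoft, Exalead, or Endeca product.

```python
from collections import defaultdict


def build_facets(documents):
    """Index documents by (facet, value) pairs that were extracted beforehand.

    `documents` maps a document ID to a list of (facet, value) tuples.
    The extraction itself is assumed to have happened elsewhere.
    """
    facets = defaultdict(lambda: defaultdict(set))
    for doc_id, entities in documents.items():
        for facet, value in entities:
            facets[facet][value].add(doc_id)
    return facets


def refine(facets, facet, value):
    """Return the sorted IDs of documents filed under one facet value."""
    return sorted(facets.get(facet, {}).get(value, set()))


# Hypothetical, hand-labeled sample data for illustration only.
sample_documents = {
    "doc1": [("company", "Exalead"), ("place", "Paris")],
    "doc2": [("company", "Endeca"), ("place", "Boston")],
    "doc3": [("company", "Exalead"), ("topic", "search")],
}

sample_facets = build_facets(sample_documents)
exalead_docs = refine(sample_facets, "company", "Exalead")
```

Clicking a category in such an interface would correspond to a call like `refine(sample_facets, "company", "Exalead")`, which narrows the result list to the documents filed under that value.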
Arnold, who is the author of Enterprise Search Report, 3rd edition, and the forthcoming Text Mining Report, said: "If Microsoft makes this function part of SharePoint, it will pose a serious threat to companies offering SharePoint-specific search enhancements and be a strong competitive challenge to Google and its Appliance and OneBox API. If Microsoft puts this technology in Live.com, that service will almost certainly see an increase in traffic. Microsoft had to do something, and this Vivisimo-like clustering may be one of Microsoft's most significant advances yet."
(Note: The 300-plus-page Text Mining Report should be available by late October or early November, according to Arnold. It will provide profiles of 18 companies, business cases, a road map, a glossary, etc. Both reports are available from CMS Watch at http://www.cmswatch.com.)
According to Arnold, given the economies of scale that Microsoft has, its lower-cost systems with embedded search may be priced as much as 90 percent less than text mining systems sold by competitors like Autonomy plc and FAST Search & Transfer. This will put further price pressure on enterprise search sector leaders trying to get top dollar for their enterprise platforms.
Implementing text mining in SharePoint will bring text analytics to a far wider market of business users than text analytics has previously enjoyed. Putting the faceted search into Live.com would add some sizzle to a search system that lags behind Google's and Yahoo!'s offerings.
Behind the Scenes
The technology work being done at Microsoft Research Asia's Search Technology Center (http://research.microsoft.com/stc) is a story that Microsoft hasn't yet announced to the world. Microsoft Research (MSR) Asia and MSN Search partnered to create the Search Technology Center (STC) in Beijing. Launched in October 2005, the center is "dedicated to advancing the state-of-the-art in search technology and delivering a more intelligent and powerful search experience to MSN users around the world." One of its core projects is Search Results Clustering (SRC). In an e-mail exchange, a representative of the STC confirmed that the center is collaborating with the Live Search product group on this technology.
According to information on the site, the SRC technique does on-the-fly clustering of a search engine's results "into different groups and provides meaningful and readable names for these groups. SRC changes the traditional representation of search results into a non-linear way, so as to facilitate user's browsing."
The site further explains: "Traditional clustering techniques don't work for this problem because the documents are short, the cluster names should be readable and the algorithm should be efficient for on-the-fly calculation. Our method[s] take the whole problem in another way and overcome the difficulties in traditional clustering method[s]. Basically, we try to first identify salient topics by identifying distinct and independent keyword[s], and then classify the search results into these topics."
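The two steps the site describes (find salient topics first, then assign results to them) can be sketched roughly as follows. This is an illustration under simplifying assumptions, not Microsoft's method: it scores single keywords by how many snippets contain them, does not check that keywords are independent of one another, and uses hypothetical function names and a small hand-picked stopword list.

```python
import re
from collections import Counter, defaultdict

# Small, hand-picked stopword list (an assumption for this sketch).
STOPWORDS = frozenset(
    {"a", "an", "and", "the", "of", "for", "in", "on", "to", "is", "with"}
)


def tokenize(text):
    """Lowercase the text and return its non-stopword tokens."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS]


def salient_keywords(snippets, query, top_k=3):
    """Step 1: pick keywords that recur across snippets, excluding query terms.

    Scoring here is plain snippet frequency, a stand-in for whatever
    scoring the real system uses.
    """
    query_terms = set(tokenize(query))
    doc_freq = Counter()
    for snippet in snippets:
        doc_freq.update(set(tokenize(snippet)) - query_terms)
    candidates = [(word, count) for word, count in doc_freq.items() if count >= 2]
    # Most frequent first; ties broken alphabetically for stable output.
    candidates.sort(key=lambda pair: (-pair[1], pair[0]))
    return [word for word, _ in candidates[:top_k]]


def cluster_results(snippets, query, top_k=3):
    """Step 2: file each snippet under every salient keyword it contains."""
    topics = salient_keywords(snippets, query, top_k)
    clusters = defaultdict(list)
    for snippet in snippets:
        words = set(tokenize(snippet))
        matched = [topic for topic in topics if topic in words]
        for topic in matched or ["other"]:
            clusters[topic].append(snippet)
    return dict(clusters)


# Hypothetical search-result snippets for the query "jaguar".
sample_snippets = [
    "Jaguar cars for sale and dealer listings",
    "Jaguar the big cat lives in rainforest habitat",
    "New Jaguar cars reviewed by drivers",
    "Rainforest cat species including the jaguar",
]
sample_clusters = cluster_results(sample_snippets, "jaguar")
```

Because the keywords themselves become the group labels, the cluster names stay readable, and the work is a couple of passes over short snippets, which is in keeping with the on-the-fly requirement the site mentions.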
The corresponding paper for this technology is: Hua-Jun Zeng, Qi-Cai He, Zheng Chen, Wei-Ying Ma, and Jinwen Ma. "Learning to Cluster Web Search Results." In Proceedings of the 27th Annual International Conference on Research and Development in Information Retrieval (SIGIR'04), pp. 210-217, Sheffield, U.K., July 2004. (Note: The link given on the site doesn't work, but the paper is available at http://research.microsoft.com/asia/dload_files/group/wsm/2004/19.pdf.)
In May 2006, Microsoft executives detailed the company's efforts to create unified enterprise information management solutions by extending its existing programs. The company said that "new capabilities in Windows Live Search will provide a single point of entry and user interface to unify multiple search solutions."
Microsoft currently has desktop and enterprise search add-ons for business customers, including Windows Desktop Search for Windows XP and Windows Desktop Search for Enterprises, as well as the built-in search capabilities of SharePoint Server. Various other companies have provided add-on search and clustering solutions that have been pecking away at Microsoft's domination. Now, Microsoft is poised to introduce its own killer search solution.