Search engines still matter, but Wolfram Alpha (www.wolframalpha.com) hopes to shift us toward a "finding engine" that uses Web 3.0 principles to deliver information from natural language queries. Wolfram Alpha is the latest big project from Wolfram Research, Inc. Marketed as a computational knowledge engine, it boils down to a really smart way to access most of the best reference shelves on the planet. Think of it as a library minus the disdain for skateboarders, plus Wolfram's Mathematica program powering the world's biggest online scientific calculator.
Stephen Wolfram demonstrated the new engine at the Berkman Center at Harvard Law School (http://cyber.law.harvard.edu/events/2009/04/wolfram) on April 28. Despite a sketchy webcast (the event was so popular that the Berkman folks ran out of bandwidth), Wolfram's demo has been widely blogged and linked to online. Screen shots from the demo show an easy and intuitive interface that converts queries such as "weather oakland" into current and historical weather data for Oakland, Calif. My testing since its debut on May 15 bears this out.
Wolfram Alpha relies on the data sets it has acquired (many of which are freely available from governments and public domain sources) and computers in data centers maintained by Wolfram Research. With all that power, the engine can sort keywords such as "weather" and "oakland" into meaningful categories by way of "input interpretation." The term "weather" remains weather, but the term "oakland" is interpreted to mean "Oakland, California" by default, according to rules set by the program. Heavily symbolic and mathematical queries are interpreted and computed even more handily: "water 550C 3 atm," another sample search Wolfram demonstrated, becomes the substance water at 550° centigrade and at "3 atmospheres" of pressure. Alpha can then tell the user useful facts about the nature of water under such conditions, including density, molecular weight, and boiling point. And all this is presented in easy-to-read, aesthetically pleasing boxes with a dignified non-sans-serif font and lots of white space.
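For readers who like to see the idea in code, here is a toy sketch of what default-rule "input interpretation" might look like. It is an illustration only, not Wolfram Alpha's actual method: the rule table, the category labels, and the function name are all hypothetical, and a real engine would need far richer rules than a two-entry lookup.

```python
# Toy illustration only: the rule table, category labels, and function name
# are hypothetical and are not taken from Wolfram Alpha.
DEFAULT_INTERPRETATIONS = {
    "weather": ("topic", "weather"),
    "oakland": ("city", "Oakland, California"),
}


def interpret(query):
    """Map each keyword in a query to a (category, value) pair."""
    interpreted = []
    for word in query.lower().split():
        # Keywords without a default rule are kept but flagged as unknown.
        interpreted.append(DEFAULT_INTERPRETATIONS.get(word, ("unknown", word)))
    return interpreted


print(interpret("weather oakland"))
# prints [('topic', 'weather'), ('city', 'Oakland, California')]
```

The point of the sketch is the default: "oakland" resolves to one particular Oakland because somebody wrote a rule saying so, and a keyword with no rule simply comes back unrecognized.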
The Wolfram Alpha engine will even try cultural abstractions such as "uncle's uncle's brother's son," and tell you what it knows about how such relationships are described in the kinship models developed by anthropologists.
This brings up some heady questions, and the audience at the Berkman Center hit Mr. Wolfram with serious epistemological concerns. Cultural anthropologists, for example, are not in universal agreement about the nature of kinship; to rely on a prevalent model of kinship does not necessarily get us any closer to the true nature of the relationship signified by "uncle's uncle's brother's son." Should the Wolfram team lean on Ward Goodenough's ideas to the exclusion of, say, Clifford Geertz? And, if so, why?
Rudy Rucker's report on Wolfram Alpha for h+ Magazine (April 6, 2009, www.hplusmagazine.com/articles/ai/wolframalpha-searching-truth) quotes Stephen Wolfram as estimating that about 90% of all major reference materials are now in the Alpha engine. Rucker points out that no matter how much data is in Alpha, "it's also necessary to talk to experts." That's an understatement. And it may be at this very point, where Rucker suggests that Wolfram's semicelebrity status in the science world helps him and Wolfram Alpha to find such experts, that the plan for the program actually suffers a serious blow.
The previous example from mid-20th century cultural anthropology should hint at the depth of the problem: It will not be possible for any computer science team to "talk to" all of the most important thinkers in all of the most important fields in all of the most important branches of human knowledge to any sufficient level of sophistication. Criminology, climatology, astronomy, history, semiotics, information science, marine biology, linguistics, sociology, carpentry, xenopolitics: Each has lifetimes of serious subtleties to pay attention to. So gaps in this "knowledge engine" are inevitable, despite Wolfram's plea for expert participation (www.wolframalpha.com/participate/participate.html).
David Talbot (MIT, Technology Review) took this down to a practical level when he compared search results from Wolfram Alpha with those from Google for a variety of terms (he had temporary access ahead of Wolfram's Berkman demo).
He searched for geographical terms (Utah, Florida, population) and social/medical info (cancer, New York), and, in his own words, he didn't "use search terms that clearly had no computable answer (and therefore would have stumped Wolfram)." But neither did he "throw any softballs in areas close to the heart of its makers: physics, chemistry, engineering, and genomics." His report has Google and Alpha at a close tie, with one sometimes edging out the other in guessing the kinds of results Talbot wanted (read his entire report at www.technologyreview.com/web/22585/?nlid=2001).
Talbot says, "On hard-core scientific questions, it gives you tons of symbols and graphics and other information that would be useful to a researcher but obscure to most people. But on many common questions for which there is no obvious data element, you will not get much help."
On the evening of May 15, the Wolfram Alpha engine went live, and I had a chance to try it out myself. Some initial glitches made the first 15 minutes very slow, and some queries never went through at all (two days later it was still very slow). Eventually, I was pulling weather records for Hattiesburg, Miss., trying a vanity search (Alpha had nothing on me, but it had plenty on the popularity rankings of my names over the past 70 years), and putting in feeble attempts at algebraic formulas and getting back line graphs whenever my query made enough mathematical sense to plot on x and y axes.
Wolfram Alpha is only as good as its data and the ability of its programmers to give that data context and meaning in logical semantic categories. If it is understood as an extremely powerful, elegant, and interactive almanac, the interpretation of the "truth" (a subject Rucker is much concerned with in the h+ piece) of the information it returns on a query depends on the agents who decide what sources the engine will use and how it will use them. So not only is it important for the human players at Wolfram Research to be sure of the quality of their sources, it's also vital that they think like us, the users, and build Wolfram Alpha so that it returns the kind of stuff we (the general public, not just postdoctoral lab rats) want to find.
At the bottom of the results page for noncomputational results (that is, almanac, encyclopedia, or other reference-type results), there's a link for "Sources." It's a step in the right direction, though they fudge a bit on specifics: "This list is intended as a guide to sources of further information. The inclusion of an item in this list does not necessarily mean that its content was used as the basis for any specific Wolfram|Alpha result ... Requests by researchers for detailed information on the sources for individual Wolfram|Alpha results can be directed here. ... " This links to http://www13.wolframalpha.com/input/alphaInfoRequest.jsp. As of May 18, that URL wasn't working.
A "computational knowledge engine" that aims for general excellence will have to pull from the best data sources it can get; and to get info pros on board, Wolfram Alpha might want to tidy up this sourcing issue (uh, I think I got it from Google ... I mean Wolfram Alpha. Or, like, the internet or somethin'...). It will also have to put the epistemology concerns (how can we know what we know about what we know about kinship?) to one side for a time.
To help solve the stickier questions about which interpretation of the data gets pushed to the top, Wolfram Alpha may find that its own widgets open a door for user participation that would help settle debates in more of a MediaWiki-politics style. Wolfram Alpha's main interface will be free online, with some special computational tools associated with the engine available separately to specialists for a price. The free web version will offer the APIs and widgets that individual users can embed (and perhaps modify; that isn't yet clear) to help improve the engine's results. If its users are allowed this kind of input, maybe we can at least settle on "truths" we can live with.
But even if any "ultimate truth" is never reached, these tools could provide semidemocratic means to demonstrate and debate principles that at least let us all civilly disagree. And it might be in that process that we'll come closer to consilience.