Twitter Search Has Big Ambitions
by Avi Rappoport
Posted On May 18, 2009

On May 1, Twitter added a search box to the right-side navigation area of its site. The system now indexes all posts (aka Tweets) in near-real time and shows results with the most recent first, just as they are posted: raw and unfiltered. Twitter bought this technology when it acquired the startup Summize. Twitter may also be using the Summize technology to choose the most interesting search terms for the "Trending Topics" feature that shows under the search box. While Twitter's results are nearly instantaneous, they may not be relevant or useful. So Twitter has announced plans to expand its coverage to linked pages and to find a way to highlight the more useful links. This will bring it up against Google for breaking news stories, celebrity gossip, and viral music crazes.

As I type, for example, the top Topic relates to comments about today's BBC Question Time (abbreviated to "bbcqt"), updating at the rate of at least 100 Tweets per minute, even though it's nearly midnight in the U.K. The posters are a self-selected group of technophiles because they are on Twitter, and most of their Tweets are rather silly. But there is nearly nothing on the Google News or BBC sites yet. It would have been a unique chance for British politicians or political operatives to participate in the Twitter stream, where they could have addressed the very real concerns of their constituents.

There's no special field on Twitter for tagging, unlike Delicious or Flickr, but Twitter users have evolved a convention of using a # (hash mark) to distinguish a topic name from other uses. For example, the #mumbai hash tag was used by people to tell the world what was going on during the terrorist attack on that city. Hash tags are good for all sorts of current events, from #swineflu and #googlefail (to explain why people couldn't get to their Gmail accounts recently) to #SpaceShuttle. But for some reason, the Trending Topics list shows "Star Trek" without the hash.
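
The hash tag convention described above can be sketched in a few lines of Python. This is only an illustration: the pattern and the function names `extract_hashtags` and `count_topics` are assumptions made for this example, not anything Twitter has published about how it processes posts.

```python
import re
from collections import Counter

# Assumption: a hash tag is '#' followed by letters, digits, or underscores,
# and the '#' must not be glued to a preceding word character (so URL
# fragments such as "page#anchor" are ignored).
HASHTAG_RE = re.compile(r"(?<!\w)#(\w+)")


def extract_hashtags(post):
    """Return the lowercased hash tags found in a post, in order of appearance."""
    return [tag.lower() for tag in HASHTAG_RE.findall(post)]


def count_topics(posts):
    """Count how many posts mention each hash tag (at most once per post)."""
    counts = Counter()
    for post in posts:
        counts.update(set(extract_hashtags(post)))
    return counts
```

A counter like this sees only tagged posts, so a topic such as "Star Trek" that people type without the hash would have to be picked up some other way.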

Twitter search results may contain profanity, disconcerting weirdness, or search-spam attempts to lure people into clicking on links that are full of advertising and malware. Because the Twitter microblog limit is 140 characters, many people use redirection services, which hide the destination and so make it easier for an unsuspecting user to click on a bad link. In search results (but not normal account reading pages), Twitter has added "expand" links for at least its two main partner services, tinyurl.com and bit.ly, so it is possible to check the URL without going to the page itself.
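
As a rough illustration of the first step behind an "expand" link, the sketch below flags links that point at a redirection service so a reader (or a tool) knows the real destination is hidden. The `SHORTENERS` set lists only the two services named above, and `is_shortened` is a hypothetical helper; it does not contact any service or resolve the link.

```python
from urllib.parse import urlparse

# Assumption: only the two redirection services named in the article.
SHORTENERS = {"tinyurl.com", "bit.ly"}


def is_shortened(url):
    """Return True if the URL's host is a known redirection service."""
    # hostname is None when the URL has no scheme (e.g. "www.example.com/x"),
    # so fall back to an empty string and report False.
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[4:]
    return host in SHORTENERS
```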

Earlier this month, a Twitter executive confirmed to Rafe Needleman of CNET that the service will expand its indexing to pages linked in Twitter posts (http://news.cnet.com/twitter-search-to-dive-deeper-rank-results). It's an interesting prospect, especially as the executive, Santosh Jayaram, went to Twitter after being a "manager of search quality operations" at Google. I believe the idea is to expand Twitter's results to a social search and collaborative filtering approach, linking to the most useful external pages on topics, such as the best bicycle repair in San Francisco and the actress Rachel Weisz's recent interviews.

ReadWriteWeb suggests that bit.ly will do this work for Twitter (www.readwriteweb.com/archives/three_reasons_why_twitter_will_not_index_the_links.php). It already has a search option that seems to index and link directly to the page and the Twitter account but not the original Twitter post (http://bit.ly/app/search). It shows the full URL and extracts the first readable text, but it sorts simply by date. It isn't limited to pages linked by Twitter, as the shortened links can be used on any blog, and links to PDF files work properly. bit.ly is using OpenCalais (www.opencalais.com) from Thomson Reuters to do semantic analysis on linked pages, but little of that appears in the search results now (except perhaps when the page is lacking metadata). bit.ly also has invested significant resources in spam detection and attempts to avoid linking to suspect sites. But a fast spammer could evade those controls and grab a lot of traffic; I've already seen affiliate marketing sites that sure look like pyramid schemes in search results.

Unfortunately, bit.ly is stuck with many of the same problems other startup search engines have, such as spelling errors and homographs (words spelled the same but with different meanings, such as "bank"). In the results pages, there are untitled pages, truncated titles, navigation text instead of meaningful text at the top of pages (it doesn't seem to use match-word in context, which is a shame), and duplicate pages.
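
For readers unfamiliar with match-word in context, here is a minimal sketch of the idea: rather than showing the first text on the page (often navigation), the result shows the text surrounding the first match. The function name `snippet` and the `width` parameter are assumptions for this example; it says nothing about how bit.ly or any other engine builds its result pages.

```python
def snippet(text, query, width=40):
    """Return the text around the first match of query, or the page start if none.

    Assumption: simple case-insensitive substring matching on plain text whose
    length does not change when lowercased (true for ordinary English text).
    """
    pos = text.lower().find(query.lower())
    if pos == -1:
        # No match: fall back to the start of the page.
        return text[: 2 * width]
    start = max(0, pos - width)
    end = min(len(text), pos + len(query) + width)
    # Mark where the snippet was cut out of the surrounding text.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix
```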

Needleman says, "Twitter Search will also get a 'reputation' ranking system soon, Jayaram told me. When you do a search on a 'trending' topic (a topic that is so big it gets its own link in the Twitter.com sidebar), Twitter will take into account the reputation of the person who wrote each tweet and rank the search results in part based on that. Jayaram did not say precisely how reputation will be calculated; he indicated that engineers are still figuring that out."

Reputation ranking systems range from Google PageRank to eBay's feedback listings to Yelp. In Twitter's case, it's likely to calculate an account's reputation from the number of followers and the frequency with which the followers "re-Tweet" the account's posts. The idea is to use the collective choices to identify the best of anything. All of the reputation ranks above are frequently abused to wrongly promote some items or people over those that are more truly relevant. Slashdot has a system of moderating the moderators (meta-moderation), which seems to keep the more obvious abuses under control.
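
To make the guess above concrete, here is one way such a score could be combined from follower counts and re-Tweet frequency. Everything in it is hypothetical: Twitter has not said how reputation will be calculated, and the `Account` fields, the logarithm, and the multiplication are assumptions chosen only to show the shape of the idea.

```python
import math
from dataclasses import dataclass


@dataclass
class Account:
    name: str
    followers: int
    posts: int
    retweets: int  # times followers re-Tweeted this account's posts


def reputation(account):
    """Hypothetical score: damped follower count times re-Tweets per post."""
    if account.posts == 0:
        return 0.0
    retweet_rate = account.retweets / account.posts
    # log1p damps very large follower counts so they cannot dominate alone.
    return math.log1p(account.followers) * retweet_rate


def rank_by_reputation(accounts):
    """Return accounts sorted from highest to lowest hypothetical score."""
    return sorted(accounts, key=reputation, reverse=True)
```

Under this formula, an account with many followers who rarely re-Tweet it ranks below a smaller account whose posts are passed along often. Like the systems named above, it could be gamed, for example by accounts that re-Tweet one another.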

All the web search engines (Google, Yahoo!, MSN Live Search, and Ask) have gigantic indexes and years of click-through data and algorithm tweaking. Twitter is unlikely to be the best place to look for any kind of authoritative answers. Wikipedia is better for a simple introduction, and web search is better for shopping and research. But Twitter search is faster than news and blog searches; it's nearly instantaneous, and that can be perfect, sometimes.


Avi Rappoport is available for search engine consulting on both small and large projects. She is also the editor of www.searchtools.com.

Comments
Posted By Chris Issack, 5/20/2009 10:17:14 PM

Regarding Twitter's "Reputation" calculation, there are several methods that could be used to weed out the Twitter spammers. See Eric Ward's article "How A Twitter Reputation Algorithm Needs To Work" at http://searchengineland.com/how-a-twitter-reputation-algorithm-needs-to-work-19017 or, if the URL above isn't working, http://bit.ly/9ILzl
Posted By Ben Stein, 5/18/2009 4:03:38 AM

Avi - Interesting post!
I think there's a great opportunity of applying semantic technologies (like ContextIn - http://www.urlclassifier.com API) for utilizing the mass twitter data for better online search.

Stein is with ContextIn--Ed.