The new Greplin service is like desktop search, except that it indexes and makes searchable a person’s online social networking accounts, what some people are calling a “personal cloud.” The free version indexes content from Twitter, Facebook, Gmail, Google Docs & Calendar, Dropbox, and LinkedIn accounts, while the paid versions add other sources and more index space. Greplin is simple to set up and works quickly. It finds not just one’s own posts, email, and documents scattered around the various services, but also posts and documents shared by friends.
The Greplin service is cloud-hosted, so there is no software to install or files to track, just a web site to log into from any computer or phone. Greplin provides a secure (https) web page for searching and showing results, so no one else on the network can see the search. The results are grouped by kind of material, such as streams, messages, people, events, and files, with optional filters for the source and item type. Even searching within PDF files on Dropbox and Google Docs is easy.
And it is fast! Type a couple of letters and the results show up in less than a second. Add another letter, and the results update instantly. In most cases, a yellow highlight marks the matching words, so it is easy to tell why a particular item shows up in the results. The presentation is clear, with graphic icons indicating the source and user icons from Facebook and Twitter. At the end of each section there’s an option to get more messages, streams, or people, but only if there are more of that type. The page is a nice implementation of the federated search “segmented results” design pattern.
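The “segmented results” pattern can be sketched in a few lines. This is a minimal illustration, assuming a flat list of scored hits that each carry a kind (streams, messages, people, and so on) and a source. The `Hit` and `Segment` types and the `segment_results` function are hypothetical names, not Greplin’s code. The sketch groups hits by kind, applies the optional source and kind filters, keeps the top few per section, and records whether a “more” link is warranted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Hit:
    kind: str     # e.g. "streams", "messages", "people", "events", "files"
    source: str   # e.g. "twitter", "facebook", "gmail"
    title: str
    score: float


@dataclass
class Segment:
    kind: str
    hits: list
    has_more: bool  # show a "more ..." link only when True


def segment_results(hits, per_segment=3, source=None, kind=None):
    """Group ranked hits by kind, keeping the top `per_segment` of each."""
    groups = defaultdict(list)
    for hit in hits:
        # Optional filters for source and item type.
        if source is not None and hit.source != source:
            continue
        if kind is not None and hit.kind != kind:
            continue
        groups[hit.kind].append(hit)

    segments = []
    for k, group in groups.items():
        group.sort(key=lambda h: h.score, reverse=True)
        segments.append(Segment(k, group[:per_segment], len(group) > per_segment))

    # Put the section with the strongest single hit first.
    segments.sort(key=lambda s: s.hits[0].score, reverse=True)
    return segments


# Usage with made-up hits:
demo = [
    Hit("messages", "gmail", "Contract draft", 0.9),
    Hit("messages", "gmail", "Re: contract", 0.7),
    Hit("streams", "twitter", "Weather today", 0.8),
]
for seg in segment_results(demo, per_segment=1):
    print(seg.kind, [h.title for h in seg.hits], seg.has_more)
```

Ordering sections by their best hit is one reasonable choice. A fixed section order (people first, say) would work just as well and may be easier for users to scan.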
NOTE: The default page is heavy on JavaScript, rather like Gmail. There’s a simpler page for “dumb” phones, though it uses JavaScript as well. At the moment, there’s no error message about this requirement, so the page fails silently, which could be a problem for people with old browsers or those who turn off JavaScript for security reasons. Daniel Gross, the founder of Greplin, says that they plan to include a <noscript> tag to notify non-JavaScript users.
Disconcertingly, the Greplin results are a mix of public and private, new and old content: a single test search turned up my son’s high school charity fundraiser from 4 years ago, a business contract email, and yesterday’s local weather notes. Searches find everything posted on the account’s Twitter Timeline (the default view, showing all accounts that are followed), Twitter direct messages, public mailing lists and private messages in Gmail, and both public and friends-locked Facebook posts. I found the mix of privacy levels surprising, and wish Greplin had at least flagged the private items.
How It Works
To get access to an account’s messages and friends, Greplin uses Facebook Connect for Facebook and the open protocol “OAuth” for the other services. This makes setup incredibly easy and quite secure: selecting a service sends the browser to that service to authorize Greplin, and it’s just a matter of saying yes.
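The redirect-and-approve step is the OAuth authorization-code flow. Below is a minimal sketch of the client’s side of that flow as defined in OAuth 2.0 (RFC 6749). That version is an assumption: Greplin’s actual integrations aren’t documented here, and OAuth 1.0a, still common at the time, signs requests differently. The endpoint URLs, client ID, and function names are hypothetical. The app never sees the user’s password; it only receives a short-lived code, which its server then exchanges for an access token (that exchange is not shown).

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse


def authorization_url(authorize_endpoint, client_id, redirect_uri, scopes):
    """Build an OAuth 2.0 authorization-code request URL (RFC 6749, 4.1.1).

    Returns the URL to send the browser to, plus the random `state` value
    the app must remember and compare when the service redirects back.
    """
    state = secrets.token_urlsafe(16)  # unguessable value for CSRF protection
    query = urlencode({
        "response_type": "code",
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "scope": " ".join(scopes),
        "state": state,
    })
    return f"{authorize_endpoint}?{query}", state


def handle_callback(callback_url, expected_state):
    """Extract the authorization code from the redirect, checking `state`."""
    params = parse_qs(urlparse(callback_url).query)
    if params.get("state", [None])[0] != expected_state:
        raise ValueError("state mismatch: possible forged redirect")
    if "error" in params:
        raise PermissionError(params["error"][0])  # e.g. the user said no
    return params["code"][0]


# Usage with hypothetical values:
url, state = authorization_url(
    "https://service.example/oauth/authorize",  # hypothetical endpoint
    "demo-client",                              # hypothetical client ID
    "https://app.example/callback",
    ["read"],
)
```

The `state` check is what stops another site from forging the redirect back to the app.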
The search engine itself is based on the open-source Lucene core, customized for this use. In particular, Greplin’s search automatically performs right-truncation (prefix) wildcard searches, so typing “lib” will match items with the words “library,” “liberty,” and “Libya,” but not “alibi” or “glib.” This is not how Google and other web-scale search engines generally work, but it is very fast and the matching words are highlighted, so it is very clear what’s going on.
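Prefix matching is cheap because a search engine’s term dictionary is sorted, so every term starting with “lib” sits in one contiguous run. The sketch below illustrates the idea with Python’s `bisect` module on a plain sorted list. It is a simplification, not Lucene’s implementation (Lucene offers `PrefixQuery` over its own term dictionary structures), and `prefix_matches` is a hypothetical name. Re-running it on each keystroke mirrors the search-as-you-type behavior described earlier.

```python
from bisect import bisect_left


def prefix_matches(sorted_terms, prefix):
    """Return every term that begins with `prefix` (right truncation, "lib*").

    Because the term list is sorted, all matches form one contiguous run:
    binary-search to its start, then scan only while terms still match.
    Terms are assumed to be lowercased at index time.
    """
    prefix = prefix.lower()
    start = bisect_left(sorted_terms, prefix)
    matches = []
    for term in sorted_terms[start:]:
        if not term.startswith(prefix):
            break  # past the end of the contiguous run
        matches.append(term)
    return matches


terms = sorted(["alibi", "glib", "liberty", "library", "libya", "lucene"])
print(prefix_matches(terms, "lib"))  # ['liberty', 'library', 'libya']
```

A left-truncated query such as “*lib” has no contiguous run to jump to, which is one reason it is usually far more expensive.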
The core Lucene search engine can scale up to hundreds of millions of documents; it is used at LinkedIn, Digg, Netflix, and Yelp. And because Greplin’s implementation is in the cloud, currently running on Amazon’s cloud services, the company can simply add more servers as it gains users, pointing each new group of users to a new server. Web and enterprise search engines can’t do this, because their relevance ranking depends on document frequency across the whole index, which is why they require complex distribution and sharding architectures. Greplin can get away with it because each person searches only his or her own accounts. Index updates should take anywhere from near-real-time to 20 minutes, though the company’s FAQ indicates that they may take up to a day. In my 3-day test, adding a new source was fast, but the index did not drop the mailing list spam that I had marked in Gmail.
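The difference comes down to where ranking statistics live. In the toy model below, each user is assigned to one server, a new server is opened once the current one is full, and inverse document frequency (IDF) is computed only over the searching user’s own documents, so no server ever needs statistics from another. The class name, the fixed `users_per_server` capacity, and the smoothed IDF formula are illustrative assumptions, not details of Greplin’s system.

```python
import math


class UserPartitionedIndex:
    """Toy model: each user's documents live on exactly one server."""

    def __init__(self, users_per_server):
        self.users_per_server = users_per_server
        # One dict per server, mapping user -> list of documents (token sets).
        self.servers = [{}]
        self.home = {}  # user -> index of the server holding that user's data

    def add_user(self, user):
        # When the newest server is full, "simply add more servers".
        if len(self.servers[-1]) >= self.users_per_server:
            self.servers.append({})
        self.home[user] = len(self.servers) - 1
        self.servers[-1][user] = []

    def add_document(self, user, text):
        server = self.servers[self.home[user]]
        server[user].append(set(text.lower().split()))

    def idf(self, user, term):
        # Document frequency is counted over this user's documents only,
        # so the score never depends on data stored on other servers.
        docs = self.servers[self.home[user]][user]
        df = sum(1 for doc in docs if term.lower() in doc)
        # Smoothed IDF, one common variant: log((1 + N) / (1 + df)) + 1
        return math.log((1 + len(docs)) / (1 + df)) + 1


index = UserPartitionedIndex(users_per_server=2)
for name in ["ana", "ben", "cy"]:
    index.add_user(name)
print(index.home)  # {'ana': 0, 'ben': 0, 'cy': 1}
```

In a web search engine, by contrast, a single query ranks documents from every shard, so document frequencies must be combined across all of them before scores are comparable.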
Shared and Institutional Use
One great use of Greplin would be for shared and institutional accounts, like those of small businesses, local libraries, and non-profits. Given the pace of institutional change, I suspect that large corporations also have trouble keeping track of their social networking. Greplin would be very helpful for checking their public statements and keeping them consistent.
Privacy Issues
For personal use, however, privacy and security are more of an issue. Using OAuth and Facebook Connect means that Greplin does not store user names and passwords for other sites. That is good: even if Greplin is hacked, those other accounts are not vulnerable, though all the indexed information may be exposed. Gross says, “We never plan on selling any personally identifying information. Our users’ privacy is of paramount importance to us.” This is also articulated in the company’s written policy. However, the policy doesn’t say anything about aggregate searches or trends, impersonal data that many companies do mine and market. It is a little reassuring to know that Greplin will delete all of a user’s information from the index on request, within 20 minutes.
Social Networking Fairy Tale
Greplin has a fun story: a 19-year-old founder who left Israel before his army service, a place at the Y Combinator business incubator, a failed previous project, and a spark of inspiration: “He was on his way to a party, and he didn’t remember where the address was stored. Was it a Facebook event, or in an email, or in his calendar? It was a pain to try searching all these things from his phone.” (Y Combinator founder Paul Graham, quoted in a TechCrunch article.) So he built it. Greplin has already open-sourced a new intervals feature for Lucene, and Gross says that the company will share more in the future, which gives it standing in the open-source world. In mid-February 2011, Greplin received $4 million in funding from the venture capital firm Sequoia and opened its service to the world. It garnered wide coverage in business publications and web sites, and is, in essence, a social networking fairy tale.
@skypen (Fabio Gratton) on Twitter says: “Searches Facebook better than Facebook, LinkedIn better than LinkedIn, Twitter better than Twitter. Wow.”
A Happy Ending is Not Assured
Despite everything, there is a huge hole in the business plan—the service is too easy to copy. None of the technologies is particularly new or revolutionary, and there are thousands of software developers who can (and likely will) put the same kinds of pieces together: OAuth, Facebook Connect, Lucene/Solr/other scalable search engine, and a simple interface. Others probably wouldn’t work quite as well, but they would have the freedom to change interfaces, create smartphone apps, adjust the relevance rules, and connect to other sources, including desktop search, enterprise search, and web search.
These future competitors might well include Microsoft, Apple, and especially Google, who would all love to be the portal to everyone’s personal cloud data. But there’s a big obstacle: Facebook would love to be that portal as well, and might not give a strong competitor access. It is going to be a wild ride for Greplin, I’m sure.
Greplin Versions
Free Version: Twitter, Facebook, Gmail, Google Docs & Calendar, Dropbox (including attached PDF files), and LinkedIn, with 200 MB of index storage (mailing list archives fill it up fast)
Premium: adds Evernote, Yammer, and Google Apps (Docs, Calendar, and Mail). $5 per month includes 500 MB of index storage; $15 per month includes 2 GB.
Planned enhancements: Dropbox attachments, Salesforce, Box.net, Basecamp, and Google Voice.
Possible enhancements: Pinboard, Instapaper, Tumblr, Wikipedia, to-do list sites, etc.