It may be too early to declare “big data” the buzzword (or buzz phrase, as the case may be) of 2012, but that has not stopped companies from trying to tackle the challenges the trendy phrase describes. IDC estimates the market for big data technology and services will grow at an annual rate of nearly 40% to reach $16.9 billion by 2015.
On April 25, 2012, when IBM announced an agreement to acquire Vivísimo, a provider of federated discovery and navigation software, the company said, “Today’s news accelerates IBM’s big data analytics initiatives with advanced federated capabilities allowing organizations to access, navigate, and analyze the full variety, velocity, and volume of structured and unstructured data without having to move it.” But not everyone is convinced the move leaves IBM any better positioned to handle massive amounts of unstructured information, or even that the idea of big data isn’t a boondoggle.
Lynda Moulton, senior analyst and consultant at Outsell’s Gilbane Services, says, “I share a view with a lot of other content specialists and analysts that ‘big data’ is simply a label assigned by marketing people recently (2-3 years) to emphasize the ability of their software application to handle really large quantities of mostly unstructured content. There is no official size assigned to the term.”
“Companies are doing anything and everything to ride the ‘big data’ buzzword,” says Stephen Arnold, managing partner of ArnoldIT.com. “Search lacks zip. Federation means nothing to most people. Deduplication is just not understood by anyone except an expert searcher or data quality specialist.”
In other words, no one is exactly sure what big data is. But why all this focus on a concept that some people find dubious? “My belief is that there is almost a contrarian incentive for manufacturers of hardware to want software designed to efficiently process ‘big data’ because it could easily compromise sales of storage and memory,” says Moulton. “There is undoubtedly significant strategic thinking that goes on in an organization like IBM on whether to emphasize data processing efficiency of software applications for big data to mitigate the need for extra hardware. Perhaps there is an intensifying shift in focus from hardware to software because other new search engine players are differentiating themselves as being able to handle bigger data on a smaller hardware platform.”
For its part, IBM reports, “The combination of IBM’s big data analytics capabilities with Vivísimo software will further IBM’s efforts to automate the flow of data into business analytics applications, helping clients better understand consumer behavior, manage customer churn and network performance, detect fraud in real-time, and perform data-intensive marketing campaigns.” It seems IBM is counting on Vivísimo’s federated search capabilities to help tame the problem of information overload.
Arnold says, “Vivísimo may provide a way to federate content from multiple sources, but I struggle with the ‘big data’ concept. Vivísimo technology has not been engineered to handle Google scale content or the Twitter message stream in real time in my opinion. Engineering work may be needed. Parts of Vivísimo set up require manipulating scripts which can be tricky.”
Of course, when you’re IBM, a little touch-up work on a search engine is hardly an insurmountable obstacle. Or as Arnold puts it, “IBM, given enough money, can make anything work.”
Moulton thinks Vivísimo has plenty to recommend it when it comes to taming enormous amounts of data. She points to its work indexing the output of the U.S. federal government. “Vivísimo stepped up to the challenge of indexing, and federating, the majority of government websites (40 million documents) a number of years ago in a matter of months, using their Clusty platform,” she says. “This project (USA.gov) was a challenge and replacement to a similar FAST effort that had been unconsummated over a period of years. There is no public record that I know of that documents why Microsoft’s Bing won the contract to replace Vivísimo after a couple of years but I assume that political clout was part of it. Vivísimo’s solution worked well, out-of-the-box, and the current USA.gov search engine is sub-standard in comparison.”
How did Vivísimo conquer the epic volumes of government data? Well, Moulton says the company invested plenty of time, effort, and money in research and development across a variety of areas: rapid indexing of very large domains; rapid deployment; auto-classification; text handling (normalizing terms to a standard vocabulary through tools that make it easy to define synonyms) and entity extraction; administration tools; and an intuitive interface that lets users customize and personalize their search preferences.
“Vivísimo’s weakness was sales and marketing, an IBM strength, and perhaps depth of support operations—another IBM strength. When Vivísimo got an infusion of venture capital to put into sales and marketing they abandoned their search messaging, de-emphasized search—which probably compromised sales in the pipeline—and looked like they weren’t really in the search engine business any longer,” says Moulton. “This probably confused buyers. I don’t know for sure, but they may not have built up a sophisticated support operation sufficient to handle a growing customer base soon enough. They could have done a lot more to play to their technological and design strengths by not overreaching too early on the pricing front. Having a larger customer base earlier is always desirable and their pricing was out of the reach of most SMBs.”
This confusing messaging may help explain why Vivísimo remained unclaimed while other search vendors were gobbled up. In an article on FierceContentManagement, Louis Tetu, CEO of Coveo, said, “The ongoing string of acquisitions in this space shows just how important the marketplace views solutions that generate insight from the vast streams of structured and unstructured data stored in enterprises, in social media, and more and more often, in the cloud.”
Moulton says Vivísimo didn’t get snapped up earlier because of a “loss of market presence in enterprise search—brand got diluted with confusing messaging—and not enough customers to make it attractive to acquire for the customer base.”
Now that Vivísimo is part of the IBM family of products, what advice do the experts have to make this a big data success story? “IBM would do well to standardize all their search technology on the Vivísimo technology and keep building it out,” says Moulton. “With the right packaging strategy IBM can deliver to many audiences… If the platform is the same across offerings, they then have an easy migration path for any growing enterprise.”