While the business world may not be ready to throw out its existing Oracle and IBM database installations, a startup called Cloudera (www.cloudera.com) is preparing for that day. In mid-March, Cloudera, "the commercial Hadoop company," announced a $5 million round of financing led by Accel Partners (www.accel.com). Cloudera garnered early press attention last year because of the pedigree of its founders, who include engineers from Facebook, Google, and Yahoo!. All of them had experience creating and using the open source Hadoop framework (http://hadoop.apache.org) to manage and mine the data behind these massive internet services.
With the press release, Cloudera announced that it is ready to help companies use the same tools that Google, Yahoo!, and Facebook use to store and analyze data. Cloudera now provides commercial support and professional training for Hadoop. According to Mike Olson, CEO of Cloudera, "We believe that Hadoop is a disruptive new technology for mining valuable business information in the enormous streams of new data generated in enterprises today. Processing this kind of big data has been too expensive or too technically difficult for all but the most sophisticated IT organizations until now. Our mission is to use Hadoop to make big data processing capabilities accessible and affordable for all companies." This is very good news for organizations looking for high-end data warehousing tools at a much lower cost.
What Is Hadoop?
Hadoop, named after a child's toy elephant, is open source software based on technologies created by Google. Google processes an enormous amount of data, and rather than rely on traditional database approaches, it built its own. Its invention for handling all of this information is called MapReduce. MapReduce lets Google process the data it collects by distributing both the data and the computation across its infrastructure, which is built from COTS (commercial off-the-shelf) components.
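To make the idea a little more concrete, here is a toy, single-process Python sketch of the MapReduce pattern applied to word counting, the classic illustration. It is only an illustration under stated assumptions: the function names (map_phase, shuffle_phase, reduce_phase, word_count) are hypothetical and are not Hadoop's API (Hadoop's native API is in Java), and everything runs in one process here, whereas a real cluster would run the map and reduce steps on many machines in parallel.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map step (hypothetical helper): emit a (word, 1) pair for each word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle step: group every emitted value by its key, mimicking how the
    framework routes all values for one key to the same reducer."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(splits: List[str]) -> Dict[str, int]:
    """Run map over every split (on separate machines in a real cluster),
    then shuffle and reduce. Here it all happens in one process."""
    mapped = (pair for split in splits for pair in map_phase(split))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["the quick brown fox", "the lazy dog", "The fox"]
    print(word_count(splits))
    # prints {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}
```

The point of the split into map and reduce is that each map call only sees its own chunk of data and each reduce call only sees one key, so a framework like Hadoop can spread both steps across many inexpensive machines.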
Google published some of the basic concepts in a paper a few years ago, and shortly afterward the Hadoop movement began. Yahoo! invested an enormous amount of money in Hadoop and has kept the technology open source. (For a list of applications and organizations using Hadoop, see http://wiki.apache.org/hadoop/PoweredBy.)
Many people credit MapReduce as the innovation that allowed Google to achieve its wild success. Whether Google intended from the beginning for its ideas to become an open source project is up for debate, but the company remains supportive of the concept. As search expert Stephen Arnold notes, "Companies built on technology Google contributes to open source can generate a solid revenue stream, but the Hadoop technology is no longer Google's technology, and Google has not been sitting still."
Analyze Your Information the Google/Yahoo!/Facebook Way
Cloudera wants to bring Hadoop into the enterprise world. Hadoop is a complicated solution to a complicated problem. However, given that the amount of data created inside organizations doubles each year, large businesses are no longer the only ones facing huge data warehousing challenges.
If it succeeds, Cloudera could cause massive disruption in an industry where Fortune 500 companies buy large data warehousing and business intelligence systems from household names such as IBM and Oracle to analyze their information. These proprietary systems require companies to have their own hardware, software, consultants, and money tree. While Hadoop can run within an existing infrastructure, Cloudera can also help customers run it on Amazon EC2 (Elastic Compute Cloud) to avoid the cost of buying and maintaining their own hardware (http://wiki.apache.org/hadoop/AmazonEC2).
Cloudera Competitors
While Cloudera has an impressive pedigree, it does have competition. Companies such as Aster Data (www.asterdata.com/index.php) and Infobright (www.infobright.com/index.php) offer low-cost, open source data warehousing technologies. Business.com recently announced the availability of CloudBase (http://sourceforge.net/projects/cloudbase), which targets smaller businesses to help them analyze log data without the need to procure large relational database management systems (RDBMS).
What on Earth Does This Mean?
Looking back over this page, I see as many words flagged as "misspelled" as words spelled correctly. Put simply, business managers should know that the technology used by three of the largest internet companies (Facebook, Google, and Yahoo!) is now available to analyze their own business information.
Even if you know little about this space, rest assured that the traditional alternatives are very expensive and are primarily targeted at the Fortune 500. If you are a business manager rather than an IT expert, the key point is that with companies such as Cloudera, it is now possible to have a high-end data warehouse at a much lower cost than traditional enterprise approaches allow.
The large internet companies went in a different direction to solve their data problems, and they have open sourced their approaches. If these approaches work for the organizations that handle the most data in the world, they just may work for your organization.