Business Intelligence in the Cloudera
by Erik Arnold
Posted On March 26, 2009
While the business world may not be ready to throw out its existing Oracle and IBM database installations, a startup called Cloudera (www.cloudera.com) is preparing for that day to come. In mid-March, Cloudera, "the commercial Hadoop company," announced a $5 million round of financing led by Accel Partners (www.accel.com). Cloudera garnered early press attention last year based on the pedigree of its founders, who include engineers from Facebook, Google, and Yahoo!. All of them had experience creating and using the open source Hadoop software (http://hadoop.apache.org) to manage and mine the data behind these massive internet services.

With the press release, Cloudera announced that it is ready to help companies use the same tools that Google, Yahoo!, and Facebook use to store and analyze data. Cloudera now provides commercial support and professional training for Hadoop. According to Mike Olson, CEO of Cloudera, "We believe that Hadoop is a disruptive new technology for mining valuable business information in the enormous streams of new data generated in enterprises today. Processing this kind of big data has been too expensive or too technically difficult for all but the most sophisticated IT organizations until now. Our mission is to use Hadoop to make big data processing capabilities accessible and affordable for all companies." This is very good news for organizations looking for higher-end data warehousing tools at a much lower cost.

What Is Hadoop?

Hadoop, named after a child's toy elephant, is open source software based on technologies created by Google. Google obviously processes a large amount of data, and it rejected traditional approaches. Its invention for handling that data is called MapReduce. MapReduce enables Google to process the massive amounts of information it collects by distributing the files, and the work of processing them, across its infrastructure, which is composed of COTS (commercial off-the-shelf) components.
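
For readers who want to see the idea rather than the jargon, here is a minimal, single-machine sketch of the MapReduce pattern in Python. It is an illustration only: it is not Hadoop's API or Google's implementation, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for this example. A real framework would run the map and reduce steps in parallel on many machines; this toy version runs them one after another to show the shape of the computation.

```python
from collections import defaultdict


def map_phase(document):
    """Map step: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle(pairs):
    """Group all emitted values by key, as a framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents):
    """Run map, shuffle, and reduce in sequence over a list of documents."""
    pairs = []
    for document in documents:
        # In a distributed system, each document could be mapped on a different machine.
        pairs.extend(map_phase(document))
    grouped = shuffle(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Hypothetical sample data standing in for a much larger set of log files.
    logs = ["error disk full", "warning disk slow", "error network down"]
    print(word_count(logs))
```

Because each document is mapped independently and each key is reduced independently, the same logic can in principle be spread across many inexpensive machines, which is the property the article attributes to Google's approach.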

Google released some of the basic concepts in a paper a few years ago, and, shortly afterward, the Hadoop movement began. Yahoo! invested an enormous amount of money in Hadoop and has kept the technology as an open source project. (For a list of applications and organizations using Hadoop, see http://wiki.apache.org/hadoop/PoweredBy.)

Many people credit MapReduce as the innovation that allowed Google to achieve its wild success. Whether or not it intended for Hadoop to become an open source project at the beginning is up for debate, but the company is still supportive of the concept. As search expert Stephen Arnold notes, "Companies built on technology Google contributes to open source can generate a solid revenue stream, but the Hadoop technology is no longer Google's technology, and Google has not been sitting still."

Analyze Your Information the Google/Yahoo!/Facebook Way

Cloudera wants to bring Hadoop into the enterprise world. Hadoop is a complicated solution to a complicated problem. However, given that the amount of data created inside organizations doubles each year, large businesses are not the only ones facing huge data warehousing challenges.

If successful, Cloudera will cause massive disruption in the industry, because Fortune 500 companies currently purchase large data warehousing and business intelligence systems from household names such as IBM and Oracle to analyze their information. These proprietary systems require companies to have their own hardware, software, consultants, and money tree. While Hadoop can run within an existing infrastructure, Cloudera can also help customers store their data in the Amazon EC2 (Elastic Compute Cloud) service to reduce storage costs (http://wiki.apache.org/hadoop/AmazonEC2).

Cloudera Competitors

While Cloudera has an impressive pedigree, it does have competition. Companies such as Aster Data (www.asterdata.com/index.php) and Infobright (www.infobright.com/index.php) offer low-cost, open source data warehousing technologies. Business.com recently announced the availability of CloudBase (http://sourceforge.net/projects/cloudbase), which targets smaller businesses to help them analyze log data without the need to procure large relational database management systems (RDBMS).

What on Earth Does This Mean?

As I look up on the page, I see the number of words marked as "misspelled" equaling the number of words spelled correctly. Put simply, it is important for business managers to know that the technology used by three of the largest internet companies (Facebook, Google, and Yahoo!) is now available for them to analyze their business information.

Even if you have little knowledge of the space, rest assured that alternative solutions are very expensive and are primarily targeted toward the Fortune 500. If you are a business manager rather than an IT expert, it is important to understand that with companies such as Cloudera, it is now possible to have a high-end data warehouse at a much lower cost than traditional enterprise approaches.

The large internet companies went in a different direction to solve data problems, and they have open sourced their approaches. If these approaches work for the organizations that handle the most data in the world, then they just may work for your organization.


Erik Arnold is a consultant for Adhere Solutions, Inc., a company that specializes in helping organizations leverage new technologies to maximize efficiency and revenue.

Comments
Posted By Steve Wooledge 3/28/2009 2:24:12 AM

Hi Erik,

Nice work assessing the impact of the MapReduce frameworks such as Hadoop on companies of tomorrow, and thanks for including Aster Data in that.

Correct, the Aster nCluster analytic database is not open source, but provides an enterprise-class In-Database MapReduce framework which is tightly integrated with SQL out of the box - all as a single RDBMS. This allows any company familiar with traditional RDBMSs or BI tools to have the processing power of MapReduce within a familiar SQL database/interface for its data warehouse and advanced analytics without needing to learn how to manage a distributed file system.

There is a webcast here describing the areas where traditional businesses can get value from MapReduce (http://www.youtube.com/watch?v=2zuVT3kzoxA).

We've also described the differences between Hadoop and In-Database MapReduce (http://www.asterdata.com/blog/index.php/2008/09/06/differences-between-aster-and-hadoop/) for more education.

Thanks
Steve Wooledge
Director, Product Marketing
Aster Data Systems
Posted By Mark Windrim 3/27/2009 8:55:00 AM

Hi Erik,

Thanks for mentioning Infobright! As your post is mainly geared towards the cloud, I thought I'd point you to a document that we recently released that describes how to install our open source software (ICE) within the Amazon Cloud (see here.) Users of our commercial platform, IEE, will also find these instructions helpful.

Best,

Mark Windrim
VP Community Relations @ Infobright
Posted By Jeff Hammerbacher 3/26/2009 7:02:25 PM

Hey Erik,

Thanks for the kind words. A few notes:

* Aster Data's solution is not open source
* CloudBase is built upon Hadoop, so it's complementary to Cloudera's services

Also, Cloudera's Distribution for Hadoop includes a data warehousing framework called Hive (http://hadoop.apache.org/hive) that offers a SQL interface to Hadoop.

Regards,
Jeff

Note: Hammerbacher is VP Product and Chief Scientist, Cloudera
