Big Data is the latest technology rage as companies try to avoid drowning in the oceans of information that inundate them daily. As more and more digital content flies around, it is increasingly difficult to store information in traditional databases. Databases make it easier for analysts and executives to get reports about activities and trends. But as less and less information lives in databases, organizations lose the ability to spot business patterns and trends or even to understand what is happening day to day. On Nov. 1, MarkLogic introduced MarkLogic 5, a major release of its popular software package that focuses on solving Big Data problems.
MarkLogic has embraced the Big Data rage with its latest version. The company’s business focuses on helping organizations access their content assets with an XML database system built to manage large numbers of documents. The company has done well selling to publishing companies and government agencies with lots of content assets. With MarkLogic 5, it is preaching that organizations get smarter answers faster by analyzing structured, unstructured, and semistructured data in the same application.
According to Ken Bado, president and CEO of MarkLogic, “For nearly a decade, MarkLogic has been helping its customers build cost effective Big Data applications that create competitive advantage. That means going beyond big and analytics to make information actionable so organizations can create real value for their business.”
One of the major features of MarkLogic 5 is the MarkLogic Connector for Hadoop. Hadoop is open source software, based on Google’s MapReduce, that helps solve Big Data problems. Google created MapReduce to analyze data spread across its thousands of servers. Traditional “big data” solutions were not designed with problems of Google’s scale in mind. If organizations think that they currently have data issues, imagine trying to process Google’s log files from servers spread around the entire world. Because traditional database approaches could not cope, Google invented MapReduce to process disparate log files in order to understand what users were doing.
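To make the idea concrete, here is a minimal single-machine sketch in Python of the map, shuffle, and reduce steps applied to log files from several servers. It is an illustration only, not Google’s or Hadoop’s actual code: the log format ("user path"), the sample data, and the function names are all assumptions made for the example. A real framework would run the map and reduce steps in parallel across many machines.

```python
from collections import defaultdict
from itertools import chain


def map_phase(log_line):
    """Map step: emit a (path, 1) pair for one log line.

    Assumes a hypothetical log format of "<user> <path>".
    """
    _user, path = log_line.split()
    yield (path, 1)


def shuffle(pairs):
    """Shuffle step: group every emitted value under its key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: collapse all values for one key into a total."""
    return key, sum(values)


def run_job(logs_per_server):
    """Run map, shuffle, and reduce over log lines from every server."""
    mapped = chain.from_iterable(
        map_phase(line) for lines in logs_per_server for line in lines
    )
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())


# Hypothetical log lines from two servers.
server_logs = [
    ["alice /search", "bob /mail"],
    ["carol /search", "alice /maps", "bob /search"],
]

counts = run_job(server_logs)
print(counts)
```

Because each log line is mapped independently and each key is reduced independently, the work can be split across as many servers as hold the data, which is what makes the approach suit logs scattered around the world.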
Google’s own MapReduce implementation is proprietary, but the concept behind it led to the development of Hadoop, an open source implementation of the MapReduce model that is used by, well, just about every large site but Google. Facebook, for one, has mind-bending Big Data problems created by its hundreds of millions of users who constantly update their pages. Hadoop is currently being used by top websites, Fortune 500 companies, and even the U.S. government. The USA.gov search engine uses Hadoop to power some features, including Type Ahead Search and Analytics. Hadoop also has widespread use inside the intelligence community.
With all of the momentum around Hadoop, it makes sense that MarkLogic would embrace the trend by making it easy to integrate information from Hadoop. MarkLogic’s Edd Patterson states, “This new release definitely adds a whole new dimension to unstructured/semi-structured data. Now if you want to run a real time job in MarkLogic … you can use the Hadoop Connector to take the data from your existing environment (Cloud, RDBMS, Filesystem, etc), and put it seamlessly in MarkLogic 5.0. …”
While MarkLogic has found great success in helping content companies leverage large amounts of content, it will be interesting to see whether it can hop on the Big Data train or will get run over by the open source approach of Hadoop. How high-quality enterprise software products such as MarkLogic adapt to Big Data and the ever-growing suite of free tools will determine the long-term viability of niche enterprise software vendors.