Big Data is hot, hot, hot! Google, IBM, Oracle, Netflix, Amazon, Facebook, and Microsoft are just a few of the companies that see Big Data not only as a gold mine of information but as a competitive necessity: a way to create new products; to analyze, understand, and better serve the needs of customers; to spot trends; to find patterns; and to gain insights previously too expensive to pursue.
Big Data is transforming competitive opportunities in just about every industry sector, including banking, insurance, manufacturing, retail, wholesale, healthcare, communications, transportation, construction, utilities, and education. Within retail alone, it plays important roles in marketing, merchandising, operations, and the supply chain, and it is enabling new business models. It is becoming quite clear that companies that fail to use their data effectively are at a significant competitive disadvantage compared with those that can analyze and act on their data. Savvy companies are jumping onto the Big Data bandwagon and hiring “data scientists” in droves to take advantage of the zettabytes of data and the billions of dollars of revenue generated by Big Data.
Not to be left out of the new opportunities in Big Data, the Obama administration, on March 29, 2012, announced its own Big Data Research and Development Initiative, a commitment by numerous federal departments and agencies seeking to improve “our ability to extract knowledge and insights from large and complex collections of digital data.” John P. Holdren, assistant to the president and director of the White House Office of Science and Technology Policy (OSTP), stated, “In the same way that past Federal investments in information-technology R&D led to dramatic advances in supercomputing and the creation of the Internet, the initiative we are launching today promises to transform our ability to use Big Data for scientific discovery, environmental and biomedical research, education, and national security.”
At the event, Holdren pronounced that “big data is indeed a big deal.” But what is Big Data? In a report published by the McKinsey Global Institute in May 2011 titled “Big Data: The Next Frontier for Innovation, Competition, and Productivity,” McKinsey defined Big Data as “datasets whose size is beyond the ability of typical database software tools to capture, store, manage, and analyze.” Looking at it another way, Edd Dumbill of O’Reilly Radar defines Big Data as “data that exceeds the processing capacity of conventional database systems. The data is too big, moves too fast, or doesn’t fit the strictures of your database architectures. To gain value from this data, you must choose an alternative way to process it.” We are not just talking about gigabytes of data but terabytes and petabytes—the amount of data generated on a single day is so large as to be almost incomprehensible.
McKinsey Global Institute, in the same report, “estimates that enterprises globally stored more than 7 exabytes of new data on disk drives in 2010, while consumers stored more than 6 exabytes of new data on devices such as PCs and notebooks. One exabyte of data is the equivalent of more than 4,000 times the information stored in the US Library of Congress.” In fact, according to the report, “Digital data is now everywhere—in every sector, in every economy, in every organization and user of digital technology,” and by capturing, organizing, managing, and analyzing the data, McKinsey predicts that “we are on the cusp of a tremendous wave of innovation, productivity, and growth, as well as new modes of competition and value capture—all driven by big data as consumers, companies, and economic sectors exploit its potential.” In healthcare alone, McKinsey estimates that if the U.S. “could use big data creatively and effectively to drive efficiency and quality,” then “the potential value from data in the sector could be more than $300 billion in value every year, two-thirds of which would be in the form of reducing national health care expenditures by about 8 percent.”
In a President’s Council of Advisors on Science and Technology (PCAST) report released in December 2010, the advisory group outlined three reasons why the government needs to make Big Data a priority: its growing importance to the national and global economy, its role in accelerating the pace of discovery of new knowledge in science and engineering, and its potential to help find solutions to the large challenges facing the country, especially in the areas of national security, healthcare, the environment, and education. The Obama administration clearly recognizes that the private sector (industries, research universities, and nonprofits) will take the lead in tapping the potential of Big Data to create new IT products and services, to boost productivity, and to gain new insights, knowledge, and understanding in the domains of science and engineering. In fact, Holdren’s prepared remarks at the Big Data Initiative announcement challenged industry, universities, and nonprofits to join the government in working on Big Data projects. As Holdren remarked, the federal government cannot do this alone—we need an “all hands on deck” approach. A collaboration between government, industry, research universities, and nonprofits will be better equipped to take advantage of the huge potential that Big Data has to offer in “moving from data to knowledge to action.”
Tom Kalil, deputy director for policy in the Office of Science and Technology Policy, on the OSTP blog, acknowledged that many companies are sponsoring Big Data competitions (see Kaggle.com for a list of active competitions) and that universities such as Stanford are offering free online courses to prepare the next generation of “data scientists.” McKinsey estimates that “the United States alone faces a shortage of 140,000 to 190,000 people with deep analytical skills as well as 1.5 million managers and analysts to analyze big data and make decisions based on their findings. The shortage of talent is just beginning.” Nonprofits are pitching in as well: Data Without Borders, for example, is “helping non-profits by providing pro bono data collection, analysis, and visualization” services to social organizations and local governments facing Big Data challenges of their own.
The Obama administration believes that the federal government can smooth the private sector’s path to Big Data by advancing “state-of-the-art core technologies needed to collect, store, preserve, manage, analyze, and share huge quantities of data”; harnessing “these technologies to accelerate the pace of discovery in science and engineering, strengthen our national security, and transform teaching and learning”; and expanding “the workforce needed to develop and use Big Data technologies.” The Big Data Initiative is a direct response to the recommendations of the PCAST report, which “concluded that the Federal Government is under-investing in technologies related to Big Data.”
The administration has released a Big Data fact sheet that showcases more than 80 projects offered by federal agencies and departments including the National Science Foundation, National Institutes of Health, Department of Defense, Defense Advanced Research Projects Agency, Department of Energy, and the US Geological Survey. Together, the projects “promise to greatly improve the tools and techniques needed to access, organize, and glean discoveries from huge volumes of digital data.”
Big Data is indeed a big deal.