Deloitte provided funding as well as real-world input on how researchers and the business sector use government datasets. The company notes that the resulting Data USA database and website “transforms data-driven insights into easy-to-interpret visualizations that can answer your questions in seconds … [and] fills in the gaps by illustrating patterns, reading signals, and identifying trends in public data. … Data USA users can browse the data using filters (locations, industries, occupations, education) or target their view using search tools. … Data USA also delivers narratives on topics of interest and issues that matter to government and business leaders. … The code is open source, and the platform is scalable, allowing for new data to be added.”
Seeing Information in Data
Data USA uses sophisticated algorithms to help users find information relevant to their queries, based on assumptions about both the user and the data programs in the system. Kris Hammond, a computer science professor at Northwestern University, refers to the initiative as being “driven by the idea that we can actually figure out what a user is going to want to know when they are looking at a data set.”
Anyone can glean key information quickly. “Data USA adopts the philosophy that one need not be a data scientist or a programmer to access valuable and versatile public information. It provides access for those unfamiliar with data manipulation, while maintaining breadth and depth for the seasoned professional,” Datawheel notes.
Background images are all licensed under Creative Commons and make the data pages more inviting to the reader. After a first-level search, users are prompted with options to refine it, and they see pertinent data at each step. When they hover over any Data USA chart or graph, the graphic updates to show more detail about the underlying data. Once users reach the level of specificity they need, they can embed or download the charts and download the data for further manipulation or comparison. The download formats are PDF, SVG, and CSV. Graphics can be embedded in existing documents, and the data URL can be shared via Twitter or Facebook. The data is also available through an API, and the project is open source. As one early reviewer notes, “It feels like a statistical atlas of the United States, with modern functionality.”
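For readers who take the CSV route, the kind of follow-up comparison described above might look something like the minimal Python sketch below. It uses only the standard library. The column names (place, year, value) and every number in the sample text are invented placeholders, not actual Data USA output, so a real download would need the column names adjusted to match.

```python
import csv
import io

# Placeholder text standing in for a downloaded CSV file. The column names
# and every value are invented for illustration; they are not Data USA data.
SAMPLE_CSV = """place,year,value
Place A,2014,100
Place B,2014,250
Place A,2013,90
Place B,2013,200
"""


def load_rows(text):
    """Parse CSV text into a list of dicts, converting the numeric columns."""
    rows = []
    for record in csv.DictReader(io.StringIO(text)):
        rows.append({
            "place": record["place"],
            "year": int(record["year"]),
            "value": float(record["value"]),
        })
    return rows


def compare_places(rows, year):
    """Return a {place: value} mapping for one year, for side-by-side comparison."""
    return {row["place"]: row["value"] for row in rows if row["year"] == year}


rows = load_rows(SAMPLE_CSV)
by_place_2014 = compare_places(rows, 2014)
```

To work with a file saved from the site instead of the inline sample, the same parsing function could be given the file's contents, for example the result of reading it with open(path, newline="").read().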
MIT’s Macro Connections Group
Cesar Hidalgo, director of MIT Media Lab’s Macro Connections group and a founder of Datawheel, has been one of the key movers in the process of creating Data USA. Hidalgo’s passion is turning data into stories. Fast Company explains, “Hidalgo knows there are enough stories buried in the U.S. census data alone to keep him busy for the rest of his life.” Hidalgo’s goal was to create a self-sustaining platform that allows other developers to build on it and integrate it with their own datasets.
“The US government offers almost 200,000 data sets for public use, often out of reach for the average citizen,” Hidalgo says. “Data USA transforms these datasets into stories, pioneering a new breed of user-friendly government data sites that we urgently need.” Stories are far more interesting, compelling, and meaningful than raw data.
“But making U.S. census data beautiful was only part of what Hidalgo set out to do with Data USA,” notes Fast Company. “The other part was pulling government data out of the deep web—that dimly lit basement of the Internet that isn’t searchable by Google’s web crawlers.” Doing so allows Google and other search engines to index the data—an opportunity usually lost to traditional web search engines.
From Data to Analysis to Stories
Data USA is an important link between the data and information being released by the government, nongovernmental organizations, and other sources, on one side, and users who don’t necessarily have the experience, background, or ability to manipulate the many data sources available, on the other. Because it is an open source environment that offers graphics as well as datasets, anyone can easily find in-depth analyses of issues that are hyperlocal, local, national, or global. With the ability to save and download data (either raw or as refined through exploring the website), users can further refine or reuse the results or add other information to them.
Data USA’s website is a state-of-the-art, comprehensive toolkit and presentation development system; however, if you need to go beyond U.S. government data, you still have an uphill struggle. Datasets are still difficult to access, and Googling is of little help. If you know what you are looking for or the source of the data, you may be able to discover it. However, in most cases, accessing that data still involves working with huge ZIP files.
Bernard Marr writes in Forbes that “there are hundreds (if not thousands) of free data sets available, ready to be used and analyzed by anyone willing to look for them.” Although he lists what he believes are the 33 best available sources today, his list is far from comprehensive.
Journalists and researchers across the globe have been grappling with the growing number of open datasets and with how to better manipulate, merge, represent, and analyze them. New tools are available, and, with Data USA, sophisticated analytical and presentation systems are becoming accessible to anyone via the web. This is a milestone that should lead to greater insight and analysis here in the U.S. and, hopefully, in the future, throughout the entire world of data as well.