Government Data an Enigma No More (Beta)
Barbie E. Keiser
Posted On March 11, 2013
Enigma Technologies, Inc. has built a new search and discovery platform (enigma.io) designed to allow subscribers to delve deeply into public data from municipal, state, federal, and even international organizations. This data is scattered across many locations, on the web and off, in fielded databases and semistructured formats. (Apparently, .io, the ISO 3166 country code for the British Indian Ocean Territory, is a popular domain among Big Data startups.) Enigma.io is currently in beta; its official launch is anticipated in early spring.
According to co-founder Marc DaCosta, Enigma.io seeks to collect and normalize data from every level of government and every locale, and to make it searchable from a single web interface. “The goal is to cut what might be a 40-hour research project to under one hour.”
Enigma.io is targeting data verticals, across the nation today and around the world tomorrow, such as:
- Secretary of State business registrations
- UCC lien filings
- Real estate ownership (top 50 metro areas)
- “Obvious ‘go to’ federal agencies”
- “2nd and 3rd tier regulatory authorities” (e.g., the Federal Energy Regulatory Commission, or FERC)
Today, beta testers, including students and faculty at the Harvard Business School, can access import bills of lading, aircraft ownership, lobbying activity, real estate assessments, spectrum licenses, financial filings, liens, and government spending and contracts. Users have noticed that the filters are not as intuitive as they might be, though this could be due to the data sets being queried at the time. Others have mentioned that the platform displays best in specific browsers, but the system does not detect which browser is in use or prompt the user to switch. These are minor adjustments that are sure to be addressed during the beta phase.
Inquiring Minds Want to Know
The impetus for the startup was the intense press speculation during the 2008 presidential election about Sen. John McCain’s choice for vice president. Who “guessed” correctly? The journalist who matched the name of a potential candidate with companies that owned airplanes and noticed an unusual number of flight plans filed for Wasilla-bound flights.
In 2010, Enigma.io co-founders DaCosta and Hicham Oudghiri set out to build a platform that would allow subscribers to query across data sets from a single interface, filtering the data as they go. What if more data were searchable to answer such questions as which New York City departments are purchasing office supplies from Staples and which from OfficeMax (and how much)?
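To make that kind of question concrete, here is a minimal Python sketch of filtering purchase records by vendor and totaling them by department. The records, field names, and the `spend_by_department` function are hypothetical illustrations; they are not Enigma.io’s data or API.

```python
from collections import defaultdict

# Hypothetical purchase records; the departments, fields, and amounts are
# invented for illustration and do not come from any real data set.
PURCHASES = [
    {"department": "Parks", "vendor": "Staples", "amount": 1200.00},
    {"department": "Parks", "vendor": "OfficeMax", "amount": 300.00},
    {"department": "Sanitation", "vendor": "Staples", "amount": 450.50},
    {"department": "Parks", "vendor": "Staples", "amount": 99.50},
    {"department": "Sanitation", "vendor": "Other Supplier", "amount": 75.00},
]


def spend_by_department(records, vendors):
    """Total spending per (department, vendor), keeping only the named vendors."""
    totals = defaultdict(float)
    for row in records:
        if row["vendor"] in vendors:
            totals[(row["department"], row["vendor"])] += row["amount"]
    return dict(totals)


if __name__ == "__main__":
    result = spend_by_department(PURCHASES, {"Staples", "OfficeMax"})
    for (department, vendor), total in sorted(result.items()):
        print(f"{department:<12} {vendor:<10} {total:>10.2f}")
```

The point of the sketch is only that, once records from different sources share the same fields, a question about who buys what from whom reduces to a filter and a grouped sum.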
The Enigma.io team (seven engineers and five staff members focused on data sourcing and editorial work) discovers the data it incorporates into the product by crawling .gov websites to identify interesting data sets, downloading or scraping data from government websites, and executing licensing arrangements with entities that deliver data directly. The public data is filtered using sophisticated natural language processing (NLP) technologies, a relational engine, and a proprietary data ontology, allowing the user to view the data in the context of a particular research effort.
Enigma.io founders have not gone after data sets in a strategic manner. Initial data gathering efforts were driven by the interests of the firm’s early partners (i.e., the venture capitalists who supplied $1.1 million in the first round of funding in April 2012). The investors are Brent Hurley (YouTube), TriplePoint Capital, Crosslink Capital, Strauss Zelnick (Take-Two), and Matthew Glass (Colbeck Capital).
The plan appears to be to let subscribers suggest the direction in which the company proceeds (i.e., the data sets users would like to see added to the system). When subscribers require more current data, Enigma.io will harvest the data set. DaCosta describes Enigma.io’s initial targets as news media, academia, and “the largest of organizations” (e.g., government contractors).
How Does It Work?
To date, Enigma.io has taken more than 100,000 public data sets from government agencies and restructured the data to make it searchable. The goal is to be exhaustive, collecting as much data from government entities as possible, based on utility to subscribers.
Searching for comparable data sets across government agencies and localities is time-consuming. First, researchers have to know where to turn for the data needed. Care must be taken when pulling data from multiple sources and sites to be sure that comparisons are valid. Enigma.io makes sense for relatively novice researchers, who don’t know where to turn for the data they need, though this is hardly the typical Big Data researcher.
For those unfamiliar with Excel, the system’s bulk download capabilities make it easier to download an entire data set than would otherwise be possible. Subscribers can export any data set as Excel, CSV, or JSON. Data is presented in an elegant, easy-to-interpret dashboard format, though the addition of visualization tools could make the platform even more useful.
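For readers curious what the CSV and JSON export formats look like in practice, the following is a small Python sketch that serializes the same rows both ways using only the standard library. The sample rows and the `to_csv` and `to_json` helpers are made up for illustration and say nothing about how Enigma.io produces its exports.

```python
import csv
import io
import json

# Hypothetical rows standing in for an exported data set.
ROWS = [
    {"tail_number": "N00001", "owner": "Example Air LLC"},
    {"tail_number": "N00002", "owner": "Sample Holdings Inc."},
]


def to_csv(rows):
    """Serialize a list of dicts as CSV text with a header row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=list(rows[0].keys()), lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def to_json(rows):
    """Serialize a list of dicts as a JSON array."""
    return json.dumps(rows, indent=2)


if __name__ == "__main__":
    print(to_csv(ROWS))
    print(to_json(ROWS))
```

CSV opens directly in a spreadsheet, while JSON preserves the field names on every record, which tends to suit programmatic use.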
This subscription-based service sees LexisNexis, Bloomberg, and Reuters as its major competition. While the specific pricing levels are not yet set, the model is clear. Different prices are to be negotiated within three distinct categories:
- For individual researchers and analysts (single user)
- For teams and institutions (enterprise)
- For internal and external applications (API)
Data for the Social Good
Enigma.io’s approach flies in the face of 21st-century open data portals, where government entities share raw data with the public, supported by tools developed by organizations such as Sunlight Foundation and Socrata. These entities aim to ensure that quality data is always freely available to the public in up-to-date, usable formats. (Our Feb. 28, 2013 NewsBreak describes Sunlight Foundation’s latest tools, and Socrata released its Open Data Field Guide last month. For additional mashups and visualizations of open government data, go to https://opendata.socrata.com.)
Whether Enigma.io flourishes will depend on the company’s ability to convince a sufficient number of organizations that the data they need is available from Enigma.io, can be filtered by any user with limited IT proficiency, and can be used for competitive advantage.