The White House announced a major information policy last week, issuing an Executive Order and an Office of Management and Budget (OMB) Memorandum that establish "open and machine readable" as the new default for federal government information. While open government advocates continue to analyze the documents' wording, the original materials released on May 9, 2013, are available on the White House website:
- Making Open and Machine Readable the New Default for Government Information, Executive Order 13642
- Open Data Policy—Managing Information as an Asset, OMB Memorandum M-13-13, issued jointly by OMB and the Office of Science and Technology Policy (OSTP)
- Landmark Steps to Liberate Open Data, blog post and video from U.S. CTO Todd Park and U.S. CIO Steve VanRoekel
- Obama Administration Releases Historic Open Data Rules to Enhance Government Efficiency and Fuel Economic Growth, White House press release
The Executive Order focuses on the potential economic benefits of open data, citing businesses that have developed around the government’s open distribution of weather and GPS data. President Barack Obama stressed the job-creating and innovation-fostering aspects of the policy by announcing it while visiting an Austin startup incubator to learn about Stormpulse, a company that relies on government data in its asset management and risk analysis services. For those seeking useful quotes to bolster their open access project justifications, the Executive Order provides this gem: “Making information resources easy to find, accessible, and usable can fuel entrepreneurship, innovation, and scientific discovery that improves Americans’ lives.”
Open government advocates are cautiously optimistic about the policy. Canadian open data activist David Eaves blogged that the announcement “is an important step, and one that other governments should be looking at closely. It is an effort to reposition government to better participate in and be relevant in a data driven and networked world, and it does foster a level of access around a class of information, data, that is too often kept hidden from citizens.”
Will all U.S. federal government information now be published in open, standard formats that are easy to find, access, and reuse? No. The order applies only to the executive branch and does not affect congressional or judicial branch information. It provides protections (or loopholes, depending on your perspective) for data related to law enforcement, national security, personal and privileged information, and information protected from disclosure by current law. Observers also point out that the policy does not emphasize transparency of government operations, something we have come to associate with the phrase "open data." The Cato Institute's Jim Harper blogged on this point: "Government transparency is not produced by making interesting data sets available. It's produced by publishing data about the government's deliberations, management, and results."
The OMB Memorandum outlines the requirements for federal agencies, including independent agencies, to comply with the Executive Order. The administration's earlier instructions, the 2009 Open Government Directive, met with resistance from some agency leaders, who saw them as a distraction and a burden on limited resources. Supporters of open government data will be watching for signs of similar agency foot-dragging this time around.
OMB's overarching requirement is for agencies to "collect or create information in a way that supports downstream information processing and dissemination activities." Specifically, "this includes using machine-readable and open formats, data standards, and common core and extensible metadata for all new information creation and collection efforts." (In addition to new information collections, the requirements apply to "major modernization projects that update or re-design existing information systems.") The common core metadata standards mentioned in the policy were developed by a Data.gov working group that drew on existing public schemas, including Dublin Core.
In any discussion in which the word "open" modifies a word such as "data," "access," or "information," the basic terms must be defined. The OMB policy provides its own definitions, drawing heavily on the language of other administration documents. For the purposes of the policy, OMB equates data with "structured information"; this may include text and images if the content can be "converted to a structured format and treated as data."
“Open” data, according to the policy, is:
- Public: There are exceptions for “valid restrictions” such as security.
- Accessible: Formats are suitable for “the widest range of users for the widest range of purposes.”
- Described: This includes metadata and dataset documentation.
- Reusable: The OMB policy uses the wording "available under an open license." (Open data experts have noted that this wording implies the data would otherwise be subject to licensing restrictions; a statement that the data is in the U.S. public domain from the start would be preferable.)
- Complete: Data should be provided at the same granular level as it is collected. (Agencies should also provide any structured data products derived from the complete, primary dataset.)
- Timely: Data should be made available "as quickly as necessary to preserve the value of the data."
- Managed Post-Release: This means that a point of contact is provided. (Leslie Johnston, acting director of the National Digital Information Infrastructure and Preservation Program at the Library of Congress, recently blogged that guidance on dataset preservation should also be considered.)
The new policy recognizes that government data is not necessarily public if the public cannot find it. OMB requires that public data collections be listed at the agency website. Opening up even further, OMB says the list should include “datasets that can be made publicly available but have not yet been released.” In his initial blog post on the new policy, Sunlight Foundation policy director John Wonderlich singles out this specific requirement for praise and adds: “To be sure, getting agencies to publicly list all their data that can be open will be a significant challenge, even with a high-profile Executive Order. Concerns like cost, privacy, and security will be used to justify non-disclosure (as they often are), and will be used to try to justify keeping even a description of many datasets private. That’s a good struggle to have, though, and one we’re looking forward to.”
Agency Guidance: A Revolutionary Approach
One of the most interesting products of the new policy is Project Open Data. With this innovation, OMB and OSTP have recognized that a small, select group will not always have the best answers and that technology recommendations must evolve along with the ever-changing tech world. The project is hosted on GitHub, a popular service for sharing open source software code that can also be used for collaborative work on text documents. Project Open Data presents software tools, resources, best practices, and case studies to help agencies implement the new requirements. Anyone with a free GitHub account can suggest changes to the Project Open Data documents. Agency developers and members of the public are already submitting revisions and copy corrections; see the project's FAQ for details.
What about Data.gov?
The new policy retains Data.gov as the central listing of open U.S. federal government data. The data listings at individual agencies are intended to feed easily into Data.gov, and the new policy should improve its offerings. Meanwhile, Data.gov has been busy making changes of its own, as detailed in its blog post "Under the Hood of the Open Data Engine."
No doubt you will see improvements at Data.gov before you begin to see the fruits of the administration’s new open data policy. The policy, detailed guidance, and development of Project Open Data lay the groundwork for better data availability and access; the rest is up to the agencies. The first sentence of the Executive Order clearly communicates the significance of these changes: “Data is a valuable national resource and a strategic asset to the U.S. Government, its partners, and the public.” Government information should no longer be considered a limited byproduct of agency business.