While the tech community has been buzzing about public data, open data, and Big Data for the past several years, the Association of Public Data Users (APDU), founded more than 40 years ago, has had a relatively low profile in discussions happening outside of its own community of statisticians and demographers. Connections made at APDU’s 2012 conference, held at George Washington University in Washington, D.C. on Sept. 12-13, could begin to change that. Speakers included representatives from the federal government (including the White House and leadership from the federal open data portal Data.gov), open data advocacy groups, and private sector data analysts and suppliers. Librarians, and especially data librarians, joined the statisticians in attendance. APDU 2012 concluded with a session on the Census Bureau’s American Community Survey (ACS), an endangered product dependent on survey data. Researchers and businesses have been concerned about the ongoing political battle over ACS since the House of Representatives voted in May to defund the program and to also make participation in the survey voluntary.
The future of ACS is a key concern, but APDU addressed a more fundamental challenge with the meeting theme, “The Future of the Federal Statistical System in an Era of Open Government Data.” Open data at the federal, state, and local government levels includes administrative data, such as the information that people or businesses supply to participate in a government program or fulfill regulatory requirements. It also includes operational data (data collected by governments in the course of their work), such as information on the federal prison population or on student loan disbursements. Big data, which can come from public or private sources, might also be seen as an alternative or supplement to survey data. “Big data,” as described by APDU speaker and former Census Director Robert Groves in his June 27 blog post, includes “massive data sets that are being produced daily through internet search, social media, and administrative data processing.”
Speakers on one panel added another descriptive term to the talks: “new data.” Roberto Rigobon of the Billion Prices Project and Mike Horrigan of the Bureau of Labor Statistics explained the possibilities of using big data in the process of calculating economic indexes—particularly price indexes—in their talk on “New Data in an Open Data World.” Price data scraped from retailer websites can be used as part of the calculation process to enhance the timeliness of the Consumer Price Index, for example.
At the conference, many attendees and speakers were still cautious about using this new data. APDU members rely on trusted survey data: sampled data collected specifically for research or policy purposes and evaluated for quality using accepted statistical practices. They work with the Federal Statistical Community, which develops and refines formal Statistical Policy and Guidelines under the authority of the Office of Management and Budget’s Office of Information and Regulatory Affairs.
New data is getting statisticians’ attention in part because government survey data programs now face challenges on a number of fronts, addressed in the first day’s session “An Evolving Federal Statistical System: Preparing For the Future by Learning from the Past.” The session panel identified critical issues: the rising cost of collecting survey data; a need for timeliness not met by large survey programs; statutory and departmental silos hampering innovation; and worsening survey response rates. Lower response rates are not unique to government surveys, but declining trust in government has added another dimension.
Resources such as administrative data and big data are candidates for supplementing or sometimes replacing survey data, but statisticians are concerned about the quality and completeness of these new sources if they are to be used for research or policymaking. As with other conference sessions, slide decks for the “Evolving Federal Statistical System” session are linked from the APDU 2012 meeting agenda online. Slides from Robert Groves, who is now at Georgetown University, outline the conditions making the current system unsustainable and the inadequacies of new “organic” data sources such as internet transactions (An Evolving Federal Statistical System, PDF).
The threat to the American Community Survey illustrates the challenges faced by other government survey programs. The ACS is an ongoing statistical survey designed to replace the former decennial census “long form.” Businesses, researchers, and government agencies at all levels use the data.
For those just catching up with the story, ACS was an unexpected target of House floor debates in the 112th Congress. In May, the House approved amendments to the fiscal year 2013 appropriations bill covering the Census Bureau (the 2013 Commerce, Justice, Science, and Related Agencies Appropriations Act on THOMAS.gov), eliminating funding for the ACS and prohibiting enforcement of the fine for not participating in the mandatory survey, a penalty that, APDU speakers noted, has never been imposed for ACS. The Senate has not taken up the bill yet, and it appears unlikely that an appropriations bill will be passed in this Congress. While the Senate is sympathetic to the need for ACS, a lobbyist representing the pro-ACS International Council of Shopping Centers at the meeting said the issue would probably be brought up again in the 113th Congress.
Andrew Reamer, a research professor at the George Washington University’s Institute of Public Policy, has been compiling information in defense of ACS on his continuously updated web page called Resources Regarding the American Community Survey (ACS) of the U.S. Census Bureau and in the draft document American Community Survey: Uses and Users (PDF). Reamer traces the move against ACS to the “You Cut” community engagement site of the House leadership, where the program was suggested as a target for elimination. Members of Congress may have been reacting to constituent complaints about the “invasive” nature of the survey, which includes some potentially sensitive questions such as whether a person has difficulty bathing or dressing. Responses are confidential and data is not tied to a named or identifiable individual, but objections remain.
The fate of the ACS remains unsettled. The federal statistical system as a whole, however, could remain healthy with the kind of outreach on display in the APDU 2012 program.