
Google Burrows into State Government Data
Posted On May 7, 2007
Public sector data has a lot of things going for it. It’s authoritative, often unique, sometimes comprehensive, and open to many uses as public domain content prepaid by the taxpayer. However, accessibility has never been one of its strengths. Content often lies embedded in legacy systems invisible to Web search engines or gets swept to the bottom of search results by better SEO’d, though less informative, Web sites. Google has begun a move to change that with a new effort using the free Sitemap protocol and the free Google Custom Search Engine. The governors’ offices in four states—Arizona, California, Utah, and Virginia—have announced partnerships with Google, instructing state agencies to begin opening up previously hard-to-find public information. Already some concern over privacy issues has been raised as state public records go on the open Web.

According to J. L. (John Lewis) Needham, manager of public sector content partnerships at Google, as many as four out of five visitors to government Web sites arrive from a Web search engine. Google wants to connect "citizens with their government by offering the public better access to public sector information and services," said Google’s chairman and CEO, Eric Schmidt. Google has expanded its outreach to public sector sources in general. Needham said it now has a "full time team" working with "public sector entities on many fronts," including states, the federal government, and localities.

The latest effort has Google working with chief information officers or chief technology officers in the states under a program endorsed by governors’ offices. Needham described this approach as "the model we do prefer to work with. It puts understanding of the issues and their urgency at the highest level. Every day thousands or millions of Google users are looking for and not finding state information. They are phoning agencies or driving down to a state office and standing in line when they don’t have to. It’s most important that the executive office understands how critical it is to success to reach out and then the CIOs or CTOs get the state agencies to comply."

Partnering States

Google is working with the states and supplying tools such as Sitemap and the Custom Search Engine at no charge. The effort to reach out to states began last year. (To see the talking points Google uses to promote the initiative with the public sector in general, read "Making your agency’s sites more accessible to search engine users: Implementing the Sitemap protocol." For a presentation tailored to an individual state, check out the California adaptation.) Once the four states got on board, the chief information officers began scheduling specific agencies in each state for opening up their content to Google using the open source Sitemap protocol. (For a list of agencies from each state already contributing data to the new Google initiative, see the appendix at the end of the NewsBreak.)

According to Clark Kelso, chief information officer for the State of California, they chose to initially involve "some of our most data-rich departments, like Education, Health Services, Coastal Commission, Employee Development, Consumer Affairs, General Services, and the State Bar. With just these departments engaged, we’ve made visible to the search engines around 100,000 new pages (i.e., we had 100,000 existing Web pages that previously were NOT visible to the search engines—this initiative is about making our existing data more accessible, not adding new data)." The list of original agencies chosen is now expanding on its own. Kelso commented, "I know that many more departments are now engaged in this activity (in part because of all of the publicity), so the list of departments and numbers of newly visible pages will continue to grow."

Besides opening up eight databases using the Sitemap protocol, Arizona has used the Google Custom Search Engine to build the All Arizona Government Search for the State’s Web Portal. Arizona’s Government Information Technology Agency (GITA) is working with more agencies on implementing the Sitemap protocol and looking for additional opportunities to expand the use of the Custom Search Engine. Needham had nothing but compliments for Arizona’s efforts. "Arizona opened up eight databases in 2 months. That’s a huge accomplishment for a state government with many levels of decision making. There are a lot of smart people there. It took less than 50 hours to do the sitemapping."

Utah has added Google Sitemap instructions to its basic Web standards for all state agencies. It has also deployed the Custom Search Engine on its site, giving visitors access to all sectors of government relevant to Utah, whether federal, state, or local.

Virginia has already added almost 80,000 URLs from 27 domains under the initiative. State officials expect the number to increase as more agencies join the project. The state had already set a high priority on "transparent government." According to Governor Timothy Kaine, "Our goal is simple, intuitive, and quick citizen access to every government resource. Our partnership with Google is one example of our many citizen-focused initiatives to simplify government and provide greater access to its services."

How It Works

As explained in a Google Information for Public Sector Organizations page, the difficulty with reaching all information available through state government Web sites lies in the trouble crawlers have reaching inside database applications. If extracting results from a data collection on a Web site requires the user to perform a search, then the results returned usually will have a dynamic URL that incorporates the search terms. With no stable, static URL in place for the data, Google’s spiders cannot reach the content. The new initiative with state governments offers two tools to help solve access problems. The Sitemap protocol opens up previously unseen data to Google and its users, while the Google Custom Search Engine brings data already on Google to the attention of state government Web site users.
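To make the crawler's problem concrete, here is a minimal Python sketch of how a search form produces a dynamic URL. The agency address and the form fields are hypothetical, chosen only for illustration; the check for a query string is a simplification, not how Google's spiders actually decide what to crawl.

```python
from urllib.parse import urlencode, urlsplit

# Hypothetical address of a searchable license database (an assumption).
SEARCH_PAGE = "https://agency.example.gov/licenses/search"


def result_url(terms: dict) -> str:
    """Build the address a visitor lands on after submitting the search form."""
    return f"{SEARCH_PAGE}?{urlencode(terms)}"


def is_dynamic(url: str) -> bool:
    """Simplified test: treat any URL that carries a query string as dynamic."""
    return bool(urlsplit(url).query)


# The result page exists only after someone types these terms into the form,
# so no link to it appears anywhere for a crawler to follow.
url = result_url({"name": "Smith", "county": "Pima"})
print(url)
print(is_dynamic(url), is_dynamic(SEARCH_PAGE))
```

The point of the sketch is that the address depends entirely on what the visitor typed. A crawler that only follows links never generates those terms, so the records behind the form stay out of reach unless the site lists them some other way.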

The new, open source Sitemap protocol was developed by Google in June 2005 and is available under a Creative Commons license. It was approved as an industry standard by Microsoft and Yahoo! in November 2006. It is already in use by The CIO Council of federal government agency information executives has given the protocol strong support. (For excellent coverage of the history of the Sitemap protocol and its use in federal agencies, check out the FederalSitemaps wiki.)

The Sitemap protocol uses XML files that list the URLs on a Web site, giving crawlers a comprehensive inventory of its pages. Needham described the Sitemap protocol as "an elegant solution" to the problems of accessing more data. "It requires no adjustment to the existing Web site, works regardless of the age of the content or the system, and follows the same pattern as a human searcher." Once the sitemaps are created, state Webmasters submit them to Google. Google’s Webmaster Tools and Google Webmaster Central also supply a Sitemap Generator to help automate the Sitemapping process, along with links to several third-party Sitemapping tools. One Sitemap can hold up to 50,000 URLs, and each Sitemap Index file can hold up to 1,000 Sitemaps. As databases change, Webmasters must update and resubmit Sitemaps to ensure accurate retrieval.
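For readers who want to see what such a file looks like in practice, the following Python sketch builds Sitemaps and a Sitemap Index under the limits quoted above. It is an illustration only, not the states' or Google's tooling: the agency address, the record URL pattern, and the Sitemap file names are all hypothetical, and the helper function names are invented for this example.

```python
from xml.etree import ElementTree as ET

# XML namespace defined by the Sitemap protocol, version 0.9.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MAX_URLS_PER_SITEMAP = 50_000    # per-file limit as quoted in the article
MAX_SITEMAPS_PER_INDEX = 1_000   # per-index limit as quoted in the article


def build_sitemap(urls):
    """Return one Sitemap document listing the given page URLs."""
    if len(urls) > MAX_URLS_PER_SITEMAP:
        raise ValueError("too many URLs for a single Sitemap")
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for url in urls:
        ET.SubElement(ET.SubElement(urlset, "url"), "loc").text = url
    return ET.tostring(urlset, encoding="unicode")


def build_index(sitemap_urls):
    """Return a Sitemap Index document pointing at each Sitemap file."""
    if len(sitemap_urls) > MAX_SITEMAPS_PER_INDEX:
        raise ValueError("too many Sitemaps for a single index")
    index = ET.Element("sitemapindex", xmlns=SITEMAP_NS)
    for url in sitemap_urls:
        ET.SubElement(ET.SubElement(index, "sitemap"), "loc").text = url
    return ET.tostring(index, encoding="unicode")


def split_into_sitemaps(urls):
    """Cut a long URL list into Sitemap documents of at most 50,000 URLs."""
    step = MAX_URLS_PER_SITEMAP
    return [build_sitemap(urls[i:i + step]) for i in range(0, len(urls), step)]


# Hypothetical database of 120,000 records, one stable page per record.
record_urls = [f"https://agency.example.gov/licenses/{n}" for n in range(1, 120_001)]
sitemaps = split_into_sitemaps(record_urls)
index = build_index(
    [f"https://agency.example.gov/sitemap-{i}.xml" for i in range(1, len(sitemaps) + 1)]
)
print(len(sitemaps))
```

In this sketch the Webmaster would publish each Sitemap file plus the index and submit the index to the search engine. Because the list is generated straight from record identifiers, it can be regenerated whenever the database changes, which is the update-and-resubmit step the protocol relies on.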

Based on the Google Co-op program introduced a year ago (see the NewsBreak "Pick of the Litter: Google Co-op"), the Custom Search Engine lets agencies create specialized indexes of information out of Google’s main index for searching from their own Web sites. Arizona, Utah, and Virginia have all implemented Custom Search Engine tools, which can incorporate search results from their own Web sites and those of other government agencies, nonprofit organizations, educational institutions, or even private sector concerns.

Every Silver Lining Has a Cloud

It seems that whenever established, traditional databases suddenly add dramatic new accessibility functionality, negative incidents soon follow. Perhaps it’s the Law of Unintended Consequences—more often the "white glove" test for existing policies and procedures. Privacy advocates have already begun watching for breaches. Since much, if not most, of the data coming into Google from the initiative will involve public records, we can expect to see some individuals finding themselves "Naked in Cyberspace," as the book title ran. Google is already putting the responsibility on the state agencies. "It’s already public data. Whatever the state agencies have chosen to make public, anybody could find," said Needham. "From a broader perspective, it could help reduce privacy abuse. It will make it more likely that citizens uncover issues. Our state partners are taking steps to double check their data." For example, in California, CIO Clark Kelso has reportedly directed all state agencies to remove confidential information such as Social Security numbers from any documents made available online.

Future developments in using this data will probably include connecting to Google Maps and Google Earth using mashup tools. Why should this country or this planet be any different? Google has already partnered with a federal government agency, NASA Ames Research Center, to produce 3-D maps of the moon and Mars. Incidentally, Google may not be the only beneficiary of the new initiative. Once the state agencies create the sitemaps, they can submit them to other search engines, such as Yahoo! and Microsoft. According to California’s Kelso, "The Sitemap protocol is open source and we certainly are interested in the broadest utilization of this information by commercial search engines."

Overall, for states and their citizens, Needham commented on the initiative: "This is tremendous. If you consider the impact, no IT project in a state could have as much impact." However, the online vendors marketing public record data—e.g., DataQuick Information Systems, Merlin Information Services, LexisNexis’ public records collection—may not be as happy. These vendors are accustomed to buying or licensing state data for generally low fees, processing it, and selling online access at a high price. The season of those halcyon days may be ending. Does that pattern sound familiar? Sooner or later, it looks like Google will get around to everyone.


Google State Government Initiative: State Agencies at Work

While Google does not publish a list of all the state agencies and the data sources they are contributing, all the content from the states will join the federal and other government content in Google U.S. Government Search.

Arizona

State Jobs Database (ADOA)
Arizona 2-1-1 Online Databases (AHCCCS)
Real Estate Database (realtors, companies, instructors; Department of Real Estate)
Licensed Contractors (Registrar of Contractors)
Licensed Child Care Facilities (DHS)
Licensed Nursing Homes (DHS)
Executive Orders, Press Materials, Public Schedule, etc. (Governor’s Office)
Sex Offender Infocenter (DPS)

A pilot application of the Google Custom Search Engine taps into state, local, and tribal government agencies throughout Arizona. The All Arizona Government Search is available at Arizona State’s Web Portal.

California

Department of Education
Department of Health Services
School for the Deaf Riverside
Radiologic Health Branch
Coastal Commission
Department of Boating and Waterways
Employee Development Department
Office of HIPAA Implementation
Department of Consumer Affairs
Department of General Services
Telecommunications Division
Procurement Division
The State Bar of California
South County Advisory Council

This list is current as of late April; according to Clark Kelso, chief information officer for the State of California, many more agencies are now joining.

Utah

Department of Workforce Services
State Tax Commission
Department of Natural Resources
State Library

Virginia

Office of the Governor
Office of the Lieutenant Governor
Information Technologies Agency
Governor’s Office for Substance Abuse Prevention
Auditor of Public Accounts
Council on Virginia’s Future
eVa (the state’s electronic procurement system)
The Library of Virginia
Assistive Technology System
Board for People with Disabilities
Department for the Blind and Vision Impaired
Department of Alcoholic Beverage Control
Department of Game and Inland Fisheries
Department of Health Professions
Department of Emergency Management
Department of General Services
Department of Environmental Quality
Department of Mental Health, Mental Retardation and Substance Abuse Services
Department of Accounts
Department of Forestry
Department of Social Services
Department of Labor and Industry
Department of Transportation
Department of Housing and Community Development

Using the Google Custom Search Engine, lets visitors search for information from all sectors of government, whether provided by a federal, state, or local government source.

Barbara Quint was senior editor of Online Searcher, co-editor of The Information Advisor’s Guide to Internet Research, and a columnist for Information Today.

Comments
Posted By Carole Lane, 5/7/2007 10:05:21 PM

It's always strange to me that the web and online vendors are demonized when they make it easier to access public records. What gets lost in the argument is that the records BELONG to the public, and the vendors (and by extension the web) are providing a valuable service. If we did not have access to public records, we could do little to safeguard the masses from unscrupulous politicians. Much of our commerce would also screech to a crawl (i.e. real estate transactions, running background checks for pre-employment, etc.).
If someone has a problem in the fact that the records are public, they should be debating or proposing legislation addressing THAT. It really should not be about the access method; taking them offline will not make them private, just more expensive in time and/or money to access.
