Bring on the public data

Two years ago today, Google announced an effort we'd initiated with a number of U.S. state governments, including California and Virginia, to make it easier for their citizens to locate information on state agencies' public websites through search engines. The problem we set out to solve is common to governments worldwide: the more information a government seeks to make available online in the form of public records, the more difficult it can be to house this information and ensure it's indexed by search engines, and thus found by the public for whom the records are intended. Since then, we've observed many governments taking up our proposed solution: providing search engine crawlers with a sitemap. As a result, many millions of public records are now just one search away.
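For readers who haven't seen a sitemap, it is simply an XML file that lists the URLs a site wants crawlers to know about. Here is a minimal sketch of how an agency might generate one in Python, using only the standard library. The `build_sitemap` helper and the `records.example.gov` URLs are hypothetical and used purely for illustration; the `urlset`, `url`, `loc` and `lastmod` element names and the namespace come from the public Sitemaps protocol.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps protocol (sitemaps.org).
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(urls):
    """Return a sitemap XML string for an iterable of (loc, lastmod) pairs.

    Hypothetical helper: lastmod may be None, in which case the optional
    <lastmod> element is omitted for that URL.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in urls:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    return ET.tostring(urlset, encoding="unicode")


# Made-up public record pages; a real agency would list its own URLs.
records = [
    ("https://records.example.gov/permits/1001", "2009-04-01"),
    ("https://records.example.gov/permits/1002", None),
]

sitemap_xml = build_sitemap(records)
print(sitemap_xml)
```

The resulting file would typically be saved as `sitemap.xml` on the agency's website so that crawlers can discover record pages that aren't reachable through ordinary links.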

Earlier this week, we introduced an effort of a similar kind to make it easier for people to locate public data. Now, when our users search for the population of a U.S. state or the unemployment rate in a county in the U.S., we provide relevant statistics and graphs in the search results. Clicking these graphs opens a page that presents larger, interactive graphs that further illuminate the data and can be customized and shared. The data we present in this new search feature are drawn from datasets published by the recognized sources for such information, the U.S. Census Bureau's Population Division and the U.S. Bureau of Labor Statistics, respectively.

Our ambition with this new search feature is to enable our users to more easily find and use many more types of public data. How do we define public data? The first part is easy: By "public" we mean any data that an organization now offers, or would like to offer, for free public use, including in a commercial service like our public data search feature. Defining what we mean by "data" is more complicated, as we'd prefer not to limit the range of data that organizations could consider making available through this feature. However, in a contact form for organizations that want to tell us about their public data, we do offer some sense of scope and examples:

1. Statistical data: GDP per capita, average temperature, life expectancy

2. Raw data: building permits, reported car accidents, economic transactions

3. Reference data: registered vaccines, endangered species, classifications of types of crimes

4. Hierarchical data: diseases and their subtypes, such as type 1 and type 2 diabetes; primary economic sectors broken down into secondary sectors

5. Geographical data: political boundaries, rivers, ISO codes

We're now looking to identify data from recognized sources to consider adding to this feature. We're also seeking organizations that would like to help us determine how we can collectively develop the right data standards and technology for enabling broader access to and use of public data. If your organization has such data to share and you would like to work with us, tell us about your organization and your data. Of course we won't use any data that compromises the privacy of individuals or infringes upon any proprietary rights.

We look forward to hearing from government agencies, research institutes, non-profits and even private organizations. While we won't be able to individually reply to every party that contacts us, we may be in touch to learn more about your data.

Thursday, April 30, 2009 at 11:51 AM
