Openness and transparency are the themes of the current administration, embodied in a single federally run, one-stop resource for data about the management and governance of our nation: Data.gov. (Well, maybe not ALL data.) In an effort to right past wrongs, the Obama Administration is seeking to make the government's information available to the public in an organized, easy-to-consume package. The web development community has been sold on a beautiful idea. Five months later, we can see how the data is stacking up.
Data.gov has been available since the end of May this year, and it isn't quite clear how well it has caught on within the mashup communities. The Tetherless World Constellation project at Rensselaer Polytechnic Institute has been tracking Data.gov and counts 942 datasets, noting that not all of them are truly unique; some overlap with others. It isn't uncommon to find "2005 Toxics Release Inventory Data" published as a separate dataset for each state, plus another dataset that aggregates all of those states into one on its own. And instead of collecting all of the related data into one set, the entries are further divided by year, so we get the same "Toxics Release Inventory Data" again for 2006. How many distinct sets of data are we really looking at once everything is efficiently compiled? I'd guess the pickings would be slim. When all is said and done, you're left with a LOT of information about very, very narrow subjects, which shrinks the applications for this data to a few focused niches.
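To make the fragmentation concrete, here's a minimal sketch of the merge chore a developer inherits: stitching hypothetical per-state, per-year "Toxics Release Inventory" files back into one usable dataset. The file contents and column names below are invented for illustration, not actual Data.gov exports.

```python
# Hypothetical sketch: each entry stands in for one downloaded
# per-state, per-year CSV file from a Data.gov-linked agency page.
import csv
import io

datasets = {
    ("NY", 2005): "facility,pounds\nPlantA,120\n",
    ("NY", 2006): "facility,pounds\nPlantA,95\n",
    ("OH", 2005): "facility,pounds\nPlantB,300\n",
}

def merge(datasets):
    """Flatten the per-state, per-year files into one list of rows,
    tagging each row with the state and year it came from."""
    rows = []
    for (state, year), text in datasets.items():
        for row in csv.DictReader(io.StringIO(text)):
            row["state"], row["year"] = state, year
            rows.append(row)
    return rows

merged = merge(datasets)
print(len(merged))  # 3 rows from 3 separate "datasets"
```

The point isn't that the code is hard; it's that every consumer of the data has to write some version of it, and rewrite it when 2007's files arrive.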
A slightly more important thing to note is the delivery method. What Data.gov really provides are links to the various federal agencies, which serve this data in downloadable bites that were prepackaged and approved for public consumption. This means the majority of this data is static and needs human intervention to be updated and kept current. For those crunching data specific to those periods, the approach is perfectly fine, but it is detrimental for anyone who intends their applications to serve relevant, current data. The offering would be easier to swallow had the government provided an API to "live" data from its various agencies. I understand the financial commitment is hard to make for a project without obvious returns, but the far-reaching advantages of a single, constantly updated access point for each type of data sound too good to ignore: build your app once and never worry about adding "next year's dataset"; monitor usage statistics on the data; and give agencies a unified API for publishing, removing their cost of building custom transport and presentation services.
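The static-file workflow forces every app to bolt on its own staleness handling, which is exactly the chore a live API would absorb. Here's a small sketch of that workaround, with the fetch function stubbed out since there is no real endpoint to call; the class name and cache policy are my own assumptions.

```python
# Sketch of the workaround static delivery forces on app authors:
# re-download a prepackaged file only when the local copy goes stale.
# A live agency API would make this whole class unnecessary.
import time

class StaticDatasetCache:
    def __init__(self, fetch, max_age_seconds=86400):
        self.fetch = fetch            # callable returning file contents
        self.max_age = max_age_seconds
        self.data = None
        self.fetched_at = 0.0

    def get(self):
        """Return cached data, re-fetching only when it's too old."""
        if self.data is None or time.time() - self.fetched_at > self.max_age:
            self.data = self.fetch()
            self.fetched_at = time.time()
        return self.data

# Stand-in for an HTTP download of a prepackaged CSV.
calls = []
def fake_fetch():
    calls.append(1)
    return "facility,pounds\nPlantA,120\n"

cache = StaticDatasetCache(fake_fetch)
cache.get()
cache.get()
print(len(calls))  # fetched once; second call served from cache
```

Even with caching in place, nothing in this scheme tells the app when the agency has posted "next year's dataset" at a new URL; that part still takes a human.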
Bottom line: there are a few reasons why you might not find many mashups sporting a byline crediting Data.gov as the source, but overall this is a step in the right direction, and it would be unreasonable to expect perfection from the beginning (especially coming from the government). While the data is not necessarily useful to the majority, something is better than nothing. Plus, the resource nudges state and local governments to post their data as well. Adoption so far is comparatively meager (likely because, ahem, local governments have to build a front-end for the data, or thanks to bureaucratic red tape), though we've seen progressive movement from California, Utah, Washington D.C., San Francisco, and New York City, to name a few, and I'm certain that's not all.
While the delivery problem remains an issue at Data.gov, it is being recognized elsewhere, and some have picked up the torch and are running with it. One effort run by the Open Planning Project, called Open311, attempts to define an open standard through which cities worldwide can interact with each other's 311-type services and give citizens a platform for an open dialogue with their government. A system like this not only provides an open way to gather information but also allows for feedback and easier access to municipal services, and because the framework is open, building a user interface for a city becomes easier. In larger cities like New York, with many more agencies and services than most, government organizations could pass data between each other more efficiently and save taxpayers time and money. At the risk of scaring the paranoid, consider the possible improvements in communication between state and federal governments.
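To give a flavor of what a standardized exchange might look like, here's a sketch of building the request body a citizen's app could submit to a city's 311 endpoint. The standard is still taking shape, so the field names and the endpoint in the comment are illustrative assumptions, not the finalized Open311 spec.

```python
# Illustrative sketch only: field names are assumptions about what an
# Open311-style service request could carry, not the official schema.
import json

def build_service_request(service_code, lat, lon, description):
    """Assemble the JSON body an app might POST to a city's 311
    endpoint, e.g. https://city.example/open311/requests.json
    (a made-up URL for illustration)."""
    return json.dumps({
        "service_code": service_code,
        "lat": lat,
        "long": lon,
        "description": description,
    })

body = build_service_request(
    "POTHOLE", 40.7128, -74.0060, "Deep pothole at the crosswalk")
print(body)
```

The appeal of a shared schema like this is that one client app could file a pothole report in any participating city, which is precisely the kind of interoperability Data.gov's static downloads don't offer.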
Organizing the information in a government where the paper trails have paper trails is no small feat, and I don't envy the guy who has to tackle that bear! At this point, I'd say Data.gov has a ways to go before we start seeing seriously useful app-work from the community. The proof is in the pudding: glancing through recent submissions to an app contest featuring Data.gov sources can be a little underwhelming, save for a gem or two. The information that's there is much easier to find now, which is great, but we need more federal organizations getting on board. Where is NOAA, which is sure to have stockpiles of interesting data organized in neat rows of figurative buckets? And then let's set this up as a hosted data store with an API: a live, evolving, improving source of information a developer can leverage without needing to merge and manipulate datasets to take advantage of it. Let's give them more time. The potential is certainly there.