Data Mining For Dummies

If you’re looking for data that the federal government might have, but you aren’t sure which agency is involved, start your search on the federal data portal, Data.gov. There you will find a searchable catalog of data from all federal agencies. You can search for datasets by keyword and find out what’s available, the source of each dataset, the formats offered, and where to find the data.
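If you would rather search the catalog from a script than from a browser, the following is a minimal sketch of one way to do it. It assumes the catalog exposes a CKAN-style `package_search` action at the URL shown; that endpoint, the response field names, and the helper names are assumptions that don’t come from this article and may have changed, so check the current Data.gov documentation before relying on them.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: a CKAN-style search action for the Data.gov catalog.
CATALOG_SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(keywords, rows=5):
    """Build a catalog search URL for the given keywords."""
    return CATALOG_SEARCH_URL + "?" + urlencode({"q": keywords, "rows": rows})


def summarize_results(payload):
    """Pull title, source agency, formats, and links from a CKAN-style response."""
    summaries = []
    for dataset in payload.get("result", {}).get("results", []):
        organization = dataset.get("organization") or {}
        resources = dataset.get("resources") or []
        summaries.append({
            "title": dataset.get("title", ""),
            "source": organization.get("title", "unknown"),
            "formats": sorted({r["format"] for r in resources if r.get("format")}),
            "links": [r["url"] for r in resources if r.get("url")],
        })
    return summaries


def search_catalog(keywords, rows=5):
    """Query the catalog over the network and summarize the matches."""
    with urlopen(build_search_url(keywords, rows), timeout=30) as response:
        return summarize_results(json.load(response))
```

Calling `search_catalog("air quality")` would, under those assumptions, return one small dictionary per matching dataset. The links point to the agency that publishes the data, not to the portal itself.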

The data portal isn’t itself a source of data; it provides information about what data is available and where to get it. Nor does the portal cover every bit of government data available. So, if you find something useful on Data.gov, follow up by visiting the website of the agency that actually provides that data, where you may find additional information and data.

If you need something you can’t find, contact the agency directly. You may be able to speak with someone who can help you locate what you need, or at least find out why the data you want is unavailable.

Public data is nothing new, but the portal supports several new initiatives. All newly generated federal government data is required to be made publicly available in open, machine-readable formats, while maintaining privacy and security. The key concept here is machine readability: providing data in formats that are suited to computing use, especially application development.
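To see what machine readability buys you, here is a minimal sketch that loads a small CSV file (one common machine-readable format) into typed records using only Python’s standard library. The sample rows, column names, and the `load_rows` helper are made up for illustration and don’t correspond to any real dataset.

```python
import csv
import io

# Hypothetical sample data; real files would come from an agency website.
SAMPLE_CSV = """region,year,count
Region A,2020,1000
Region B,2020,250
"""


def load_rows(csv_text):
    """Parse CSV text into dictionaries, converting numeric columns to int."""
    rows = []
    for record in csv.DictReader(io.StringIO(csv_text)):
        record["year"] = int(record["year"])
        record["count"] = int(record["count"])
        rows.append(record)
    return rows


rows = load_rows(SAMPLE_CSV)
```

Because the format is structured, a few lines of code turn the file into values a program can compute with. The same figures published only as a scanned report would need manual retyping first.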

Agencies are also required to

  • Create a single agency data inventory: They must document and track data assets as they do equipment, furniture, and other assets.

  • Publish a public data listing: The listing must be posted on the agency’s web pages and must include all data assets that are public or that could be made public.

  • Develop new public feedback mechanisms: They must give the public ways to comment on data-sharing priorities.

The federal data portal also allows local governments to add their datasets to the portal’s catalog. Participation is not mandatory, and not many cities are ready to take part, but you may come across some local data in the catalog, and you can expect to see more in the future.

While this portal can lead you to a large and diverse range of data, none of it was created specifically for data-mining use. All of it was originally collected for government use; sharing with the public is secondary.

Privacy and security requirements prevent some data from being made public, and some data can only be shared in aggregate form. (For example, an individual’s income may be private, while the average income of a group of people is public.) And open data initiatives are driven by programmers, not data miners, so the data may not be organized or formatted as you prefer.
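The difference between individual and aggregate data can be sketched in a few lines. In this illustration, the individual records stay private and only the per-group average would be shared. The records, group labels, and the `average_income_by_group` helper are hypothetical, and real agencies apply their own, more careful disclosure rules.

```python
from collections import defaultdict
from statistics import mean


def average_income_by_group(records):
    """Collapse (group, income) records into one average income per group."""
    incomes = defaultdict(list)
    for group, income in records:
        incomes[group].append(income)
    return {group: mean(values) for group, values in incomes.items()}


# Hypothetical individual-level records that would not be published.
private_records = [("north", 40000), ("north", 60000), ("south", 50000)]

# Only this aggregate view would be made public.
public_view = average_income_by_group(private_records)
```

For data mining, the practical consequence is that you can analyze groups but can’t recover the individual rows behind each average.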

The data portal is a starting point, not a final destination, in your search for data. Not all government datasets are included in the catalog, and some that are may not be tagged with the keywords that you choose for your search. But Data.gov can guide you to many useful datasets and provide leads to agencies that may have more to offer. You may even discover some unexpected gems to enhance your data-mining work.

About This Article

This article is from the book: Data Mining For Dummies

About the book author:

Meta S. Brown helps organizations use practical data analysis to solve everyday business problems. A hands-on data miner who has tackled projects with up to $900 million at stake, she is a recognized expert in cutting-edge business analytics.
