Data Mining For Dummies

Before you begin searching for data to mine on data.gov, the federal data portal, you must understand one thing: There is no data on the site. Data.gov is home to a data catalog, a list of dataset names with details such as descriptions, formats, and URLs for obtaining the data and additional information. The data itself is hosted and shared by the individual government agencies that create it, and each agency does things its own way.
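To make the catalog-versus-data distinction concrete, here is a minimal Python sketch of what a catalog entry holds. It assumes the entry is shaped roughly like the dataset metadata returned by CKAN, the open-source catalog software behind data.gov's catalog. The entry `catalog_entry`, the helper `data_locations`, and the example.org URLs are all hypothetical, made up for illustration. The point is that the entry only describes the data and points to the hosts that actually serve it.

```python
from urllib.parse import urlparse

# Hypothetical catalog entry, loosely shaped like CKAN dataset metadata.
# It describes a dataset but contains none of the data itself.
catalog_entry = {
    "title": "Example Air Quality Measurements",
    "notes": "Hourly readings from monitoring stations (illustrative only).",
    "organization": {"title": "Example Agency"},
    "resources": [
        {"format": "CSV", "url": "https://data.example.org/air/hourly.csv"},
        {"format": "HTML", "url": "https://www.example.org/air/about"},
    ],
}


def data_locations(entry):
    """Return (format, host, url) for each resource in a catalog entry.

    The host shows who actually serves the data, typically an agency
    site rather than the catalog.
    """
    locations = []
    for resource in entry.get("resources", []):
        url = resource.get("url", "")
        host = urlparse(url).netloc
        locations.append((resource.get("format", ""), host, url))
    return locations


for fmt, host, url in data_locations(catalog_entry):
    print(f"{fmt:5} served by {host}: {url}")
```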

Although a lot of data (more than 100,000 datasets) is cataloged on data.gov, it still covers only a fraction of what's available from government agencies. Agencies are required to maintain some new data in electronic form and list it on data.gov, but they may have additional resources that are not listed there. So data.gov is a good starting point in your search for government data, but it's not a comprehensive source.

Here's how to begin:

  1. Go to the data.gov home page, and in the box that says "Get Started," enter keywords for the type of data you need.

    You'll get a list of datasets whose descriptions mention your keywords. The list may include thousands of results.

    On the left side of the screen, you will see options for narrowing your search based on tags, data formats, the agency that produced the data, and other factors. There is even an interactive map that lets you indicate the geographic area you have in mind.

  2. Narrow your search to get a shorter, more relevant list of datasets.

  3. When you find a dataset whose description looks appropriate for your needs, click its name.

    You'll get a more detailed description of the dataset. In some cases, this information will include the location of a data dictionary (documentation that explains the data fields), the email address of a contact person, or other information you may need.

    Note that download buttons don't always take you straight to the data. Often, they link to another web page, either on data.gov or on an agency site. You may find yourself clicking through several pages on an agency site before you actually reach the data itself.
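The search, narrow, and inspect steps above can also be scripted. Below is a minimal Python sketch, assuming the catalog behind data.gov is a CKAN instance that answers `package_search` requests at `https://catalog.data.gov/api/3/action/package_search`, with keywords in the `q` parameter and Solr-style filters (for example `res_format:"CSV"`, assumed to match the Formats facet) in `fq`. Check the current API documentation before relying on these names. The helpers `build_search_url`, `search_catalog`, and `summarize` are hypothetical, and `sample` is a hand-made response rather than real catalog output. The last helper flags resources labeled HTML or left unlabeled, because such links often lead to another web page rather than to the data file.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Assumed endpoint: data.gov's catalog exposing CKAN's Action API.
SEARCH_URL = "https://catalog.data.gov/api/3/action/package_search"


def build_search_url(keywords, filters=None, rows=10):
    """Steps 1 and 2: keywords go in `q`; narrowing filters go in `fq`."""
    params = {"q": keywords, "rows": rows}
    if filters:
        # e.g. {"res_format": "CSV"} becomes res_format:"CSV"
        params["fq"] = " AND ".join(f'{field}:"{value}"' for field, value in filters.items())
    return SEARCH_URL + "?" + urlencode(params)


def search_catalog(keywords, filters=None, rows=10):
    """Fetch one page of search results (requires network access)."""
    with urlopen(build_search_url(keywords, filters, rows), timeout=30) as response:
        return json.load(response)


def summarize(response):
    """Step 3: list each dataset and where its resources point."""
    summary = []
    for dataset in response["result"]["results"]:
        for resource in dataset.get("resources", []):
            fmt = (resource.get("format") or "").upper()
            # HTML or unlabeled links are often landing pages, not data files.
            likely_page = fmt in ("", "HTML")
            summary.append((dataset["title"], fmt, resource.get("url", ""), likely_page))
    return summary


# Hand-made response in CKAN's shape, so this example runs offline.
sample = {
    "success": True,
    "result": {
        "count": 1,
        "results": [
            {
                "title": "Example Air Quality Measurements",
                "resources": [
                    {"format": "CSV", "url": "https://data.example.org/air/hourly.csv"},
                    {"format": "HTML", "url": "https://www.example.org/air/about"},
                ],
            }
        ],
    },
}

print(build_search_url("air quality", {"res_format": "CSV"}))
for title, fmt, url, likely_page in summarize(sample):
    note = "possible landing page" if likely_page else "possible direct file"
    print(f"{title} | {fmt} | {url} | {note}")
```

With network access, you might pass the result of `search_catalog("air quality", {"res_format": "CSV"})` to `summarize`. Even a link flagged as a possible direct file may still route you through pages on an agency site.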

About This Article

This article is from the book Data Mining For Dummies.

About the book author:

Meta S. Brown helps organizations use practical data analysis to solve everyday business problems. A hands-on data miner who has tackled projects with up to $900 million at stake, she is a recognized expert in cutting-edge business analytics.
