Investigate Variables with Bar Charts and Histograms
A basic part of the data-understanding phase of the data-mining process is investigating variables one at a time, reviewing their distributions, and checking for obvious data quality issues. Bar charts and histograms are visual summaries that make it easy and quick to understand variable distributions.
The two chart types are similar, but they suit different kinds of variables. If the variable is categorical, use a bar chart; it has one bar for each category, and the height of each bar shows that category’s frequency. If the variable is continuous, use a histogram. In the histogram, each bar represents a range of values for the variable, and the height of the bar shows how many cases fall within that range.
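To make the difference concrete, here is a minimal Python sketch of the counting that sits behind each chart type. It is only an illustration, not how any particular data-mining product works: the sample data, the function names, and the choice of 10-unit bins starting at 20 are all assumptions made for this example. The bar chart data has one count per category, and the histogram data has one count per range of values.

```python
from collections import Counter

# Hypothetical sample data, invented purely for illustration.
regions = ["North", "South", "South", "East", "North", "South"]  # categorical
ages = [23, 35, 31, 47, 52, 38, 29, 41]                          # continuous


def category_counts(values):
    """Bar chart data: one count for each distinct category."""
    return dict(Counter(values))


def histogram_counts(values, bin_width, start):
    """Histogram data: one count for each range [low, low + bin_width)."""
    counts = {}
    for v in values:
        # Find the lower edge of the bin that contains this value.
        low = start + ((v - start) // bin_width) * bin_width
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))


def print_bars(counts, label):
    """Draw a crude text chart: one row of '#' marks per bar."""
    for key, n in counts.items():
        print(f"{label(key):>8} | {'#' * n}")


bar_data = category_counts(regions)
hist_data = histogram_counts(ages, bin_width=10, start=20)

print_bars(bar_data, str)
# Bin labels assume whole-number values, so the 20 bin is shown as 20-29.
print_bars(hist_data, lambda low: f"{low}-{low + 9}")
```

Changing the assumed bin width changes how many bars the histogram has and how tall each one is, which is one reason the same continuous variable can look quite different from one chart to the next. The categories in a bar chart, by contrast, come directly from the data.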
Your data-mining application may make it very easy to get these charts. They are often included in the output of general-purpose data summary tools.
But it isn’t always simple to get the chart you want. Look closely, and you’ll see the phrase Open chart beneath the bar chart. Clicking this link opens a chart editor. You’d expect to see a chart that’s identical to the one in the data summary open in the editor, right? The following figure shows the chart editor as it looks when opened this way.
Not identical! You’ll have to fuss with setup to get back to the same point.
But this chart editor offers value in other ways. It gives you more options, such as creating more sophisticated chart structures (the next figure shows an editor that allows complex graph structure) or controlling cosmetic elements like color. Chart editors also provide ways to export graphs for use in your reports or presentations.
The complexities of chart setup seen in this section are matters of product design. A data-mining application may make some operations very easy and others remarkably complex, or even impossible. No single product outshines all others for ease of use, but one may fit your work style better than the rest. So, before you settle on a product to use, give it a thorough tryout for the kind of work that you need to do.