How to Summarize and Graph Categorical Data

By John Pezzullo

A categorical variable is summarized in a fairly straightforward way. You just tally the number of subjects in each category and express this number as a count — and perhaps also as a percentage of the total number of subjects in all categories combined. So, for example, a sample of 422 subjects can be summarized by race.

Study Subjects Categorized by Race
Race     Count   Percent of Total
White      128   30.3%
Black      141   33.4%
Asian       70   16.6%
Other       83   19.7%
Total      422   100%
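
The tally described above can be sketched in a few lines of Python. This is only an illustration: the `races` list is hypothetical raw data rebuilt from the counts in the table, and `summarize` is a name made up for this example.

```python
from collections import Counter

# Hypothetical raw data: one race label per subject, rebuilt from the table's counts.
races = ["White"] * 128 + ["Black"] * 141 + ["Asian"] * 70 + ["Other"] * 83

def summarize(values):
    """Return {category: (count, percent of total)} for a list of category labels."""
    counts = Counter(values)
    total = sum(counts.values())
    return {cat: (n, round(100 * n / total, 1)) for cat, n in counts.items()}

summary = summarize(races)
for category, (count, percent) in summary.items():
    print(f"{category:<6}{count:>5}{percent:>7.1f}%")
```

Each percentage is the category count divided by the total number of subjects (422 here), multiplied by 100 and rounded to one decimal place.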

The joint distribution of subjects between two categorical variables (such as Race by Gender) is summarized by a cross-tabulation (“cross-tab”).

Cross-Tabulation of Subjects by Two Categorical Variables
         White   Black   Asian   Other   Total
Male        60      60      34      42     196
Female      68      81      36      41     226
Total      128     141      70      83     422

A cross-tab can get very cluttered if you try to include percentages, because each count can be expressed as three different kinds of percentage. For example, the 60 white males make up 46.9 percent of all white subjects, 30.6 percent of all males, and 14.2 percent of all subjects.
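
As a rough sketch of where those three percentages come from, the cell count can be divided by its column total, its row total, and the grand total. The nested dictionary and the `cell_percentages` helper are assumptions made for this example, not part of any statistical package.

```python
# Counts copied from the cross-tab: rows are gender, columns are race.
crosstab = {
    "Male":   {"White": 60, "Black": 60, "Asian": 34, "Other": 42},
    "Female": {"White": 68, "Black": 81, "Asian": 36, "Other": 41},
}

def cell_percentages(table, row, col):
    """Return (percent of column, percent of row, percent of grand total) for one cell."""
    count = table[row][col]
    row_total = sum(table[row].values())
    col_total = sum(r[col] for r in table.values())
    grand_total = sum(sum(r.values()) for r in table.values())
    return tuple(round(100 * count / t, 1) for t in (col_total, row_total, grand_total))

pct_of_whites, pct_of_males, pct_of_all = cell_percentages(crosstab, "Male", "White")
print(pct_of_whites, pct_of_males, pct_of_all)
```

For the white males, the three divisors are 128 (all white subjects), 196 (all males), and 422 (all subjects).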

Categorical data is usually displayed graphically as frequency bar charts and as pie charts:

  • Frequency bar charts: Displaying the spread of subjects across the different categories of a variable is most easily done by a bar chart. To create a bar chart manually from a tally of subjects in each category, you draw a graph containing one vertical bar for each category, making the height proportional to the number of subjects in that category.

    But almost all statistical programs will prepare bar charts for you; you simply select the options you want, such as which categorical variable you want to display and whether you want the vertical axis to show counts or percent of total.

  • Pie charts: Pie charts indicate the relative number of subjects in each category by the angle of a circular wedge (a piece of the pie). To create a pie chart manually, you multiply the percent of subjects in each category by 360 (the number of degrees of arc in a full circle), and then divide by 100. That will give you the angle of each wedge of the pie.

    You draw a circle with a compass and then split it up into wedges using a protractor (remember those drawing tools from high school?). Much better to have the computer make a pie chart for you — it’s no more difficult than having a program make a bar chart.

    But comparing the relative magnitude of the different sections of a pie chart is more difficult than comparing bar heights. Can you tell at a glance, from the pie chart shown below, whether there are more whites or blacks? Or more Asians than “others”? You can make those distinctions immediately from the bar chart.

    Pie charts are often used to present data to the public (perhaps because the “piece of the pie” metaphor is so intuitive), but they’re frowned upon in technical publications.

    [Figure: pie chart and bar chart of subjects by race (image0.jpg)]

    Many programs (including Excel) let you generate so-called “3D” charts. However, these charts are often drawn with a slanting perspective that renders them almost impossible to interpret quantitatively, so avoid 3D charts when presenting your data.
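
The two manual constructions described in the list above can be sketched as follows, using the race counts from the first table. The function names are made up for this example, and the bar chart is drawn with horizontal text bars rather than the vertical bars a graphing program would produce.

```python
# Counts from the "Study Subjects Categorized by Race" table.
counts = {"White": 128, "Black": 141, "Asian": 70, "Other": 83}

def wedge_angles(counts):
    """Angle in degrees of each pie wedge: percent of total * 360 / 100."""
    total = sum(counts.values())
    return {cat: (100 * n / total) * 360 / 100 for cat, n in counts.items()}

def text_bar_chart(counts, scale=5):
    """One text bar per category, roughly one '#' per `scale` subjects."""
    return [f"{cat:<6}{'#' * round(n / scale)} {n}" for cat, n in counts.items()]

angles = wedge_angles(counts)
for cat, angle in angles.items():
    print(f"{cat:<6}{angle:6.1f} degrees")

bars = text_bar_chart(counts)
print("\n".join(bars))
```

The wedge angles always add up to 360 degrees. Even in this crude text version, the bars make the difference between 128 and 141 subjects easier to see than two wedges of about 109 and 120 degrees would.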