How to Break Down Categorical Statistics Using Two-Way Tables
You can break categorical data down using two-way tables (also known as contingency tables, cross-tabulations or crosstabs) to summarize statistical information about different groups. Categorical data (also known as qualitative data) capture qualities or characteristics about an individual, such as a person’s eye color, gender, political party, or opinion on some issue (typically using categories such as Agree, Disagree, or No opinion, or some variation of these).
Categorical data tend to fall into groups or categories pretty naturally. Political party, for example, typically has four groups in the United States: Democrat, Republican, Independent, and Other. Categorical data often come from survey data, but they can also be collected in experiments. For example, in an experimental test of a new medical treatment, researchers may use three categories to assess the outcome of the experiment: Did the patient get better, worse, or stay the same while undergoing the treatment?
Categorical data are often summarized by reporting the percentage of individuals falling into each category. For example, pollsters may report political affiliation statistics by giving the percentage of Republicans, Democrats, Independents, and Others. To calculate the percentage of individuals in a certain category, find the number of individuals in that category, divide by the total number of people in the study, and then multiply by 100%. For example, if a survey of 2,000 teenagers included 1,200 females and 800 males, the resulting percentages would be (1,200 ÷ 2,000) × 100% = 60% female and (800 ÷ 2,000) × 100% = 40% male.
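The percentage calculation can be sketched in a few lines of Python. This is only an illustration using the teenager survey counts from the example; the function name category_percentages is a made-up helper, not part of any library.

```python
def category_percentages(counts):
    """Return the percentage of individuals in each category.

    counts maps a category label to the number of individuals in it.
    """
    total = sum(counts.values())  # total number of people in the study
    return {category: 100 * count / total for category, count in counts.items()}


# Counts from the survey of 2,000 teenagers described above.
teen_counts = {"female": 1200, "male": 800}
teen_percentages = category_percentages(teen_counts)

for category, percent in teen_percentages.items():
    print(f"{category}: {percent:.0f}%")
```

Because every percentage is taken out of the same total, the results should always add up to 100% (apart from rounding).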
You can break down categorical data further by creating two-way tables. A two-way table lists the categories of one variable in its rows and the categories of a second variable in its columns, so each cell holds the number (or percentage) of individuals in one combination of categories. In this way, a two-way table summarizes the information from two categorical variables at once, such as gender and political party, so you can see (or easily calculate) the percentage of individuals in each combination of categories and use those percentages to make comparisons between groups.
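As a rough sketch of how a two-way table is built, the Python snippet below counts each combination of gender and political party. The eight survey responses are invented purely for illustration, and two_way_table is a hypothetical helper name; only the standard library is used.

```python
from collections import Counter

# Hypothetical survey responses (made up for illustration): (gender, party).
responses = [
    ("female", "Democrat"), ("female", "Republican"),
    ("male", "Democrat"), ("male", "Independent"),
    ("female", "Democrat"), ("male", "Republican"),
    ("female", "Independent"), ("male", "Republican"),
]


def two_way_table(pairs):
    """Count each (row category, column category) combination."""
    cell_counts = Counter(pairs)
    row_labels = sorted({row for row, _ in pairs})
    col_labels = sorted({col for _, col in pairs})
    # Counter returns 0 for combinations that never occur.
    return {r: {c: cell_counts[(r, c)] for c in col_labels} for r in row_labels}


table = two_way_table(responses)

# Print each cell with its share of the row total, for comparing groups.
for row_label, row in table.items():
    row_total = sum(row.values())
    for col_label, count in row.items():
        share = 100 * count / row_total
        print(f"{row_label:7} {col_label:12} {count:2d}  {share:5.1f}% of row")
```

Dividing by the row total compares the party breakdown within each gender; dividing by the grand total instead would give each cell's share of everyone surveyed.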
The U.S. government calculates and summarizes loads of categorical data using crosstabs. Typical age and gender data, reported by the U.S. Census Bureau for a survey conducted in 2009, are shown in this table. (Normally age would be considered a numerical variable, but the way the U.S. government reports it, age is broken down into categories, making it a categorical variable.)
You can examine many different facets of the U.S. population by looking at and working with different numbers from the table. For example, looking at gender, you notice that women slightly outnumber men: the population in 2009 was 50.67% female (divide the total number of females by the total population size and multiply by 100%) and 49.33% male (divide the total number of males by the total population size and multiply by 100%). You can also look at age: The percentage of the entire population that is under 5 years old was 6.94% (divide the total number under age 5 by the total population size and multiply by 100%). The largest age group was the 45- to 49-year-olds, who made up 7.44% of the population.
Next, you can explore a possible relationship between gender and age by comparing various parts of the table. You can compare, for example, the percentage of females with the percentage of males in the 80-and-over age group. Because these data are reported in 5-year increments, you have to do a little math in order to get your answer, though. The percentage of the population that’s female and aged 80 and above (looking at column 7 of the table) is 2.27% + 1.54% + 0.69% + 0.21% + 0.04% = 4.75%. The percentage of males aged 80 and over (looking at column 5 of the table) is 1.52% + 0.84% + 0.28% + 0.05% + 0.01% = 2.70%. This shows that the 80-and-over percentage for females is about 76% larger than the percentage for males (because [4.75 – 2.70] ÷ 2.70 ≈ 0.76).
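The arithmetic in this comparison can be checked with a short Python sketch. The percentages are the ones quoted above for the five age brackets that make up the 80-and-over group; the variable names are just illustrative.

```python
# Percentages quoted above for the five brackets in the 80-and-over group.
female_80_plus = [2.27, 1.54, 0.69, 0.21, 0.04]
male_80_plus = [1.52, 0.84, 0.28, 0.05, 0.01]

# Combine the brackets into one 80-and-over figure per gender.
female_total = sum(female_80_plus)
male_total = sum(male_80_plus)

# Relative difference: how much larger the female figure is than the male one.
relative_difference = (female_total - male_total) / male_total

print(f"Females 80 and over: {female_total:.2f}%")
print(f"Males 80 and over:   {male_total:.2f}%")
print(f"Relative difference: {relative_difference:.0%}")
```

Note that the relative difference divides by the male figure, so it describes the female percentage relative to the male one, not the other way around.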