Working with Statistical Two-Way Tables
To explore the links between two categorical variables, you first need to organize the data that’s been collected, and a table is a great way to do that. A two-way table classifies individuals into groups based on the outcomes, or distributions, of two categorical variables (for example, gender and opinion).
Suppose your local community developers are building a campground, and they’ve decided pets will be allowed as long as they’re on a leash. They are now trying to decide whether the campground should have a separate section for pets. You have a hunch that non–pet campers in the area may be more in favor of a separate pet area than pet campers, so you decide to find out what the members of the camping community think. You randomly select 100 campers from the local area and conduct a pet camping survey, recording each person’s opinion on having a pet section (support, oppose) and whether they camp with pets (yes, no). You now have a spreadsheet with 100 rows of data, one for each person you surveyed. Each row has two pieces of data: one column for whether the person is a pet camper (yes, no) and one column for that person’s opinion on having a pet section (support, oppose). Suppose the first 10 rows of your data set look like those in the following table.
First 10 Rows of Data from the Pet Camping Survey
|Person|Pet Camper?|Opinion on a Separate Pet Section|
From this small portion of your data set, you can start to break the results down yourself. For example, looking at the column 2 results, you see that half the respondents (5 divided by 10 = 0.50) camp with pets and the other half do not. Of those who camp with pets (that is, the five people who have a yes in column 2), three (60%) support having a separate section, and the same is true of the five non–pet campers. The results from these 10 campers likely don’t apply to all 100 campers surveyed; however, if you tried to examine the raw data from all 100 rows by hand, you wouldn’t see many patterns without a lot of hard work.
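The hand tally described above can be sketched in a few lines of Python. The ten rows below are hypothetical: their order is made up, and only the totals (five pet campers, three supporters in each group) follow the summary in the text. The variable names are illustrative as well.

```python
from collections import Counter

# Hypothetical rows as (pet_camper, opinion) pairs. The order is invented;
# only the totals match the 10-row summary described in the text.
rows = [
    ("yes", "support"), ("yes", "support"), ("yes", "support"),
    ("yes", "oppose"), ("yes", "oppose"),
    ("no", "support"), ("no", "support"), ("no", "support"),
    ("no", "oppose"), ("no", "oppose"),
]

# Count how many people fall into each (pet camper, opinion) combination.
cell_counts = Counter(rows)

# Count pet campers and non-pet campers, ignoring opinion.
pet_counts = Counter(pet for pet, _ in rows)

# Proportion of all respondents who camp with pets.
share_pet_campers = pet_counts["yes"] / len(rows)

# Proportion supporting a pet section within each group.
support_among_pet = cell_counts[("yes", "support")] / pet_counts["yes"]
support_among_non_pet = cell_counts[("no", "support")] / pet_counts["no"]

print(share_pet_campers, support_among_pet, support_among_non_pet)
```

The same counting approach works unchanged on all 100 rows, which is where doing it by hand becomes impractical.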
To get a handle on what’s happening in a large data set when you are examining two categorical variables, you organize your data into a two-way table. After you set up the two-way table, you can calculate percentages to explore the data and answer your research questions. Here are some questions of interest about the camping data:
What percentage of the campers are in favor of a pet section?
What percentage of the campers are pet campers who support a pet section?
Does a larger percentage of non–pet campers than pet campers support a pet section?
The answers to these (and any other) questions about the data come from finding and working with the proportions, or percentages, of individuals within certain parts of the table. This process involves calculating and examining what statisticians call distributions. A distribution in the case of a two-way table is a list of all the possible outcomes for one variable or a combination of variables, along with their corresponding proportions (or percentages).
For example, the distribution for the pet camping variable lists the percentages of people who do and do not camp with pets. The distribution for the combination of the pet camping variable (yes, no) and the opinion variable (support, oppose) lists the percentages of: 1) pet campers who support a pet section; 2) pet campers who oppose a pet section; 3) non–pet campers who support a pet section; and 4) non–pet campers who oppose a pet section.
For any distribution, all the percentages must sum to 100%. If you’re using proportions (decimals), they must sum to 1.00. Each individual has to be somewhere, and no one can be in more than one place at one time. In some cases, the total of all proportions might not be exactly 1.00 (or 100%) because of rounding error. Carrying the proportions out to 3 or 4 decimal places keeps that error small.
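To see how rounding affects the total, here is a minimal Python sketch. The counts are hypothetical (three equally sized groups, not taken from the camping survey) and were chosen only because their proportions do not round cleanly; the function name is illustrative.

```python
# Hypothetical counts for a three-category variable (not from the survey),
# chosen so the proportions do not round cleanly.
counts = [1, 1, 1]
total = sum(counts)


def rounded_proportions(counts, total, places):
    """Return each count as a proportion rounded to `places` decimals."""
    return [round(c / total, places) for c in counts]


two_places = rounded_proportions(counts, total, 2)   # 0.33 each
four_places = rounded_proportions(counts, total, 4)  # 0.3333 each

# How far each rounded distribution's sum falls from 1.
gap_two = abs(1 - sum(two_places))
gap_four = abs(1 - sum(four_places))

print(gap_two, gap_four)
```

With two decimal places the proportions fall short of 1 by about 0.01; with four decimal places the shortfall shrinks to about 0.0001.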
The following two-way table summarizes the results from all 100 campers surveyed.
Two-Way Table of Pet Camping Survey Data (All 100 Rows)
| |Support Separate Pet Section|Oppose Separate Pet Section|
|Pet Camper|20|10|
|Non–Pet Camper|55|15|
The table has 2 × 2 = 4 numbers in it. These numbers are the cells of the two-way table; each one sits at the intersection of a row and a column. The cell in the upper left corner represents the 20 people who are pet campers supporting a pet section. The upper right cell holds the 10 pet campers opposing a pet section. In the lower left are the 55 non–pet campers who want a pet section, and in the lower right are the 15 non–pet campers opposing a pet section.
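The three questions posed earlier can be answered directly from these four cell counts. The sketch below is one way to do the arithmetic in Python; the dictionary layout and variable names are illustrative choices, while the counts are the ones given above.

```python
# Cell counts as described in the text.
table = {
    "pet camper":     {"support": 20, "oppose": 10},
    "non-pet camper": {"support": 55, "oppose": 15},
}

# Total number of people surveyed (sum of all four cells).
grand_total = sum(sum(row.values()) for row in table.values())

# Question 1: percentage of all campers who support a pet section.
support_total = sum(row["support"] for row in table.values())
pct_support = 100 * support_total / grand_total

# Question 2: percentage of all campers who are pet campers AND support.
pct_pet_and_support = 100 * table["pet camper"]["support"] / grand_total

# Question 3: percentage supporting a pet section within each camper group.
pct_support_by_group = {
    group: 100 * row["support"] / sum(row.values())
    for group, row in table.items()
}

print(pct_support, pct_pet_and_support, pct_support_by_group)
```

With these counts, 75% of all campers support a pet section, 20% of all campers are pet campers who support one, and support is higher among non–pet campers (55 of 70, about 78.6%) than among pet campers (20 of 30, about 66.7%). Note that the first two questions divide by the grand total of 100, while the third divides by each group's own row total.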