Connecting Conditional Probabilities to Two-Way Tables

By Consumer Dummies

You can use probabilities from a two-way table to look for and describe relationships between two categorical variables. The following table displays information about cigarette smoking and diagnosis with hypertension for a group of patients at a medical clinic.

[Two-way table of smoking status by hypertension diagnosis. Credit: Illustration by Ryan Sneed]

The following stacked bar graph displays the smoking and hypertension data for the group of patients at the medical clinic.

[Stacked bar graph of smoking status by hypertension diagnosis. Credit: Illustration by Ryan Sneed]

This graph shows two bars: one for the group with a hypertension diagnosis and one for the group with no hypertension diagnosis. The percentages within each bar sum to 100%. Any percentage within a bar is a conditional probability for that group; that is, it's the percentage of that group with a certain characteristic.
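As a reminder of the notation the problems rely on, the general definition for count data can be written as follows. The symbols A and B are placeholders introduced here for illustration, not labels taken from the table.

```latex
P(A \mid B) = \frac{\text{number in group } B \text{ with characteristic } A}{\text{total number in group } B}
```

The group you condition on (B) goes after the vertical bar and supplies the denominator; the characteristic you're asking about (A) goes in front.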

Use conditional probability terms and notation to answer these problems.

Sample questions

  1. What does the area labeled 35% represent?

    Answer: the conditional probability of not smoking, given a hypertension diagnosis

    The 35% portion is in the bar labeled Hypertension Diagnosis, which sums to 100%, so this area is a conditional probability for those who have a hypertension diagnosis. You know that in the data from the table, among people with hypertension, smoking is more common than not smoking. So the smaller area of this bar must be the conditional probability of not smoking, given a hypertension diagnosis.

    You can also calculate the conditional probability of not smoking, given a hypertension diagnosis, by dividing the number of patients who don’t smoke and have a hypertension diagnosis (26) by the total number with a hypertension diagnosis (74):

    P(nonsmoker | hypertension diagnosis) = 26/74 ≈ 0.35, or 35%

    This probability notation has hypertension in the back part of the parentheses because that’s the subgroup you’re looking at (and why you divide by 74). The nonsmoker goes in the front part of the parentheses because you want to know what proportion of that subgroup are nonsmokers.

  2. What does the area labeled 68% represent?

    Answer: the conditional probability of not smoking, given no hypertension diagnosis

    The 68% portion is in the bar labeled No Hypertension Diagnosis, which sums to 100%, so this area is a conditional probability for those who don’t have a hypertension diagnosis. You know that in the data from the table, among people with no hypertension diagnosis, not smoking is more common than smoking. So the larger area of this bar must be the conditional probability of not smoking, given no hypertension diagnosis.

    You can also calculate the conditional probability of not smoking, given no hypertension diagnosis, by dividing the number of patients who don’t smoke and don’t have a hypertension diagnosis (50) by the total number who don’t have a hypertension diagnosis (74):

    P(nonsmoker | no hypertension diagnosis) = 50/74 ≈ 0.68, or 68%

    This probability notation has no hypertension in the back part of the parentheses because that’s the subgroup you’re looking at (and why you divide by 74). The nonsmoker goes in the front part of the parentheses because you want to know what proportion of that subgroup are nonsmokers.

  3. Based on this data, and understanding that you are working with only a single sample of data, which of the following statements appears to be true?

    A. Patients with a hypertension diagnosis are more likely to be smokers than nonsmokers.

    B. Patients with a hypertension diagnosis are less likely to be smokers than nonsmokers.

    C. Patients without a hypertension diagnosis are more likely to be smokers than nonsmokers.

    D. Patients without a hypertension diagnosis are more likely to be nonsmokers than smokers.

    E. Choices (A) and (D)

    Answer: E. Choices (A) and (D) (Patients with a hypertension diagnosis are more likely to be smokers than nonsmokers; patients without a hypertension diagnosis are more likely to be nonsmokers than smokers.)

    Although you don’t want to generalize too broadly from a single sample of data, the patterns found in this data set indicate that patients with a hypertension diagnosis are more likely to be smokers (65%) than nonsmokers (35%), and patients without a hypertension diagnosis are more likely to be nonsmokers (68%) than smokers (32%).

    You can find these percentages in the two bars of the stacked bar graph.
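The percentages used in the answers above can be reproduced from counts, as in this minimal Python sketch. The nonsmoker counts (26 and 50) and the group totals (74 each) come from the worked answers; the smoker counts are an assumption obtained by subtraction, and the names `counts`, `conditional_percentages`, and `more_likely` are hypothetical helpers made up for this illustration.

```python
# Nonsmoker counts (26, 50) and group totals (74, 74) are quoted in the
# worked answers. The smoker counts are ASSUMED by subtraction from the totals.
counts = {
    "hypertension": {"smoker": 74 - 26, "nonsmoker": 26},
    "no hypertension": {"smoker": 74 - 50, "nonsmoker": 50},
}


def conditional_percentages(group_counts):
    """Return P(category | group) as whole-number percentages."""
    total = sum(group_counts.values())
    return {cat: round(100 * n / total) for cat, n in group_counts.items()}


def more_likely(group_counts):
    """Return the category with the largest count within one group."""
    return max(group_counts, key=group_counts.get)


# One dictionary of percentages per bar of the stacked bar graph.
bars = {group: conditional_percentages(c) for group, c in counts.items()}

for group, pct in bars.items():
    print(f"Given {group}: {pct}, more likely: {more_likely(counts[group])}")
```

Each group is divided by its own total, which is why the percentages within a bar sum to 100% and why the two bars can be compared even if the group sizes differ.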
