How Correlation, Regression, and Two-Way Tables Clarify Statistical Data

By Deborah J. Rumsey

One of the most common goals of statistical research is to find links between variables. With correlation, regression, and two-way tables, you can use data to answer questions like these:

  • Which lifestyle behaviors increase or decrease the risk of cancer?

  • What is the number of side effects associated with this new drug?

  • Can I lower my cholesterol by taking this new herbal supplement?

  • Does spending a large amount of time on the Internet cause a person to gain weight?

Finding links between variables is what helps the medical world design better drugs and treatments, provides marketers with info on who is more likely to buy their products, and gives politicians information on which to build arguments for and against certain policies.

In the mega-business of looking for relationships between variables, you find an incredible number of statistical results, but can you tell what’s correct and what’s not? Many important decisions are based on these studies, so you need to know what standards must be met before the results can be deemed credible, especially when a cause-and-effect relationship is being reported.

This is why you need to know how to

  • plot data from two numerical variables (such as dosage level and blood pressure);

  • find and interpret correlation (the strength and direction of the linear relationship between two numerical variables, x and y);

  • find the equation of a line or curve that best fits the data (and when doing so is appropriate); and

  • use these results to make predictions for one variable based on another (called regression).

You also need to recognize when a line fits the data well and when it doesn’t, and what conclusions you can make (and shouldn’t make) in the situations where a line does fit.
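The steps in the list above can be sketched in a few lines of Python. This is only a minimal illustration, not a full analysis: the dosage and blood pressure numbers are invented, the function name correlation_and_line is hypothetical, and the code assumes a straight line is a reasonable model for the data (something you would check with a scatterplot first).

```python
from math import sqrt

# Hypothetical data, invented for illustration only.
dosage = [10, 20, 30, 40, 50]                # x: dosage level (mg)
blood_pressure = [150, 144, 139, 131, 126]   # y: systolic blood pressure (mm Hg)


def mean(values):
    """Arithmetic mean of a list of numbers."""
    return sum(values) / len(values)


def correlation_and_line(x, y):
    """Return (r, slope, intercept) for the least-squares line
    y = intercept + slope * x, where r is the correlation."""
    x_bar, y_bar = mean(x), mean(y)
    # Sums of cross-products and squared deviations from the means.
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    r = sxy / sqrt(sxx * syy)          # strength and direction of linear link
    slope = sxy / sxx                  # change in y per one-unit change in x
    intercept = y_bar - slope * x_bar  # line passes through (x_bar, y_bar)
    return r, slope, intercept


r, slope, intercept = correlation_and_line(dosage, blood_pressure)

# Regression: predict y for a new x inside the range of the observed data.
new_dosage = 35
predicted_bp = intercept + slope * new_dosage

print(f"correlation r = {r:.3f}")
print(f"best-fitting line: y = {intercept:.1f} + ({slope:.2f})x")
print(f"predicted blood pressure at dosage {new_dosage}: {predicted_bp:.2f}")
```

With these made-up numbers, r comes out close to -1, which signals a strong downhill linear relationship. Even so, a strong correlation by itself does not show that the dosage causes the change in blood pressure.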

It is also useful to look for and describe links between two categorical variables (such as the number of doses taken per day and the presence or absence of nausea). You do this by collecting and organizing data into two-way tables (where the possible values of one variable make up the rows and the possible values of the other variable make up the columns), interpreting the results, analyzing the data from the two-way tables to look for relationships, and checking for independence.
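As a rough sketch of what organizing data into a two-way table and checking for independence can look like, the Python below uses invented counts for doses per day (rows) and nausea (columns). The function names are hypothetical, and the chi-square statistic is one common way to check independence; the text above does not say which method to use.

```python
# Hypothetical counts, invented for illustration only.
# Rows: doses taken per day. Columns: presence or absence of nausea.
table = {
    "1 dose/day": {"nausea": 10, "no nausea": 90},
    "2 doses/day": {"nausea": 20, "no nausea": 80},
    "3 doses/day": {"nausea": 30, "no nausea": 70},
}


def row_proportions(two_way, column):
    """Proportion of each row that falls in the given column."""
    return {
        row: counts[column] / sum(counts.values())
        for row, counts in two_way.items()
    }


def chi_square_statistic(two_way):
    """Chi-square statistic comparing observed counts with the counts
    expected if the row and column variables were independent."""
    row_totals = {row: sum(counts.values()) for row, counts in two_way.items()}
    columns = list(next(iter(two_way.values())).keys())
    col_totals = {
        col: sum(counts[col] for counts in two_way.values()) for col in columns
    }
    grand_total = sum(row_totals.values())

    statistic = 0.0
    for row, counts in two_way.items():
        for col in columns:
            # Expected count under independence.
            expected = row_totals[row] * col_totals[col] / grand_total
            statistic += (counts[col] - expected) ** 2 / expected
    return statistic


nausea_rates = row_proportions(table, "nausea")
chi_square = chi_square_statistic(table)
# Degrees of freedom: (number of rows - 1) * (number of columns - 1).
degrees_of_freedom = (len(table) - 1) * (2 - 1)

for row, rate in nausea_rates.items():
    print(f"{row}: {rate:.0%} reported nausea")
print(f"chi-square = {chi_square:.2f} with {degrees_of_freedom} degrees of freedom")
```

If the two variables were independent, the nausea rate would be about the same in every row. Here the rates differ from row to row, and the statistic would be compared with a chi-square table value for 2 degrees of freedom to judge whether the difference is larger than chance alone would explain.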