Using Linear Regression to Predict an Outcome

Deborah J. Rumsey

Updated

2021-12-21 20:36:48

From the book

Statistics For Dummies

Statistical researchers often use a linear relationship to predict the (average) numerical value of Y for a given value of X using a straight line (called the regression line).

If you know the slope and the y-intercept of that regression line, then you can plug in a value for X and predict the average value for Y. In other words, you predict (the average) Y from X.
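As a minimal sketch of that plug-in step, the snippet below evaluates a line at a chosen X. The function name and the slope and intercept values are made up for illustration; they do not come from any real data set.

```python
def predict_average_y(slope: float, intercept: float, x: float) -> float:
    """Return the predicted average Y at x on the line y = intercept + slope * x."""
    return intercept + slope * x


# Hypothetical regression line: slope 2.0 and y-intercept 10.0 (illustrative values).
predicted = predict_average_y(slope=2.0, intercept=10.0, x=5.0)
print(predicted)  # 20.0
```

The result is read as the predicted average Y for that X, not as an exact value for any single observation.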

If you establish at least a moderate correlation between X and Y through both a correlation coefficient and a scatterplot, then you know they have some type of linear relationship.

Never do a regression analysis unless you have already found at least a moderately strong correlation between the two variables. (A good rule of thumb is that the correlation should be at or beyond 0.50 in either direction: 0.50 or higher, or –0.50 or lower.) If the data don’t resemble a line to begin with, you shouldn’t try to use a line to fit the data and make predictions (but people still try).

Before moving forward to find the equation for your regression line, you have to identify which of your two variables is X and which is Y. When doing correlations, the choice of which variable is X and which is Y doesn’t matter, as long as you’re consistent for all the data. But when fitting lines and making predictions, the choice of X and Y does make a difference.

So how do you determine which variable is which? In general, Y is the variable that you want to predict, and X is the variable you are using to make that prediction. For example, say you are using the number of times a population of crickets chirps to predict the temperature. In this case, you would make temperature the variable Y and the number of chirps the variable X. Hence, Y can be predicted from X using the equation of a line, provided a strong enough linear relationship exists.

Statisticians call the X-variable (cricket chirps in this example) the explanatory variable, because if X changes, the slope tells you (or explains) how much Y is expected to change in response. Therefore, the Y variable is called the response variable. Other names for X and Y include the independent and dependent variables, respectively.
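To see why the choice of X and Y matters when fitting a line, here is a rough sketch that fits a line both ways. It assumes the standard least-squares formulas for the slope and intercept, which this article does not itself spell out, and the chirp and temperature numbers are invented purely for illustration.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for predicting ys from xs."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations in x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up measurements for illustration only (not real cricket data).
chirps = [18, 20, 21, 23, 27, 30]        # X: explanatory variable
temperature = [57, 60, 64, 65, 68, 71]   # Y: response variable

# Predict temperature (Y) from chirps (X), as in the example above.
slope, intercept = fit_line(chirps, temperature)
print(f"temperature = {intercept:.2f} + {slope:.2f} * chirps")

# Swapping the roles fits a different line: it predicts chirps from temperature.
swapped_slope, swapped_intercept = fit_line(temperature, chirps)
print(f"chirps = {swapped_intercept:.2f} + {swapped_slope:.2f} * temperature")
```

The first slope estimates how much the average temperature changes per additional chirp. The second line answers a different question, so it is not simply the first line rearranged unless the points fall exactly on a line.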

In the case of two numerical variables, you can come up with a line that enables you to predict Y from X, if (and only if) the following two conditions are met:

The scatterplot must form a linear pattern.
The correlation, r, must be moderate to strong (typically at or beyond 0.50 or –0.50).

Some researchers actually don’t check these conditions before making predictions. Their claims are not valid unless the two conditions are met.
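The second condition can be checked numerically. The sketch below computes the correlation coefficient r from its usual definition and compares its size with the 0.50 rule of thumb. The function names and the sample values are hypothetical, and the first condition (a linear pattern in the scatterplot) still has to be checked by looking at the plot.

```python
import math


def pearson_r(xs, ys):
    """Correlation coefficient r between two equal-length lists of numbers."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


def correlation_is_strong_enough(xs, ys, threshold=0.50):
    """True if r is at or beyond +threshold or -threshold (rule of thumb)."""
    return abs(pearson_r(xs, ys)) >= threshold


# Made-up values for illustration only.
x_values = [1, 2, 3, 4, 5]
y_values = [2, 4, 5, 4, 5]

r = pearson_r(x_values, y_values)
print(round(r, 2))                                       # 0.77
print(correlation_is_strong_enough(x_values, y_values))  # True
```

Taking the absolute value treats strong negative correlations the same as strong positive ones, which matches the "0.50 or –0.50" wording of the rule.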

But suppose the correlation is high; do you still need to look at the scatterplot? Yes. In some situations the data have a somewhat curved shape, yet the correlation is still strong; in these cases making predictions using a straight line is still invalid. Predictions in these cases need to be made based on other methods that use a curve instead.
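A small constructed example shows how this can happen. The data below are curved on purpose (Y is X squared), yet the correlation comes out very high. Looking at the leftover errors from a straight-line fit is used here as a rough numeric stand-in for looking at the scatterplot; that check and the least-squares formulas are assumptions of this sketch, not something the article prescribes.

```python
import math

# Constructed data: curved by design, not real measurements.
xs = list(range(1, 11))
ys = [x ** 2 for x in xs]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
sxx = sum((x - mean_x) ** 2 for x in xs)
syy = sum((y - mean_y) ** 2 for y in ys)

# Correlation is high even though the relationship is not a straight line.
r = sxy / math.sqrt(sxx * syy)
print(round(r, 2))  # 0.97

# Fit a straight line anyway and look at what it misses.
slope = sxy / sxx
intercept = mean_y - slope * mean_x
residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]

# The errors are positive at both ends and negative in the middle,
# a systematic pattern that signals a curve rather than a line.
print([round(e, 1) for e in residuals])
```

A straight line fitted to these points underpredicts at both ends and overpredicts in the middle, so its predictions are off in a predictable way despite the strong correlation.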

About This Article

About the book author:

Deborah J. Rumsey, PhD, is an Auxiliary Professor and Statistics Education Specialist at The Ohio State University. She is the author of Statistics For Dummies, Statistics II For Dummies, Statistics Workbook For Dummies, and Probability For Dummies.

This article can be found in the category:

Statistics
