# How to Utilize Linear Regressions in Predictive Analytics

*Linear regression* is a statistical method that models the relationship between two variables as a straight line. In predictive analytics, you can use it to predict a future numerical value of one variable from a known value of the other.

Consider a historical dataset with two variables: the arrival times of a train and the corresponding delay times. Suppose you want to predict what the delay will be for the next train. If you apply linear regression to these two variables (the arrival and delay times), you can generate a linear equation such as

Delay = a + (b * Arrival time) + d

This equation expresses the relationship between delay time and arrival time. The constants *a* (the intercept) and *b* (the slope) are the model’s parameters. The variable *d* is the *error term* (its value for an observed data point is usually called the *residual*): a numerical value that represents the part of the *delay* that the straight-line relationship with *arrival time* does not explain. If the error is not equal to zero, that might indicate that factors other than arrival time are affecting the variable *delay*.
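To make the roles of *a*, *b*, and *d* concrete, here is a minimal sketch that estimates the parameters by ordinary least squares and then computes the residual for each observation. The data values, the units (hours after midnight, minutes of delay), and the helper name `fit_line` are assumptions made up for illustration; they do not come from a real timetable.

```python
def fit_line(x, y):
    """Return (a, b) that minimize the sum of squared residuals of y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and of cross-deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = mean_y - b * mean_x  # intercept
    return a, b

# Hypothetical data: arrival time in hours after midnight, delay in minutes
arrival = [6.0, 8.0, 10.0, 12.0, 14.0]
delay = [3.0, 5.0, 6.0, 9.0, 10.0]

a, b = fit_line(arrival, delay)

# Residual = observed delay minus the delay predicted by the line
residuals = [yi - (a + b * xi) for xi, yi in zip(arrival, delay)]

print(f"a = {a:.2f}, b = {b:.2f}")
print("residuals:", [round(r, 2) for r in residuals])
```

With an intercept in the model, the least-squares residuals sum to zero (up to rounding), so individual nonzero residuals are expected; it is a systematic pattern in them that hints at factors the line leaves out.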

If you’re sitting at the train station, you can simply plug the arrival time into the preceding equation and compute the expected delay, using the linear regression model’s fitted parameters *a* and *b*. The error term *d* is unknown for a train that hasn’t arrived yet, so the prediction treats it as zero.
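As a rough illustration of that plug-in step, the sketch below evaluates the equation for one arrival time. The parameter values and the function name `predict_delay` are hypothetical; in practice *a* and *b* would come from fitting the model to your own historical data. Only *a* and *b* enter the calculation, because the error for a future observation cannot be known in advance.

```python
def predict_delay(arrival_time, a, b):
    """Expected delay under Delay = a + b * Arrival time (error term taken as zero)."""
    return a + b * arrival_time

# Hypothetical fitted parameters (assumed for illustration only)
a, b = -2.4, 0.9

# Arrival time in hours after midnight; 16.0 means 4:00 p.m.
expected = predict_delay(16.0, a, b)
print(f"Expected delay: {expected:.1f} minutes")  # Expected delay: 12.0 minutes
```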

Linear regression is (as you might imagine) most suitable for data that follows a roughly linear trend. But it’s very sensitive to outliers: because the fit minimizes squared errors, a single extreme data point can noticeably shift the slope and intercept. If you’re planning to use linear regression for your predictive model, it is recommended that you identify those outliers and remove them from the training set.
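The effect of an outlier can be seen in a small experiment. The sketch below fits a least-squares line to a hypothetical dataset twice, once as is and once with a single extreme delay added. All data values and the helper name `fit_line` are assumptions chosen for illustration.

```python
def fit_line(x, y):
    """Return (a, b) that minimize the sum of squared residuals of y = a + b*x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data: arrival time in hours after midnight, delay in minutes
arrival = [6.0, 8.0, 10.0, 12.0, 14.0]
delay = [3.0, 5.0, 6.0, 9.0, 10.0]
a_clean, b_clean = fit_line(arrival, delay)

# Same data plus one outlier: a train arriving at 14:00 that was 60 minutes late
a_out, b_out = fit_line(arrival + [14.0], delay + [60.0])

print(f"without outlier: a = {a_clean:.2f}, b = {b_clean:.2f}")
print(f"with outlier:    a = {a_out:.2f}, b = {b_out:.2f}")
```

In this made-up example, one added point moves the slope from about 0.9 to about 4.0 minutes of delay per hour, which is why outliers deserve a look before you train the model.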