How to Calculate a Regression Line
In statistics, you can calculate a regression line for two variables if their scatterplot shows a linear pattern and the correlation between the variables is very strong (for example, r = 0.98). A regression line is simply a single line that best fits the data (in terms of having the smallest overall distance from the line to the points, measured as the sum of the squared vertical distances). Statisticians call this technique for finding the best-fitting line a simple linear regression analysis using the least squares method.
The formula for the best-fitting line (or regression line) is y = mx + b, where m is the slope of the line and b is the y-intercept. This is the same equation used to find a line in algebra, but remember that in statistics the points don’t lie perfectly on a line. The line is a model around which the data lie if a strong linear pattern exists.

The slope of a line is the change in Y over the change in X. For example, a slope of 10/3 means that as the x-value increases (moves right) by 3 units, the y-value moves up by 10 units on average.

The y-intercept is the value on the y-axis where the line crosses. For example, in the equation y = 2x – 6, the line crosses the y-axis at the value b = –6. The coordinates of this point are (0, –6); when a line crosses the y-axis, the x-value is always 0.
You may be thinking that you have to try lots and lots of different lines to see which one fits best. Fortunately, you have a more straightforward option (although eyeballing a line on the scatterplot does help you think about what you’d expect the answer to be). The best-fitting line has a distinct slope and y-intercept that can be calculated using formulas (and these formulas aren’t too hard to use).
To save a great deal of time calculating the best-fitting line, first find the “big five,” five summary statistics that you’ll need in your calculations:
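To see what "fits best" means in the least squares sense, here is a minimal Python sketch that scores a few candidate lines by their sum of squared vertical distances to the points. The data, the candidate lines, and the function name `sum_squared_errors` are all made up for illustration; the candidate (0.6, 2.2) happens to be the least squares line for this particular data set.

```python
def sum_squared_errors(xs, ys, m, b):
    """Sum of squared vertical distances from the points to the line y = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data, made up for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

# A few hand-picked candidate lines, each written as (slope, y-intercept)
candidates = [(0.5, 2.5), (0.6, 2.2), (0.7, 2.0), (1.0, 1.0)]

for m, b in candidates:
    print(f"y = {m}x + {b}: SSE = {sum_squared_errors(xs, ys, m, b):.2f}")

# The candidate with the smallest sum of squared errors fits best
best = min(candidates, key=lambda mb: sum_squared_errors(xs, ys, *mb))
print("Best candidate:", best)
```

Trying candidates this way only compares the lines you happened to pick, which is why calculating the slope and y-intercept directly is the more practical route.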

The mean of the x-values (denoted \bar{x})

The mean of the y-values (denoted \bar{y})

The standard deviation of the x values (denoted s_{x})

The standard deviation of the y values (denoted s_{y})

The correlation between X and Y (denoted r)
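As a rough illustration, the big five can be computed with Python's standard `statistics` module. This is a sketch under a few assumptions: the function name `big_five` and the data are made up, and the standard deviations are sample standard deviations (dividing by n − 1), which is what `statistics.stdev` returns.

```python
import statistics

def big_five(xs, ys):
    """Return (mean_x, mean_y, s_x, s_y, r) for paired data."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    mean_x = statistics.mean(xs)
    mean_y = statistics.mean(ys)
    s_x = statistics.stdev(xs)  # sample standard deviation (divides by n - 1)
    s_y = statistics.stdev(ys)
    # Sample covariance: average product of deviations, using n - 1
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)
    # Correlation is the covariance rescaled by both standard deviations
    r = cov / (s_x * s_y)
    return mean_x, mean_y, s_x, s_y, r

# Hypothetical data, made up for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

mean_x, mean_y, s_x, s_y, r = big_five(xs, ys)
print("mean of x:", mean_x)
print("mean of y:", mean_y)
print("s_x:", round(s_x, 3))
print("s_y:", round(s_y, 3))
print("r:", round(r, 3))
```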
Finding the slope of a regression line
The formula for the slope, m, of the best-fitting line is
m = r (s_{y} / s_{x})
where r is the correlation between X and Y, and s_{x} and s_{y} are the standard deviations of the x-values and the y-values, respectively. You simply divide s_{y} by s_{x} and multiply the result by r.
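The slope calculation is short enough to write as a one-line function. The sketch below assumes you already have r, s_{x}, and s_{y}; the function name `slope` and the numbers passed to it are hypothetical.

```python
def slope(r, s_x, s_y):
    """Slope of the best-fitting line: divide s_y by s_x, then multiply by r."""
    if s_x == 0:
        raise ValueError("s_x is 0: all x-values are identical, so no slope exists")
    return r * (s_y / s_x)

# Hypothetical summary statistics, made up for illustration only
print(slope(0.5, 2.0, 8.0))    # positive correlation gives an uphill line
print(slope(-0.5, 2.0, 8.0))   # negative correlation gives a downhill line
```

Because standard deviations are never negative, the sign of the slope always matches the sign of r.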
Note that the slope of the best-fitting line can be a negative number because the correlation can be a negative number. A negative slope indicates that the line is going downhill. For example, if an increase in police officers is related to a decrease in the number of crimes in a linear fashion, then the correlation, and hence the slope of the best-fitting line, is negative.
The correlation and the slope of the best-fitting line are not the same. The formula for slope takes the correlation (a unitless measurement) and attaches units to it. Think of s_{y} divided by s_{x} as the variation (resembling change) in Y over the variation in X, in units of X and Y. For example, it could be the variation in temperature (degrees Fahrenheit) over the variation in number of cricket chirps (in 15 seconds).
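One way to convince yourself that the slope carries units while the correlation does not is to change the units of X and recompute both. The sketch below uses made-up chirp and temperature values (not real measurements) and a hypothetical helper, `correlation_and_slope`; it counts chirps per 15 seconds and then per minute.

```python
import statistics

def correlation_and_slope(xs, ys):
    """Return (r, m), using sample standard deviations (n - 1)."""
    n = len(xs)
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    return r, r * (s_y / s_x)

# Hypothetical values, made up for illustration only
chirps_per_15s = [18, 20, 21, 23, 27]
temps_f = [57, 60, 64, 65, 68]

# Same data, but with X expressed in chirps per minute instead
chirps_per_min = [4 * c for c in chirps_per_15s]

r1, m1 = correlation_and_slope(chirps_per_15s, temps_f)
r2, m2 = correlation_and_slope(chirps_per_min, temps_f)

print("r (per 15 s):", round(r1, 4), " r (per min):", round(r2, 4))
print("slope (degrees F per chirp in 15 s):", round(m1, 4))
print("slope (degrees F per chirp in 1 min):", round(m2, 4))
```

The correlation comes out the same both times, while the slope shrinks by a factor of 4 because each unit of X now represents a smaller change.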
Finding the y-intercept of a regression line
The formula for the y-intercept, b, of the best-fitting line is
b = \bar{y} - m\bar{x}
where \bar{x} and \bar{y} are the means of the x-values and the y-values, respectively, and m is the slope.
So to calculate the y-intercept, b, of the best-fitting line, you start by finding the slope, m, of the best-fitting line using the above steps. Then to find the y-intercept, you multiply m by the mean of the x-values, \bar{x}, and subtract your result from the mean of the y-values, \bar{y}.
Always calculate the slope before the y-intercept. The formula for the y-intercept contains the slope!
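Putting the pieces together, here is a minimal Python sketch that goes from raw data to the equation of the best-fitting line, calculating the slope first and the y-intercept second. The function name `regression_line` and the data are made up for illustration, and sample standard deviations (n − 1) are assumed.

```python
import statistics

def regression_line(xs, ys):
    """Return (m, b) for the best-fitting line y = m*x + b."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length lists with at least 2 points")
    n = len(xs)
    # The "big five" summary statistics
    mean_x, mean_y = statistics.mean(xs), statistics.mean(ys)
    s_x, s_y = statistics.stdev(xs), statistics.stdev(ys)
    r = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / ((n - 1) * s_x * s_y)
    # Slope first: m = r * (s_y / s_x)
    m = r * (s_y / s_x)
    # Then the y-intercept, which needs the slope: b = mean_y - m * mean_x
    b = mean_y - m * mean_x
    return m, b

# Hypothetical data, made up for illustration only
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

m, b = regression_line(xs, ys)
print(f"y = {m:.2f}x + {b:.2f}")

# Use the line to estimate y at a new x-value
x_new = 6
print("Estimated y at x = 6:", round(m * x_new + b, 2))
```

For this made-up data, the slope works out to about 0.6 and the y-intercept to about 2.2.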