How to Create a Regression Equation in Excel

By Jeff Sauro

You can create a regression equation in Excel that will help you predict customer values. To create a regression equation using Excel, follow these steps:

  1. Insert a scatterplot graph into a blank space or sheet in an Excel file with your data.

    You can find the scatterplot graph on the Insert ribbon in Excel 2007 and later.

  2. Select the x-axis (horizontal) and y-axis data and click OK.

    Put what you want to predict in the y-axis (so my time data is in column B). The taps are in column C.

    You now have a scatterplot.

  3. Right-click on any of the dots and select “Add Trendline” from the menu.

    The Format Trendline dialog box opens.

    image0.jpg

  4. Select Trendline Options on the left, if necessary, then select the Display Equation on Chart and Display R-Squared Value on Chart boxes.

    You now have a scatterplot with trendline, equation, and r-squared value. The regression equation is Y = 4.486x + 86.57.

    image1.jpg

    The r2 value of .3143 tells you that taps can explain around 31% of the variation in time. It tells you how well the best-fitting line actually fits the data.

Going beyond the ends of observed values is risky when using a regression equation. There’s no guarantee that the regression line will continue to be linear because it extends before and after the data points.

Watch out for the following three things when correlating customer analytics data and using regression analysis:

  • Range restriction: Two variables might have a low correlation because you’re only measuring in a narrow range. For example, height and weight have a strong positive correlation, but if you measure only National Basketball Association (NBA) players, the correlation would mostly go away. This can happen, for example, if you are looking at a narrow range of customers — say, the ones with the highest incomes or most transactions.

  • Third variables: It’s often the case that another variable you aren’t measuring is actually the cause of the relationship. For example, high school grades are correlated with college grades. It may seem like better studying in high school leads to better grades in college.

    However, it’s often the case that a third variable, Socio Economic Status (SES) is a better explanation of both high school and college grades. Students in families with higher SES tend to have higher grades in high school and college than students from families with low SES. In customer analytics an improving economy or a growing company may be the reason for increases in sales, and not your marketing campaign or feature changes.

  • Nonlinearity: The relationship between variables needs to be linear — that is, follow a line somewhat. If the relationship curves downward or upward, a correlation and regression equation will not properly describe the relationship.