Regression Analysis in Statistical Analysis of Big Data

By Alan Anderson, David Semmelroth

Regression analysis is used to estimate the strength and direction of the relationship between variables that are linearly related to each other. Two variables X and Y are said to be linearly related if the relationship between them can be written in the form

Y = mX + b

where

m is the slope, or the change in Y for a one-unit change in X

b is the intercept, or the value of Y when X = 0
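The definitions above say that a line is pinned down by its slope and intercept, so any two distinct points on it are enough to recover both. The sketch below, whose function name and sample points are hypothetical, computes m as the change in Y divided by the change in X, then solves Y = mX + b for b.

```python
def slope_and_intercept(x1, y1, x2, y2):
    """Return (m, b) for the line Y = mX + b passing through two points."""
    if x1 == x2:
        raise ValueError("x1 and x2 must differ to define a slope")
    m = (y2 - y1) / (x2 - x1)  # change in Y per one-unit change in X
    b = y1 - m * x1            # value of Y when X = 0
    return m, b

# Hypothetical points (2, 7) and (6, 15)
m, b = slope_and_intercept(2.0, 7.0, 6.0, 15.0)
print(m, b)  # 2.0 3.0
```

Here Y rises by 8 while X rises by 4, so the slope is 2, and the intercept follows from 7 = 2(2) + b.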

As an example of regression analysis, suppose a corporation wants to determine whether its advertising expenditures are actually increasing profits, and if so, by how much. The corporation gathers data on advertising and profits for the past 20 years and uses this data to estimate the following equation:

Y = 50 + 0.25X

where

Y represents the annual profits of the corporation (in millions of dollars).

X represents the annual advertising expenditures of the corporation (in millions of dollars).
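The text doesn't say how the equation was estimated from the 20 years of data. The sketch below assumes ordinary least squares, the most common choice for a simple linear regression. The data are invented for illustration and are not the corporation's figures: profits sit on the line Y = 50 + 0.25X with small deviations arranged so they don't bias the fitted slope, which lets the fit recover the line exactly.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of Y = mX + b; returns (m, b)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of squared deviations of X, and sum of cross-deviations of X and Y
    s_xx = sum((x - mean_x) ** 2 for x in xs)
    if s_xx == 0:
        raise ValueError("X values must not all be equal")
    s_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    m = s_xy / s_xx          # slope
    b = mean_y - m * mean_x  # intercept: the fitted line passes through (mean_x, mean_y)
    return m, b

# Hypothetical 20 years of data (NOT the corporation's actual figures):
# advertising of $1M..$20M, profits on Y = 50 + 0.25X plus +/-$0.5M
# deviations that sum to zero and are uncorrelated with advertising.
advertising = [float(x) for x in range(1, 21)]
deviations = [0.5, -0.5, -0.5, 0.5] * 5
profits = [50 + 0.25 * x + d for x, d in zip(advertising, deviations)]

m, b = fit_line(advertising, profits)
print(f"Y = {b:.2f} + {m:.2f}X")  # Y = 50.00 + 0.25X
```

On Python 3.10 or later, `statistics.linear_regression(advertising, profits)` from the standard library should return the same slope and intercept, which makes a convenient cross-check.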

In this equation, the slope equals 0.25 and the intercept equals 50. Because the slope is 0.25, profits rise on average by $0.25 million, or $250,000, for every $1 million increase in advertising expenditures. Because the intercept is 50, the equation predicts that profits would still be $50 million with no advertising at all.
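Both interpretations follow directly from the equation: raising advertising by one unit ($1 million) changes predicted profits by exactly the slope, and setting advertising to zero leaves only the intercept.

```latex
\begin{aligned}
Y(X+1) - Y(X) &= [50 + 0.25(X+1)] - [50 + 0.25X] = 0.25 \\
Y(0) &= 50 + 0.25(0) = 50
\end{aligned}
```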

This equation, therefore, can be used to forecast future profits based on planned advertising expenditures. For example, if the corporation plans on spending $10 million on advertising next year, its expected profits will be as follows:

Y = 50 + 0.25X

Y = 50 + 0.25(10) = 50 + 2.5 = 52.5

Hence, with an advertising budget of $10 million next year, profits are expected to be $52.5 million.
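The forecast is just the fitted equation evaluated at the planned budget. A minimal sketch, with a hypothetical function name and the coefficients taken from the equation above, reproduces the $52.5 million figure:

```python
INTERCEPT = 50.0  # b: predicted profits with zero advertising ($ millions)
SLOPE = 0.25      # m: extra profit per $1 million of advertising ($ millions)

def forecast_profit(advertising_millions: float) -> float:
    """Expected annual profit ($ millions) from the fitted line Y = 50 + 0.25X."""
    return INTERCEPT + SLOPE * advertising_millions

print(forecast_profit(10.0))  # 52.5
```

Calling `forecast_profit` with other planned budgets gives the corresponding expected profits; each extra $1 million of advertising adds $0.25 million to the forecast.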