# Regression Analysis in Statistical Analysis of Big Data

*Regression analysis* is used to estimate the strength and direction of the relationship between variables that are *linearly* related to each other. Two variables *X* and *Y* are said to be *linearly* related if the relationship between them can be written in the form

*Y* = *mX* + *b*

where

*m* is the *slope*, or the change in *Y* due to a given change in *X*

*b* is the *intercept*, or the value of *Y* when *X* = 0

As an example of regression analysis, suppose a corporation wants to determine whether its advertising expenditures are actually increasing profits, and if so, by how much. The corporation gathers data on advertising and profits for the past 20 years and uses this data to estimate the following equation:

*Y* = 50 + 0.25*X*

where

*Y* represents the annual profits of the corporation (in millions of dollars).

*X* represents the annual advertising expenditures of the corporation (in millions of dollars).
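The text does not say which method the corporation uses to estimate this equation. A common choice is ordinary least squares, and the following is a minimal sketch under that assumption. The function name `fit_line` and the data are hypothetical: the 20 data points are synthetic and constructed to lie exactly on the line, so the fit simply recovers the coefficients quoted above. Real data would scatter around the line.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = m*x + b; returns (m, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope: co-variation of x and y divided by the variation of x.
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    m = sxy / sxx
    # Intercept: forces the line through the point of means.
    b = mean_y - m * mean_x
    return m, b

# Synthetic (not real) data: 20 years of advertising spend in millions
# of dollars, with profits placed exactly on Y = 50 + 0.25X.
advertising = [float(x) for x in range(1, 21)]
profits = [50 + 0.25 * x for x in advertising]

slope, intercept = fit_line(advertising, profits)
print(f"Y = {intercept:.2f} + {slope:.2f}X")  # Y = 50.00 + 0.25X
```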

In this equation, the slope equals 0.25, and the intercept equals 50. The slope of 0.25 indicates that, on average, for every $1 million increase in advertising expenditures, profits rise by $0.25 million, or $250,000. The intercept of 50 indicates that with no advertising, profits would still be $50 million.
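The two interpretations can be checked numerically. The sketch below uses a hypothetical helper, `predicted_profit`, built from the coefficients in the text. It evaluates the line at zero advertising and compares two budgets that differ by $1 million (the values 4 and 5 are arbitrary illustrations).

```python
INTERCEPT = 50.0  # profits (millions of dollars) when advertising is zero
SLOPE = 0.25      # change in profits per extra $1 million of advertising

def predicted_profit(advertising):
    """Evaluate Y = 50 + 0.25X, with both values in millions of dollars."""
    return INTERCEPT + SLOPE * advertising

baseline = predicted_profit(0)                     # the intercept
extra = predicted_profit(5) - predicted_profit(4)  # the slope

print(baseline)  # 50.0
print(extra)     # 0.25
```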

This equation, therefore, can be used to forecast future profits based on planned advertising expenditures. For example, if the corporation plans on spending $10 million on advertising next year, its expected profits will be as follows:

*Y* = 50 + 0.25*X*

*Y* = 50 + 0.25(10) = 50 + 2.5 = 52.5

Hence, with an advertising budget of $10 million next year, profits are expected to be $52.5 million.
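The same forecast can be reproduced in a few lines of code. This is an illustrative sketch only: the function name `forecast_profit` and its default parameters are assumptions made for the example, with the coefficients taken from the estimated equation.

```python
def forecast_profit(advertising, intercept=50.0, slope=0.25):
    """Forecast annual profits (millions of dollars) from planned
    advertising expenditures (millions of dollars)."""
    return intercept + slope * advertising

planned_advertising = 10  # $10 million, as in the example
expected_profit = forecast_profit(planned_advertising)
print(f"Expected profits: ${expected_profit} million")  # Expected profits: $52.5 million
```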