Regression Analysis in Statistical Analysis of Big Data

Statistics for Big Data For Dummies

Regression analysis is used to estimate the strength and direction of the relationship between variables that are linearly related to each other. Two variables X and Y are said to be linearly related if the relationship between them can be written in the form

Y = mX + b

where

m is the slope, or the change in Y for each one-unit change in X

b is the intercept, or the value of Y when X = 0
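The two definitions above can be checked numerically. The sketch below is a minimal illustration. The function name `linear` and the values m = 2 and b = 3 are hypothetical, chosen only for the example. It shows that the intercept is the output at X = 0 and that the slope is the change in Y when X rises by one unit.

```python
def linear(x: float, m: float, b: float) -> float:
    """Return Y = mX + b for slope m and intercept b."""
    return m * x + b

# Hypothetical illustrative values (not from the text): slope 2, intercept 3
m, b = 2.0, 3.0

# The intercept is the value of Y when X = 0
print(linear(0.0, m, b))  # 3.0

# The slope is the change in Y for a one-unit change in X, wherever you start
print(linear(5.0, m, b) - linear(4.0, m, b))  # 2.0
```

Because the relationship is linear, the one-unit difference is the same (2.0 here) no matter which starting value of X you pick.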

As an example of regression analysis, suppose a corporation wants to determine whether its advertising expenditures are actually increasing profits, and if so, by how much. The corporation gathers data on advertising and profits for the past 20 years and uses this data to estimate the following equation:

Y = 50 + 0.25X

where

Y represents the annual profits of the corporation (in millions of dollars).

X represents the annual advertising expenditures of the corporation (in millions of dollars).
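The text does not say how the corporation estimated its equation. A common choice is ordinary least squares. The sketch below assumes that method and uses its closed-form formulas:

m = Σ(x − x̄)(y − ȳ) / Σ(x − x̄)²
b = ȳ − m·x̄

The 20 years of advertising and profit figures are hypothetical. They are generated to lie exactly on Y = 50 + 0.25X, so the fit should recover those coefficients. Real data would scatter around the line, and the estimates would only approximate the underlying relationship.

```python
def fit_line(xs, ys):
    """Estimate slope m and intercept b by ordinary least squares."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two paired observations")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # Sum of cross-deviations and sum of squared deviations of X
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("X must vary to estimate a slope")
    m = sxy / sxx
    b = y_bar - m * x_bar
    return m, b

# Hypothetical data: 20 years of advertising spend (millions of dollars),
# with profits constructed to lie exactly on Y = 50 + 0.25X (illustration only)
advertising = [float(x) for x in range(1, 21)]
profits = [50 + 0.25 * x for x in advertising]

m, b = fit_line(advertising, profits)
print(round(m, 6), round(b, 6))  # 0.25 50.0
```

On Python 3.10 and later, the standard library's `statistics.linear_regression(x, y)` returns the same slope and intercept. The hand-written version is there to make the formulas visible.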

In this equation, the slope equals 0.25 and the intercept equals 50. A slope of 0.25 indicates that, on average, every $1 million increase in advertising expenditures raises profits by $0.25 million, or $250,000. An intercept of 50 indicates that with no advertising, profits would still be expected to be $50 million.

This equation, therefore, can be used to forecast future profits based on planned advertising expenditures. For example, if the corporation plans on spending $10 million on advertising next year, its expected profits will be as follows:

Y = 50 + 0.25X

Y = 50 + 0.25(10) = 50 + 2.5 = 52.5

Hence, with an advertising budget of $10 million next year, profits are expected to be $52.5 million.
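The forecast above is a direct evaluation of the estimated equation, and it can be reproduced in a few lines. This sketch hard-codes the coefficients from Y = 50 + 0.25X. The names `forecast_profit`, `INTERCEPT`, and `SLOPE` are hypothetical. The final lines also confirm the slope interpretation: one extra $1 million of advertising adds $250,000 to expected profits.

```python
# Coefficients from the estimated equation in the text: Y = 50 + 0.25X
INTERCEPT = 50.0  # expected profits (millions of dollars) when X = 0
SLOPE = 0.25      # change in profits (millions) per $1 million of advertising

def forecast_profit(advertising_millions: float) -> float:
    """Expected annual profit (millions of dollars) for a planned ad budget."""
    return INTERCEPT + SLOPE * advertising_millions

print(forecast_profit(10))  # 52.5

# Effect of one extra $1 million of advertising, converted to dollars
extra = forecast_profit(11) - forecast_profit(10)
print(extra * 1_000_000)  # 250000.0
```

Forecasts like this are most trustworthy for budgets within the range of advertising levels seen in the historical data. Extrapolating far beyond that range assumes the linear relationship continues to hold.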

About This Article

About the book author:

Alan Anderson, PhD, is a teacher of finance, economics, statistics, and math at Fordham and Fairfield universities as well as at Manhattanville and Purchase colleges. Outside of the academic environment he has many years of experience working as an economist, risk manager, and fixed income analyst. Alan received his PhD in economics from Fordham University and an M.S. in financial engineering from Polytechnic University.

David Semmelroth has two decades of experience translating customer data into actionable insights across the financial services, travel, and entertainment industries. David has consulted for Cedar Fair, Wachovia, National City, and TD Bank.