Statistics for Big Data For Dummies

Unlike a stem-and-leaf plot, a scatter plot is intended to show the relationship between two variables. It may be difficult to see whether there's a relationship between two variables just by looking at the raw data, but with a scatter plot, any patterns that exist in the data become much easier to see.

A scatter plot consists of a series of points; each point pairs one value of the first variable with the corresponding value of the second. For example, you could construct a scatter plot to show the relationship between a corporation's annual revenues and its annual profits. If you're trying to predict profits based on revenues, the X-axis would be used to show annual revenues, and the Y-axis would be used to show annual profits. So, revenues are considered to be the independent variable, and profits are considered to be the dependent variable in this relationship.

On a scatter plot, the X-axis (that is, the horizontal axis) is used to show the independent variable, and the Y-axis (the vertical axis) is used to show the dependent variable.

In this example, each point on the scatter plot shows the revenues and profits for a specified year. The following table lists the corporation's annual revenues and annual profits for the years 2004–2014.

Annual Revenues and Profits 2004–2014
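| Year | Revenues (\$ millions) | Profits (\$ millions) |
|------|------------------------|-----------------------|
| 2004 | 225 | 42 |
| 2005 | 237 | 43 |
| 2006 | 245 | 48 |
| 2007 | 222 | 40 |
| 2008 | 265 | 60 |
| 2009 | 270 | 56 |
| 2010 | 254 | 53 |
| 2011 | 280 | 60 |
| 2012 | 290 | 62 |
| 2013 | 305 | 65 |
| 2014 | 312 | 71 |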

The following figure shows the resulting scatter plot.

Scatter plot of annual revenues and profits 2004–2014.

Each point on the scatter plot represents the revenues and related profits of the corporation for a single year. For example, the point in the upper right-hand corner of the plot represents data for 2014, when the corporation's revenues were \$312 million and the profits were \$71 million.
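
If you want to draw this plot yourself, the following is a minimal Python sketch. It assumes the matplotlib library is installed; the function name `plot_revenues_vs_profits` and the output file name are made up for illustration. The data comes straight from the table of annual revenues and profits.

```python
# Data copied from the table of annual revenues and profits (millions of dollars).
YEARS = list(range(2004, 2015))
REVENUES = [225, 237, 245, 222, 265, 270, 254, 280, 290, 305, 312]
PROFITS = [42, 43, 48, 40, 60, 56, 53, 60, 62, 65, 71]


def plot_revenues_vs_profits(path="revenues_vs_profits.png"):
    """Save a scatter plot with revenues on the X-axis and profits on the Y-axis.

    The function name and default file name are illustrative assumptions.
    """
    # Imported inside the function so the data above is usable without matplotlib.
    import matplotlib
    matplotlib.use("Agg")  # render to a file; no display window is needed
    import matplotlib.pyplot as plt

    fig, ax = plt.subplots()
    ax.scatter(REVENUES, PROFITS)  # independent variable first, dependent second
    ax.set_xlabel("Annual revenues (millions of dollars)")
    ax.set_ylabel("Annual profits (millions of dollars)")
    ax.set_title("Annual revenues and profits, 2004-2014")
    fig.savefig(path)
    plt.close(fig)
    return path


# The upper right-hand point is the year with the highest revenues.
top_year, top_revenue, top_profit = max(
    zip(YEARS, REVENUES, PROFITS), key=lambda row: row[1]
)
print(top_year, top_revenue, top_profit)  # prints: 2014 312 71
```

Calling `plot_revenues_vs_profits()` writes the image to the current folder. Revenues are passed first because they are the independent variable and belong on the X-axis.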

You can use a scatter plot to determine whether

• The two variables tend to move in the same direction.

• The two variables tend to move in opposite directions.

• The two variables aren't related to each other.

If two variables tend to move in the same direction, they are said to be positively correlated; if they tend to move in opposite directions, they are said to be negatively correlated. If two variables don't show any particular pattern, they are said to be uncorrelated.
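
You can also put a number on the direction of a relationship. One common measure, which this section doesn't name, is the Pearson correlation coefficient: it ranges from -1 to +1, and its sign matches the direction in which the two variables tend to move. The sketch below computes it for the revenue and profit data. The `describe` helper and its `tolerance` cutoff are hypothetical, chosen only for illustration, and the two small data sets at the end are made-up numbers.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]  # deviations of X from its mean
    dy = [y - mean_y for y in ys]  # deviations of Y from its mean
    sxy = sum(a * b for a, b in zip(dx, dy))
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined when a variable never changes")
    return sxy / sqrt(sxx * syy)


def describe(r, tolerance=0.1):
    """Label a coefficient; the tolerance is an arbitrary cutoff, not a formal test."""
    if r > tolerance:
        return "positively correlated"
    if r < -tolerance:
        return "negatively correlated"
    return "uncorrelated"


# Revenues and profits from the 2004-2014 table (millions of dollars).
revenues = [225, 237, 245, 222, 265, 270, 254, 280, 290, 305, 312]
profits = [42, 43, 48, 40, 60, 56, 53, 60, 62, 65, 71]
r = pearson_r(revenues, profits)
print(f"revenues vs. profits: r = {r:.3f} ({describe(r)})")

# Made-up data sets that illustrate the other two cases.
x = [1, 2, 3, 4, 5]
print(describe(pearson_r(x, [5, 3, 4, 1, 2])))  # Y tends to fall as X rises
print(describe(pearson_r(x, [2, 4, 3, 4, 2])))  # no overall direction
```

For the revenue and profit data, the coefficient comes out close to +1, which matches the upward pattern in the scatter plot.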

This figure shows a scatter plot for two variables (X and Y) that are positively correlated.

Scatter plot of two positively correlated variables.

The scatter plot shows that as X increases, there's a strong tendency for Y to increase (but not necessarily by the same amount). This shows that X and Y are positively correlated.

The following figure shows the same scatter plot with a trend line; the equation of this line is estimated with regression analysis.

Scatter plot of two positively correlated variables with a trend line.

The trend line shows how much Y changes on average for a given change in X. A positively sloped trend line indicates that the two variables are positively correlated; similarly, a negatively sloped trend line indicates that they are negatively correlated. A flat trend line (that is, one with a zero slope) indicates that the two variables are uncorrelated. In this figure, the trend line slopes upward, so X and Y are positively correlated.
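
To see where a trend line's slope comes from, here is a short sketch that fits a straight line by ordinary least squares, the simplest form of regression analysis. It is an assumption that the figures use this method. The function name `fit_trend_line` is made up for illustration, and because the X and Y values behind the figure aren't listed, the example uses the revenue and profit data from the 2004–2014 table instead.

```python
def fit_trend_line(xs, ys):
    """Return (slope, intercept) of the least-squares line y = intercept + slope * x."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("slope is undefined when every x value is the same")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # average change in Y per unit change in X
    intercept = mean_y - slope * mean_x  # the line passes through the two means
    return slope, intercept


# Revenues (X) and profits (Y) from the 2004-2014 table, in millions of dollars.
revenues = [225, 237, 245, 222, 265, 270, 254, 280, 290, 305, 312]
profits = [42, 43, 48, 40, 60, 56, 53, 60, 62, 65, 71]

slope, intercept = fit_trend_line(revenues, profits)
print(f"profits = {intercept:.2f} + {slope:.3f} * revenues")
```

The slope is positive: on average, each additional \$1 million of revenues goes with roughly \$0.3 million of additional profits in this data. The negative intercept has no practical meaning here, because none of the observed revenues are anywhere near zero.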

The following figure shows a scatter plot and the trend line for two variables that are negatively correlated.

Scatter plot of two negatively correlated variables with a trend line.

The scatter plot shows that as X increases, Y tends to decrease; the trend line has a negative slope. Therefore, X and Y are negatively correlated.

The following figure shows a scatter plot and the trend line for two variables that are uncorrelated.

Scatter plot of two uncorrelated variables with a trend line.

The scatter plot shows that as X increases, Y sometimes increases and sometimes decreases. There is no particular pattern to the data. The points appear to be randomly scattered throughout the diagram. As a result, the trend line is nearly flat, and this shows that X and Y are uncorrelated.

For a real-world example, this figure shows a scatter plot of the relationship between the price of Apple stock and the Standard and Poor's 500 stock market index from January 1, 2013 to December 31, 2013.

Scatter plot of daily prices of Apple stock and the S&P 500.

The Standard and Poor's 500 (S&P 500) index tracks the stock prices of 500 large U.S. companies, weighted by each company's market capitalization. The value of this index is widely used as a gauge of the overall U.S. stock market. The plot shows that there's a positive correlation between the price of Apple stock and the S&P 500 during 2013.