By Alan Anderson, David Semmelroth

Part of Statistics for Big Data For Dummies Cheat Sheet

Measures of association quantify the strength and direction of the linear relationship between two data sets. Here are the two most commonly used measures of association:

  • Covariance

  • Correlation

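Both measures show how closely two data sets are related to each other. The main difference between them is the units in which they are measured. Covariance is expressed in the product of the units of the two data sets, so its size depends on the scale of the data. Correlation is unitless and is defined to take values between –1 and 1, which makes it much easier to interpret: values near 1 indicate a strong positive relationship, values near –1 a strong negative relationship, and values near 0 a weak linear relationship.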
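To illustrate the point about units, here is a minimal Python sketch. The helper names (sample_cov, sample_corr) and the height and weight values are made up for illustration and do not come from the book. Converting heights from meters to centimeters multiplies the covariance by 100 but leaves the correlation unchanged.

```python
import math

def sample_cov(x, y):
    """Sample covariance: sum of cross-deviations divided by n - 1."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (n - 1)

def sample_corr(x, y):
    """Sample correlation: covariance divided by both standard deviations."""
    # sample_cov(x, x) is the sample variance of x
    return sample_cov(x, y) / math.sqrt(sample_cov(x, x) * sample_cov(y, y))

# Hypothetical data, chosen only for illustration
heights_m = [1.60, 1.70, 1.80, 1.90]
weights_kg = [55.0, 65.0, 72.0, 85.0]
heights_cm = [h * 100 for h in heights_m]  # same heights, different unit

cov_m = sample_cov(heights_m, weights_kg)     # units: meter-kilograms
cov_cm = sample_cov(heights_cm, weights_kg)   # units: centimeter-kilograms
corr_m = sample_corr(heights_m, weights_kg)   # unitless
corr_cm = sample_corr(heights_cm, weights_kg) # unitless

print(cov_m, cov_cm)    # the second value is about 100 times the first
print(corr_m, corr_cm)  # the two values agree up to rounding
```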

Covariance

The covariance between two samples is computed as follows:

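$$s_{xy} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}{n - 1}$$

Here n is the sample size, and \bar{x} and \bar{y} are the sample means of the two data sets.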

The covariance between two populations is computed as follows:

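$$\sigma_{xy} = \frac{\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)}{N}$$

Here N is the population size, and \mu_x and \mu_y are the population means of the two data sets.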
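The two covariance calculations can be sketched in Python as follows. This assumes the standard definitions, which divide the sum of cross-deviations by n – 1 for a sample and by N for a population. The function name covariance and its sample flag are hypothetical, chosen only for this example.

```python
def covariance(x, y, sample=True):
    """Covariance of two equal-length sequences.

    sample=True  divides by n - 1 (sample covariance).
    sample=False divides by n     (population covariance).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    if n < 2 and sample:
        raise ValueError("sample covariance needs at least two observations")
    if n == 0:
        raise ValueError("x and y must not be empty")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of the products of paired deviations from the means
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    divisor = n - 1 if sample else n
    return cross / divisor

# Illustrative data: y is exactly twice x
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]

sample_cov = covariance(x, y)                    # 20 / 4
population_cov = covariance(x, y, sample=False)  # 20 / 5
print(sample_cov, population_cov)  # 5.0 4.0
```

The only difference between the two results is the divisor, so the sample covariance is always slightly larger in magnitude than the population covariance computed from the same numbers.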

Correlation

The correlation between two samples is computed like this:

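$$r_{xy} = \frac{s_{xy}}{s_x s_y}$$

Here s_{xy} is the sample covariance, and s_x and s_y are the sample standard deviations of the two data sets.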

The correlation between two populations is computed like this:

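$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$$

Here \sigma_{xy} is the population covariance, and \sigma_x and \sigma_y are the population standard deviations of the two data sets.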
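A short Python sketch of the correlation calculation, assuming the standard definition of correlation as the covariance divided by the product of the two standard deviations. The function name correlation and its sample flag are hypothetical, and the data are made up for illustration.

```python
import math

def correlation(x, y, sample=True):
    """Correlation of two equal-length sequences.

    sample=True  uses n - 1 in the covariance and standard deviations.
    sample=False uses n (population versions).
    """
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    n = len(x)
    divisor = n - 1 if sample else n
    if divisor < 1:
        raise ValueError("not enough observations")

    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / divisor
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x) / divisor)
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y) / divisor)
    if sd_x == 0 or sd_y == 0:
        raise ValueError("correlation is undefined when a data set is constant")
    return cov / (sd_x * sd_y)

# Illustrative data
x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]    # rises in step with x
y_down = [10, 8, 6, 4, 2]  # falls in step with x

r_up = correlation(x, y_up)                    # close to 1
r_down = correlation(x, y_down)                # close to -1
rho_up = correlation(x, y_up, sample=False)    # population version

print(r_up, r_down, rho_up)
```

Because the same divisor appears in the covariance and in both standard deviations, it cancels out. The sample and population formulas therefore give the same number when applied to the same data; they differ only in whether the data are treated as a sample or as the whole population.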