How to Calculate Data Correlations in R - dummies

By Andrie de Vries, Joris Meys

The degree to which two variables vary together can be described by the correlation coefficient. In R, you can calculate correlations with the cor() function, which computes the Pearson correlation coefficient by default. You pass the two variables you want to examine as the arguments. For example, to check how strongly the petal width correlates with the petal length in the iris data set, you do the following:

> with(iris, cor(Petal.Width, Petal.Length))
[1] 0.9628654

This tells you that the relation between the petal width and the petal length is almost perfectly linear, as you can also see in the fourth plot of the third row.
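As a rough check on what cor() reports, the same number can be rebuilt from the covariance and the two standard deviations. This is only a sketch that assumes the default Pearson method; the names x, y, and r.manual are chosen here for illustration and are not part of the original example.

```r
# Petal measurements from the built-in iris data set
x <- iris$Petal.Width
y <- iris$Petal.Length

# Pearson correlation: the covariance scaled by both standard deviations
r.manual <- cov(x, y) / (sd(x) * sd(y))

# Compare the hand-computed value with the one cor() returns
all.equal(r.manual, cor(x, y))
```

If the two values agree, all.equal() returns TRUE.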

You can also calculate the correlations among multiple variables at once, in much the same way as you can plot the relations among multiple variables. For example, you can calculate the correlations that correspond with the plot with the following line, where the index -5 drops the fifth column, Species, because it is a factor rather than a numeric variable:

> iris.cor <- cor(iris[-5])

As always, you can save the outcome of this function in an object. This lets you examine the structure of the output so you can figure out how to use it in the rest of your code. Here’s a look at the structure of the object iris.cor:

> str(iris.cor)
 num [1:4, 1:4] 1 -0.118 0.872 0.818 -0.118 ...
 - attr(*, "dimnames")=List of 2
 ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"
 ..$ : chr [1:4] "Sepal.Length" "Sepal.Width" "Petal.Length" "Petal.Width"

This output tells you that iris.cor is a numeric matrix with four rows and four columns, and with the names of the variables as both row names and column names. To find the correlation of two variables in that matrix, you can use the names as indices, for example:

> iris.cor['Petal.Width', 'Petal.Length']
[1] 0.9628654
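Because the result is an ordinary named matrix, other matrix operations work on it too. The following is a minimal sketch, not part of the original example, that pulls out one row and sorts it to show which variables track the petal width most closely. The object name petal.width.cor is made up for illustration.

```r
# Correlation matrix of the four numeric iris variables
iris.cor <- cor(iris[-5])

# One row of the matrix: correlations of Petal.Width with every variable
petal.width.cor <- iris.cor['Petal.Width', ]

# Sort from the strongest positive to the strongest negative correlation
sort(petal.width.cor, decreasing = TRUE)

# A correlation matrix is symmetric, so the order of the two names does not matter
isSymmetric(iris.cor)
```

The first entry of the sorted vector is Petal.Width itself, because every variable correlates perfectly with itself and the diagonal of the matrix is therefore 1.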