How to Deal with Missing Data Values in R - dummies

By Andrie de Vries, Joris Meys

The cor() function in R can handle missing data values in several ways. You choose among them by setting the use argument to one of a fixed set of character values. The choice matters most when you calculate the correlations between the variables in a data frame. Depending on the value you set, you can:

  • Use all observations by setting use="everything". If either variable in a pair contains an NA value, the correlation for that pair is NA as well. This is the default.

  • Exclude all observations that have NA for at least one variable. For this, you set use="complete.obs". Note that this may leave you with only a few observations if missing values are spread throughout the dataset.

  • For each pair of variables, exclude only the observations that have NA in one of those two variables. For this, you set use="pairwise.complete.obs" (the abbreviation "pairwise" also works, because R matches the value partially). This way, every correlation is calculated from all the observations available for that pair, and you don't lose information because of missing values in the other variables.
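The three settings can be compared side by side in a minimal sketch. The data frame df and its values are made up for illustration; only cor(), na.omit(), and complete.cases() are standard R.

```r
# Hypothetical toy data: y is missing in row 3, z is missing in row 4
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, 1, NA, 5, 4, 6),
  z = c(6, 5, 3, NA, 1, 2)
)

# Default: a pair involving a variable with any NA gets NA
r_all <- cor(df, use = "everything")

# Listwise deletion: only rows without any NA are used (rows 1, 2, 5, 6)
r_complete <- cor(df, use = "complete.obs")

# Pairwise deletion: each pair uses every row that is complete for that pair,
# so x and y are correlated on rows 1, 2, 4, 5, 6
r_pairwise <- cor(df, use = "pairwise.complete.obs")

print(r_all)
print(r_complete)
print(r_pairwise)
```

In this sketch, the x and y correlation differs between r_complete and r_pairwise, because the pairwise version keeps row 4, which listwise deletion drops on account of the missing z value.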

You can also calculate different measures of correlation with cor(). By default, R calculates the standard Pearson correlation coefficient (method="pearson"). For data that is not normally distributed, you can calculate the Spearman rank correlation or Kendall’s tau instead. For this, you set the method argument to "spearman" or "kendall".
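As a rough illustration of the method argument, the following sketch uses invented vectors x and y with a monotonic but non-linear relationship, so the rank-based measures and the Pearson coefficient disagree.

```r
# Hypothetical data: y increases with x, but not linearly
x <- c(1, 2, 3, 4, 5, 6)
y <- x^3

# Pearson (the default) measures linear association
r_pearson <- cor(x, y)

# Spearman works on the ranks of the values
r_spearman <- cor(x, y, method = "spearman")

# Kendall's tau compares concordant and discordant pairs
r_kendall <- cor(x, y, method = "kendall")

print(c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall))
```

Because the ordering of x and y is identical, the Spearman and Kendall values are both 1 here, while the Pearson value is below 1. The method and use arguments can be combined in a single call, for example cor(x, y, method = "spearman", use = "complete.obs").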