How to Deal with Missing Data Values in R

The cor() function in R can handle missing data values in several ways. You choose among them by setting the use argument to one of a fixed set of character values. The choice matters most when you calculate the correlations among the variables in a data frame. Depending on the value you pick, you can:

  • Use all observations by setting use='everything'. This means that if a variable contains any NA value, every correlation involving that variable is NA as well. This is the default.

  • Exclude all observations that have NA for at least one variable. For this, you set use='complete.obs'. Note that this may leave you with only a few observations if missing values are spread through the complete dataset.

  • For each pair of variables, exclude only the observations that have NA in one of those two variables. For that, you set use='pairwise.complete.obs', which R also lets you abbreviate to use='pairwise'. This way, the correlation for every pair of variables is calculated without losing information because of missing values in the other variables.
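The three settings above can be compared side by side. The following is a minimal sketch that assumes a small made-up data frame (df and its columns x, y, and z are illustrative, not part of the original article):

```r
# Made-up data frame with missing values in two of the three columns
df <- data.frame(
  x = c(1, 2, 3, 4, 5, 6),
  y = c(2, NA, 5, 4, 7, 9),
  z = c(9, 7, NA, 6, 2, 3)
)

# Default: correlations involving a variable that contains NA are NA
cor_all <- cor(df, use = "everything")

# Drop every row that has at least one NA, then correlate what is left
cor_complete <- cor(df, use = "complete.obs")

# For each pair of columns, drop only the rows with NA in that pair
cor_pairwise <- cor(df, use = "pairwise.complete.obs")

print(cor_all)
print(cor_complete)
print(cor_pairwise)
```

With this data, 'complete.obs' works on the four rows that have no NA at all, while 'pairwise.complete.obs' uses five rows for the x-y pair and five rows for the x-z pair, so the two results can differ.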

You can also calculate different measures of correlation. By default, R calculates the standard Pearson correlation coefficient. For data that is not normally distributed, you can use the cor() function to calculate the Spearman rank correlation or Kendall’s tau instead. For this, you set the method argument to 'spearman' or 'kendall' (the default is 'pearson').
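As a rough illustration of the method argument combined with use, the sketch below assumes two made-up vectors (x and y are hypothetical) with a strictly increasing but nonlinear relationship and one missing value:

```r
# Made-up data: y grows with x, but not linearly
x <- 1:10
y <- x^3
y[4] <- NA  # introduce one missing value

# Pearson measures linear association, so it stays below 1 here
r_pearson <- cor(x, y, use = "complete.obs", method = "pearson")

# Spearman and Kendall work on ranks, so a strictly increasing
# relationship without ties gives exactly 1
r_spearman <- cor(x, y, use = "complete.obs", method = "spearman")
r_kendall <- cor(x, y, use = "complete.obs", method = "kendall")

print(c(pearson = r_pearson, spearman = r_spearman, kendall = r_kendall))
```

The rank-based measures ignore how far apart the values are and look only at their order, which is why they suit data that is monotone but not linear.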
