How to Remove Rows with Missing Data in R
Another useful application of subsetting data frames is to find and remove rows with missing data. The R function to check for this is complete.cases(). You can try this on the built-in dataset airquality, a data frame with a fair amount of missing data:
> str(airquality)
> complete.cases(airquality)
The result of complete.cases() is a logical vector with the value TRUE for rows that are complete, and FALSE for rows that contain at least one NA value. To remove the rows with missing data from airquality, try the following:
> x <- airquality[complete.cases(airquality), ]
> str(x)
Your result should be a data frame with 111 rows, rather than the 153 rows of the original airquality data frame.
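Before removing anything, it can help to see where the missing values are. The following is a small sketch that uses only base R functions; the variable names incomplete_rows and na_per_column are illustrative and not part of the original example:

```r
# Count the rows that contain at least one NA value
incomplete_rows <- sum(!complete.cases(airquality))
incomplete_rows

# Count the NA values in each column
na_per_column <- colSums(is.na(airquality))
na_per_column
```

The first count should match the difference between the 153 original rows and the 111 complete rows. The per-column counts show which variables are responsible for the missing data.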
As always with R, there is more than one way of achieving your goal. In this case, you can make use of na.omit() to omit all rows that contain NA values:
> x <- na.omit(airquality)
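The two approaches keep the same rows, but the objects they return are not quite the same: na.omit() also records which rows it dropped in an attribute named na.action. A minimal sketch of the comparison, with x1 and x2 as illustrative names:

```r
# Approach 1: logical subsetting with complete.cases()
x1 <- airquality[complete.cases(airquality), ]

# Approach 2: na.omit()
x2 <- na.omit(airquality)

# Both approaches keep the same number of rows
nrow(x1)
nrow(x2)

# na.omit() stores the indices of the removed rows in an attribute
dropped <- attr(x2, "na.action")
length(dropped)
```

If you need to know later which rows were removed, the na.action attribute gives you that information without having to recompute it.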
When you’re certain that your data is clean, you can start to analyze it by adding calculated fields.
If you use any of these methods to subset your data or clean out missing values, remember to store the result in a new object. R doesn’t change anything in the original data frame unless you explicitly overwrite it. That’s a good thing, because you can’t accidentally mess up your data.
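You can check this behavior for yourself. The sketch below stores the cleaned data in a new object (the name cleaned is only an illustration) and then confirms that the original airquality data frame still has all its rows and its missing values:

```r
# Store the cleaned data in a new object
cleaned <- na.omit(airquality)

# The original data frame is untouched
nrow(airquality)          # 153
any(is.na(airquality))    # TRUE

# The new object holds only the complete rows
nrow(cleaned)             # 111
any(is.na(cleaned))       # FALSE
```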