
How to Remove Rows with Missing Data in R

Another useful application of subsetting data frames is to find and remove rows with missing data. The R function to check for this is complete.cases(). You can try this on the built-in dataset airquality, a data frame with a fair amount of missing data:

> str(airquality)
> complete.cases(airquality)

The result of complete.cases() is a logical vector with the value TRUE for rows that are complete and FALSE for rows that contain at least one NA value. To remove the rows with missing data from airquality, try the following:

> x <- airquality[complete.cases(airquality), ]
> str(x)

Your result should be a data frame with 111 rows, rather than the 153 rows of the original airquality data frame.
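
Before you drop rows, it can help to see where the missing values actually are. The following is a minimal sketch that uses only base R; the variable names na_per_column and incomplete_rows are made up for illustration.

```r
# Count the NA values in each column of the built-in airquality data frame
na_per_column <- colSums(is.na(airquality))
print(na_per_column)

# Count the rows that contain at least one NA value
incomplete_rows <- sum(!complete.cases(airquality))
print(incomplete_rows)

# Rows that remain after filtering: total rows minus incomplete rows
print(nrow(airquality) - incomplete_rows)
```

The last number should match the row count you get from str(x) after filtering with complete.cases().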

As always with R, there is more than one way of achieving your goal. In this case, you can make use of na.omit() to omit all rows that contain NA values:

> x <- na.omit(airquality)
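
If you want to convince yourself that the two approaches keep the same rows, you can compare them directly. This is only a sketch; by_subset, by_omit, and dropped are illustrative names, not part of R.

```r
# Remove incomplete rows in two different ways
by_subset <- airquality[complete.cases(airquality), ]
by_omit <- na.omit(airquality)

# Both results should have the same number of rows
print(nrow(by_subset) == nrow(by_omit))

# Both results should keep the same original row names
print(identical(rownames(by_subset), rownames(by_omit)))

# na.omit() additionally records the rows it dropped
# in an attribute called "na.action"
dropped <- attr(by_omit, "na.action")
print(length(dropped))
```

The one practical difference shown here is the extra "na.action" attribute that na.omit() attaches to its result; the data itself is the same.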

When you’re certain that your data is clean, you can start to analyze it by adding calculated fields.

If you use any of these methods to subset your data or clean out missing values, remember to store the result in a new object. R doesn’t change anything in the original data frame unless you explicitly overwrite it. That’s a good thing, because you can’t accidentally mess up your data.
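
A quick way to check that the original data frame is left alone is to compare its row count before and after cleaning. This is a small sketch; rows_before, rows_after, and cleaned are illustrative names.

```r
# Row count of the original data frame before cleaning
rows_before <- nrow(airquality)

# Store the cleaned data in a new object
cleaned <- na.omit(airquality)

# Row count of the original data frame after cleaning
rows_after <- nrow(airquality)

# The original is unchanged; only the new object is smaller
print(rows_before == rows_after)
print(nrow(cleaned))
```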
