How to Use Frequencies or Densities with Your Data in R
How to Determine a Data Structure in R
How to Use Apply to Create Tabular Summaries in R

How to Remove Duplicate Data in R

A very useful application of subsetting data is to find and remove duplicate values. R has a useful function, duplicated(), that finds duplicate values and returns a logical vector that tells you whether the specific value is a duplicate of a previous value. This means that for duplicated values, duplicated() returns FALSE for the first occurrence and TRUE for every following occurrence of that value, as in the following example:

> duplicated(c(1,2,1,3,1,4))
[1] FALSE FALSE TRUE FALSE TRUE FALSE

If you try this on a data frame, R automatically checks the observations (meaning, it treats every row as a value). So, for example, with the data frame iris:

> duplicated(iris)
 [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
 [10] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
....
 [136] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
[145] FALSE FALSE FALSE FALSE FALSE FALSE

If you look carefully, you notice that row 143 is a duplicate (because the 143rd element of your result has the value TRUE). You also can tell this by using the which() function:

> which(duplicated(iris))
[1] 143

Now, to remove the duplicate from iris, you need to exclude this row from your data. Remember that there are two ways to exclude data using subsetting:

  • Specify a logical vector, where FALSE means that the element will be excluded. The ! (exclamation point) operator is a logical negation. This means that it converts TRUE into FALSE and vice versa. So, to remove the duplicates from iris, you do the following:

    > iris[!duplicated(iris), ]
  • Specify negative values. In other words:

> index <- which(duplicated(iris))
> iris[-index, ]

In both cases, you’ll notice that your instruction has removed row 143.

  • Add a Comment
  • Print
  • Share
blog comments powered by Disqus
How to Create Vectors in R
How to Do More with Loops in R
How to Remove Rows with Missing Data in R
How to Manipulate Files in R
How to Add a Second Dimension in R
Advertisement

Inside Dummies.com