How to Count Unique Data Values in R

To see which variables are candidates for conversion to a factor in R, take a look at the built-in dataset mtcars. It describes fuel consumption and ten aspects of design and performance for 32 cars from the 1970s. It contains 11 variables in total, and all of them are stored as numeric.

Although you can work with the data frame as is, some variables could be converted to a factor because they take only a limited number of distinct values.

If you don’t know how many different values a variable has, you can get this information in two simple steps:

  1. Get the unique values of the variable using unique().

  2. Get the length of the resulting vector using length().
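
As a minimal sketch, the two steps above can be tried on a single column of mtcars. The variable names cyl_values and n_cyl are only illustrative and do not come from the original text:

```r
# Step 1: collect the distinct values stored in the cyl column
cyl_values <- unique(mtcars$cyl)

# Step 2: count how many distinct values there are
n_cyl <- length(cyl_values)

# Show both results
cyl_values
n_cyl
```

The same pattern works for any other column; only the name after the $ changes.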

Using the sapply() function, you can do this for the whole data frame at once. You apply an anonymous function that combines both steps to every column of the data frame, like this:

> sapply(mtcars, function(x) length(unique(x)))
 mpg  cyl disp   hp drat   wt qsec   vs   am gear carb
  25    3   27   22   22   29   30    2    2    3    6

So, it looks like the variables cyl, vs, am, gear, and carb can benefit from a conversion to factor.
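
One possible way to carry out that conversion is sketched below. It assumes you want to work on a copy rather than change mtcars itself; the names cars and factor_cols are hypothetical and chosen only for this example:

```r
# Work on a copy so the built-in dataset is left untouched
cars <- mtcars

# Columns with only a few distinct values
factor_cols <- c("cyl", "vs", "am", "gear", "carb")

# Convert each selected column to a factor
cars[factor_cols] <- lapply(cars[factor_cols], factor)

# Check the number of levels in each converted column
sapply(cars[factor_cols], nlevels)
```

The number of levels reported for each column should match the number of unique values counted earlier, because factor() creates one level per distinct value by default.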

The dataset has 32 observations, and no variable has 32 distinct values, so every variable contains at least one repeated value.

When to treat a variable as a factor depends a bit on the situation, but, as a general rule, avoid more than ten different levels in a factor and try to have at least five observations per level.
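
To check a variable against this rule of thumb, you can count the observations in each level with table(). The sketch below uses carb as an example; the names carb_counts, n_levels, and rare_levels are illustrative, and the thresholds are simply the ones from the general rule above:

```r
# Count the observations that fall in each distinct value of carb
carb_counts <- table(mtcars$carb)
carb_counts

# Number of levels the factor would have (rule of thumb: at most ten)
n_levels <- length(carb_counts)
n_levels

# Values with fewer than five observations (rule of thumb: at least five)
rare_levels <- names(carb_counts[carb_counts < 5])
rare_levels
```

If rare_levels is not empty, the variable can still be converted, but you may want to combine sparse levels or keep them in mind when you interpret results.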