With R at your fingertips, you can quickly shape your data exactly as you want it. That’s good because in many real-life cases, you get heaps of data in a big file, and preferably in a format you can’t use at all. That must be the golden rule of data gathering: Make sure your statistician sweats his pants off just by looking at the data.

Fortunately, selecting only the variables you need and transforming them to the right format is pretty easy in R.

Let’s prepare the data frame mtcars a bit using some simple tricks. First, create a data frame cars like this:

> cars <- mtcars[c(1,2,9,10)]
> cars$gear <- ordered(cars$gear)
> cars$am <- factor(cars$am, labels=c('auto', 'manual'))

With this code, you do the following:

  • Select four variables from the data frame mtcars and save them in a data frame called cars. Note that you use the index system for lists to select the variables.

  • Make the variable gear in this data frame an ordered factor.

  • Transform the variable am to a factor, giving it the label 'auto' if its original value is 0 and 'manual' if its original value is 1.

In the conversion of cars$am, factor() sorts the unique values of the original variable (0 and 1) and assigns the labels in that order, so 0 becomes 'auto' and 1 becomes 'manual'. You could also create the labels with ifelse(cars$am, 'manual', 'auto') and then convert the result to a factor. That works even though the first argument of ifelse() isn’t a logical expression, because R reads a 0 as FALSE and every other number as TRUE. You can use this property in your own code.
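
To check which label the factor() call in the code above attaches to which original value, you can cross-tabulate the original column against the new factor. The following is a minimal sketch rather than part of the original example: the names cars_check, am_factor, and am_ifelse are made up for illustration, and the ifelse() line assumes that 1 should be labeled 'manual'.

```r
# Rebuild the selection from the built-in mtcars dataset.
cars_check <- mtcars[c(1, 2, 9, 10)]

# factor() sorts the unique values (0, 1) and applies the labels in that order.
am_factor <- factor(cars_check$am, labels = c('auto', 'manual'))

# Cross-tabulate the original values against the new labels.
print(table(original = cars_check$am, label = am_factor))

# Alternative: ifelse() treats 0 as FALSE and any other number as TRUE.
am_ifelse <- factor(ifelse(cars_check$am, 'manual', 'auto'))

# Compare the labels produced by the two approaches.
print(all(as.character(am_factor) == as.character(am_ifelse)))
```

In the table, each original value should land in exactly one label column, which makes a swapped label easy to spot.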

After running this code, you should have a dataset cars in your workspace with the following structure:

> str(cars)
'data.frame': 32 obs. of 4 variables:
 $ mpg : num 21 21 22.8 21.4 18.7 ...
 $ cyl : num 6 6 4 6 8 ...
 $ am : Factor w/ 2 levels "auto","manual": 2 2 2 1 1 ...
 $ gear: Ord.factor w/ 3 levels "3"<"4"<"5": 2 2 2 1 1 ...
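
Because gear is an ordered factor, you can compare its values with operators such as > and <, which a plain factor doesn’t allow. Here is a small sketch of that idea. It rebuilds cars with the code shown earlier so that it runs on its own, and the name high_gear is made up for illustration.

```r
# Rebuild the data frame exactly as in the code shown earlier.
cars <- mtcars[c(1, 2, 9, 10)]
cars$gear <- ordered(cars$gear)
cars$am <- factor(cars$am, labels = c('auto', 'manual'))

# Inspect the levels and confirm that gear is an ordered factor.
print(levels(cars$gear))
print(is.ordered(cars$gear))

# Ordered factors support comparisons against one of their levels.
high_gear <- cars[cars$gear > '3', ]
print(nrow(high_gear))

# Count the cars in each combination of transmission and gear.
print(table(cars$am, cars$gear))
```

The same comparison on an unordered factor returns NA with a warning, so ordered() is the right choice whenever the levels have a natural ranking.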