How to Create a Data Frame from Scratch in R

The conversion from a matrix to a data frame in R can’t be used to construct a data frame with different types of values. If you combine both numeric and character data in a matrix for example, everything will be converted to character.

You can construct a data frame from scratch, though, using the data.frame() function.

Make a data frame from vectors in R

So, let’s make a little data frame with the names, salaries, and starting dates of a few imaginary co-workers. First, you create three vectors that contain the necessary information like this:

> employee <- c('John Doe','Peter Gynn','Jolie Hope')
> salary <- c(21000, 23400, 26800)
> startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))

Now you have three different vectors in your workspace:

  • A character vector called employee, containing the names

  • A numeric vector called salary, containing the yearly salaries

  • A date vector called startdate, containing the dates on which the contracts started

Next, you combine the three vectors into a data frame using the following code:

> employ.data <- data.frame(employee, salary, startdate)

The result of this is a data frame, employ.data, with the following structure:

> str(employ.data)
'data.frame': 3 obs. of 3 variables:
 $ employee : Factor w/ 3 levels "John Doe","Jolie Hope",..: 1 3 2
 $ salary  : num 21000 23400 26800
 $ startdate: Date, format: "2010-11-01" "2008-03-25" ...

To combine a number of vectors into a data frame, you simple add all vectors as arguments to the data.frame() function, separated by commas. R will create a data frame with the variables that are named the same as the vectors used.

Keep characters as characters in R

You may have noticed something odd when looking at the structure of employ.data. Whereas the vector employee is a character vector, R made the variable employee in the data frame a factor.

R does this by default, but you have an extra argument to the data.frame() function that can avoid this — namely, the argument stringsAsFactors. In the employ.data example, you can prevent the transformation to a factor of the employee variable by using the following code:

> employ.data <- data.frame(employee, salary, startdate, stringsAsFactors=FALSE)

If you look at the structure of the data frame now, you see that the variable employee is a character vector, as shown in the following output:

> str(employ.data)
'data.frame': 3 obs. of 3 variables:
 $ employee : chr "John Doe" "Peter Gynn" "Jolie Hope"
 $ salary  : num 21000 23400 26800
 $ startdate: Date, format: "2010-11-01" "2008-03-25" ...

By default, R always transforms character vectors to factors when creating a data frame with character vectors or converting a character matrix to a data frame. This can be a nasty cause of errors in your code if you’re not aware of it. If you make it a habit to always specify the stringsAsFactors argument, you can avoid a lot of frustration.

  • Add a Comment
  • Print
  • Share
blog comments powered by Disqus
Advertisement

Inside Dummies.com