In R, you can’t build a data frame that holds different types of values by converting a matrix, because a matrix can store only one type. If you combine numeric and character data in a matrix, for example, every value is converted to character.
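The coercion described above can be seen in a small sketch. The names m and df and their contents are made up for illustration:

```r
# Mixing numbers and text in a matrix forces a single common type.
m <- cbind(name   = c("John Doe", "Peter Gynn"),
           salary = c(21000, 23400))
typeof(m)        # "character": the salaries are stored as text
m[, "salary"]    # character values, no longer numeric

# A data frame keeps one type per column instead.
df <- data.frame(name   = c("John Doe", "Peter Gynn"),
                 salary = c(21000, 23400),
                 stringsAsFactors = FALSE)
sapply(df, class)   # name is "character", salary is "numeric"
```

Because the salaries in m are text, arithmetic such as sum(m[, "salary"]) fails, whereas sum(df$salary) works as expected.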

You can construct a data frame from scratch, though, using the data.frame() function. Once the data frame is created, you can add observations to it.

Make a data frame from vectors in R

So, let’s make a little data frame with the names, salaries, and starting dates of a few imaginary co-workers. First, you create three vectors that contain the necessary information like this:
> employee <- c('John Doe','Peter Gynn','Jolie Hope')
> salary <- c(21000, 23400, 26800)
> startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))
Now you have three different vectors in your workspace:
  • A character vector called employee, containing the names

  • A numeric vector called salary, containing the yearly salaries

  • A date vector called startdate, containing the dates on which the co-workers started

Next, you combine the three vectors into a data frame using the following code:
> employ.data <- data.frame(employee, salary, startdate)
The result of this is a data frame, employ.data. In versions of R before 4.0.0, it has the following structure:
> str(employ.data)
'data.frame': 3 obs. of  3 variables:
 $ employee : Factor w/ 3 levels "John Doe","Jolie Hope",..: 1 3 2
 $ salary   : num  21000 23400 26800
 $ startdate: Date, format: "2010-11-01" "2008-03-25" ...

To combine a number of vectors into a data frame, you simply pass all the vectors as arguments to the data.frame() function, separated by commas. R creates a data frame whose variables have the same names as the vectors you passed in.
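As a quick check of that naming rule, the following sketch rebuilds the data frame and inspects its column names. The second call uses a hypothetical column name, yearly.salary, to show that a named argument takes precedence over the vector's own name:

```r
employee  <- c("John Doe", "Peter Gynn", "Jolie Hope")
salary    <- c(21000, 23400, 26800)
startdate <- as.Date(c("2010-11-1", "2008-3-25", "2007-3-14"))

# Unnamed arguments: columns take the names of the vectors.
employ.data <- data.frame(employee, salary, startdate)
names(employ.data)   # "employee" "salary" "startdate"
dim(employ.data)     # 3 rows, 3 columns

# Named argument: 'yearly.salary' (hypothetical name) replaces 'salary'.
renamed <- data.frame(employee, yearly.salary = salary, startdate)
names(renamed)       # "employee" "yearly.salary" "startdate"
```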

Keep characters as characters in R

You may have noticed something odd when looking at the structure of employ.data. Whereas the vector employee is a character vector, R made the variable employee in the data frame a factor.

Versions of R before 4.0.0 do this by default (since R 4.0.0, character vectors are kept as characters unless you ask otherwise). Either way, the data.frame() function has an argument that controls this behavior: stringsAsFactors. In the employ.data example, you can prevent the transformation of the employee variable to a factor by using the following code:

> employ.data <- data.frame(employee, salary, startdate, stringsAsFactors=FALSE)
If you look at the structure of the data frame now, you see that the variable employee is a character vector, as shown in the following output:
> str(employ.data)
'data.frame': 3 obs. of  3 variables:
 $ employee : chr  "John Doe" "Peter Gynn" "Jolie Hope"
 $ salary   : num  21000 23400 26800
 $ startdate: Date, format: "2010-11-01" "2008-03-25" ...

In versions of R before 4.0.0, R by default transforms character vectors to factors when you create a data frame from character vectors or convert a character matrix to a data frame. Since R 4.0.0, the default is to keep them as characters. This can be a nasty cause of errors in your code if you’re not aware of it. If you make it a habit to always specify the stringsAsFactors argument, your code behaves the same way regardless of the default, and you can avoid a lot of frustration.
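Setting stringsAsFactors explicitly makes the result independent of the default in your R installation. Here is a minimal sketch for the matrix case, where char.matrix and its contents are made up for illustration:

```r
# A small character matrix (hypothetical example data).
char.matrix <- matrix(c("a", "b", "c", "d"), ncol = 2)

# Stating the argument explicitly gives the same result in any R version.
as.chars   <- as.data.frame(char.matrix, stringsAsFactors = FALSE)
as.factors <- as.data.frame(char.matrix, stringsAsFactors = TRUE)

sapply(as.chars, class)     # both columns are "character"
sapply(as.factors, class)   # both columns are "factor"
```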

About This Article

About the book authors:

Andrie de Vries is a leading R expert and Business Services Director for Revolution Analytics. With over 20 years of experience, he provides consulting and training services in the use of R. Joris Meys is a statistician, R programmer and R lecturer with the faculty of Bio-Engineering at the University of Ghent.
