How to Create a Data Frame from Scratch in R
Converting a matrix to a data frame in R can’t give you a data frame with different types of values, because a matrix holds only one type. If you combine numeric and character data in a matrix, for example, everything is converted to character.
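To see this coercion for yourself, you can bind a numeric and a character vector into a matrix. The following is only a small sketch; the vectors ages and first.names are made up for illustration and are not part of the employee example.

```r
# Hypothetical example vectors, used only to show the coercion
ages <- c(31, 45, 28)
first.names <- c("Ann", "Bob", "Cy")

# cbind() builds a matrix, and a matrix can hold only one type of value
m <- cbind(ages, first.names)

# The whole matrix, including the ages column, is now character data
typeof(m)
m[, "ages"]
```

Because the ages are now strings, arithmetic such as m[, "ages"] + 1 would fail until you convert them back with as.numeric().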
You can construct a data frame from scratch, though, using the data.frame() function. Once you have created a data frame, you can add observations to it.
Make a data frame from vectors in R
So, let’s make a little data frame with the names, salaries, and starting dates of a few imaginary co-workers. First, you create three vectors that contain the necessary information like this:
> employee <- c('John Doe','Peter Gynn','Jolie Hope')
> salary <- c(21000, 23400, 26800)
> startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))
Now you have three different vectors in your workspace:
A character vector called employee, containing the names
A numeric vector called salary, containing the yearly salaries
A date vector called startdate, containing the dates on which the contracts started
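If you want to confirm the type of each vector before combining them, class() is one simple way to check. This sketch repeats the three assignments so that it runs on its own.

```r
# The three vectors from the example above
employee <- c('John Doe','Peter Gynn','Jolie Hope')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))

# class() reports the type of each vector
class(employee)
class(salary)
class(startdate)

# All three vectors must have the same length to form the rows of a data frame
length(employee) == length(salary) && length(salary) == length(startdate)
```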
Next, you combine the three vectors into a data frame using the following code:
> employ.data <- data.frame(employee, salary, startdate)
The result of this is a data frame, employ.data, with the following structure:
> str(employ.data)
'data.frame': 3 obs. of 3 variables:
 $ employee : Factor w/ 3 levels "John Doe","Jolie Hope",..: 1 3 2
 $ salary : num 21000 23400 26800
 $ startdate: Date, format: "2010-11-01" "2008-03-25" ...
To combine a number of vectors into a data frame, you simply add all the vectors as arguments to the data.frame() function, separated by commas. R creates a data frame whose variables have the same names as the vectors you used.
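The naming behavior is easy to check with names(). The sketch below also shows that data.frame() accepts name = value pairs if you prefer different variable names; the names name, pay, and started are hypothetical and chosen only for illustration.

```r
employee <- c('John Doe','Peter Gynn','Jolie Hope')
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))

# Variable names are taken from the names of the vectors
employ.data <- data.frame(employee, salary, startdate)
names(employ.data)

# Hypothetical alternative: supply your own names as name = value pairs
employ.renamed <- data.frame(name = employee, pay = salary, started = startdate)
names(employ.renamed)
```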
Keep characters as characters in R
You may have noticed something odd when looking at the structure of employ.data. Whereas the vector employee is a character vector, R made the variable employee in the data frame a factor.
R versions before 4.0.0 do this by default (from version 4.0.0 on, character vectors stay character vectors unless you ask for factors). Either way, the data.frame() function has an extra argument that controls this behavior, namely stringsAsFactors. In the employ.data example, you can prevent the employee variable from being transformed into a factor by using the following code:
> employ.data <- data.frame(employee, salary, startdate, stringsAsFactors=FALSE)
If you look at the structure of the data frame now, you see that the variable employee is a character vector, as shown in the following output:
> str(employ.data)
'data.frame': 3 obs. of 3 variables:
 $ employee : chr "John Doe" "Peter Gynn" "Jolie Hope"
 $ salary : num 21000 23400 26800
 $ startdate: Date, format: "2010-11-01" "2008-03-25" ...
In versions before 4.0.0, R by default transforms character vectors to factors when you create a data frame with character vectors or convert a character matrix to a data frame. Since version 4.0.0, the default is stringsAsFactors=FALSE. The older behavior can be a nasty cause of errors in your code if you’re not aware of it, especially when the same script runs on different R versions. If you make it a habit to always specify the stringsAsFactors argument, you can avoid a lot of frustration.
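As a rough illustration of that habit, the sketch below sets stringsAsFactors explicitly in both directions, so the result does not depend on the R version. The object names df.factor, df.char, name.matrix, and name.df are made up for this example.

```r
employee <- c('John Doe','Peter Gynn','Jolie Hope')

# Explicit TRUE: employee becomes a factor, whatever the R version
df.factor <- data.frame(employee, stringsAsFactors = TRUE)
class(df.factor$employee)

# Explicit FALSE: employee stays a character vector
df.char <- data.frame(employee, stringsAsFactors = FALSE)
class(df.char$employee)

# A factor stores integer codes that point to its sorted levels,
# which is why treating it like plain text can go wrong
as.integer(df.factor$employee)

# The same argument works when you convert a character matrix
name.matrix <- matrix(c('John', 'Peter', 'Doe', 'Gynn'), ncol = 2)
name.df <- as.data.frame(name.matrix, stringsAsFactors = FALSE)
sapply(name.df, class)
```

The integer codes in the factor version correspond to the 1 3 2 shown in the first str() output above.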