You can create a data frame from a matrix in R. Take a look at the number of baskets scored by Granny and her friend Geraldine. If you create a matrix baskets.team with the number of baskets for both ladies, label its rows with their names and its columns with the games, you get this:

> baskets.team
          1st 2nd 3rd 4th 5th 6th
Granny     12   4   5   6   9   3
Geraldine   5   4   2   4  12   9

It makes sense to make this matrix a data frame with two variables: one for Granny’s baskets and one for Geraldine’s baskets.
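The article doesn’t show how baskets.team was created. Here’s a minimal sketch, assuming the scores start out as two numeric vectors named baskets.of.Granny and baskets.of.Geraldine and that the row and column labels are set afterward. The labels Granny, Geraldine, and 1st through 6th are taken from the data frame output later in this article.

```r
# Basket counts per game for each player
baskets.of.Granny    <- c(12, 4, 5, 6, 9, 3)
baskets.of.Geraldine <- c(5, 4, 2, 4, 12, 9)

# rbind() stacks the vectors as rows; by default the row names
# come from the variable names (baskets.of.Granny, ...)
baskets.team <- rbind(baskets.of.Granny, baskets.of.Geraldine)

# Assumed labels: short player names for the rows, game labels for the columns
rownames(baskets.team) <- c("Granny", "Geraldine")
colnames(baskets.team) <- c("1st", "2nd", "3rd", "4th", "5th", "6th")

baskets.team
```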

Using the function as.data.frame

To convert the matrix baskets.team into a data frame, you use the function as.data.frame():

> baskets.df <- as.data.frame(t(baskets.team))

You don’t have to use the transpose function, t(), to create a data frame, but in this example you want each player to be a separate variable. In a data frame, each variable is a column, but in the original matrix each row holds the baskets of a single player. So, to get the desired result, you first transpose the matrix with t() and then convert it to a data frame with as.data.frame().
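To see why the transpose matters, you can compare both conversions side by side. This sketch rebuilds baskets.team inline from the values and labels shown in this article (an assumption about how it was built) and uses dim() to show which way round the observations and variables end up:

```r
baskets.team <- rbind(Granny    = c(12, 4, 5, 6, 9, 3),
                      Geraldine = c(5, 4, 2, 4, 12, 9))
colnames(baskets.team) <- c("1st", "2nd", "3rd", "4th", "5th", "6th")

# Without t(): each game becomes a variable, each player an observation
wide.df <- as.data.frame(baskets.team)
dim(wide.df)       # 2 rows (players), 6 columns (games)

# With t(): each player becomes a variable, each game an observation
baskets.df <- as.data.frame(t(baskets.team))
dim(baskets.df)    # 6 rows (games), 2 columns (players)
names(baskets.df)
```

Without t(), you’d get a data frame with 2 observations of 6 variables, one variable per game, which isn’t the layout you want here.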

Looking at the structure of a data frame

If you print the object, it looks exactly the same as the transposed matrix t(baskets.team):

> baskets.df
    Granny Geraldine
1st     12         5
2nd      4         4
3rd      5         2
4th      6         4
5th      9        12
6th      3         9

But there is a very important difference between the two: baskets.df is a data frame. This becomes clear if you take a look at the internal structure of the object, using the str() function:

> str(baskets.df)
'data.frame': 6 obs. of  2 variables:
 $ Granny   : num  12 4 5 6 9 3
 $ Geraldine: num  5 4 2 4 12 9
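If you want to check the difference directly instead of reading it from str(), the class() function and the is.matrix() and is.data.frame() tests tell you what kind of object you have. A short sketch, again rebuilding baskets.team inline from the values in this article:

```r
baskets.team <- rbind(Granny    = c(12, 4, 5, 6, 9, 3),
                      Geraldine = c(5, 4, 2, 4, 12, 9))
colnames(baskets.team) <- c("1st", "2nd", "3rd", "4th", "5th", "6th")
baskets.df <- as.data.frame(t(baskets.team))

# Same numbers, different kinds of objects
class(t(baskets.team))   # "matrix" "array" in R 4.0 and later
class(baskets.df)        # "data.frame"

# str() on the matrix shows one numeric block with dimnames,
# not a set of separate variables
str(t(baskets.team))
```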

Now this starts looking more like a real dataset. You can see in the output that you have six observations and two variables. The variables are called Granny and Geraldine. It’s important to realize that each variable in itself is a vector. In this case, the output tells you that both variables are numeric.
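Because each variable is a vector, pulling one out with the $ operator gives you an ordinary numeric vector, and sapply() lets you check the type of every variable in one go. A sketch using the same inline data:

```r
baskets.team <- rbind(Granny    = c(12, 4, 5, 6, 9, 3),
                      Geraldine = c(5, 4, 2, 4, 12, 9))
colnames(baskets.team) <- c("1st", "2nd", "3rd", "4th", "5th", "6th")
baskets.df <- as.data.frame(t(baskets.team))

# Extract a single variable: the result is a plain numeric vector
granny <- baskets.df$Granny
is.vector(granny)
class(granny)

# Apply class() to every variable (column) of the data frame
sapply(baskets.df, class)
```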

Counting values and variables

To know how many observations a data frame has, you can use the nrow() function as you would with a matrix, like this:

> nrow(baskets.df)
[1] 6

Likewise, the ncol() function gives you the number of variables. Because a data frame is stored internally as a list with one element per variable, you can also use the length() function to get the number of variables:

> length(baskets.df)
[1] 2
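The length() shortcut works only because a data frame is a list of column vectors. On a matrix, length() counts every cell instead, as this sketch (using the same inline data) shows. It also uses dim(), which returns the number of rows and columns together:

```r
baskets.team <- rbind(Granny    = c(12, 4, 5, 6, 9, 3),
                      Geraldine = c(5, 4, 2, 4, 12, 9))
colnames(baskets.team) <- c("1st", "2nd", "3rd", "4th", "5th", "6th")
baskets.df <- as.data.frame(t(baskets.team))

nrow(baskets.df)      # number of observations
ncol(baskets.df)      # number of variables
dim(baskets.df)       # both at once: rows, then columns

# A data frame is a list, so length() counts its variables ...
is.list(baskets.df)
length(baskets.df)

# ... but length() on a matrix counts all of its cells
length(baskets.team)
```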

About the book authors:

Andrie de Vries is a leading R expert and Business Services Director for Revolution Analytics. With over 20 years of experience, he provides consulting and training services in the use of R. Joris Meys is a statistician, R programmer and R lecturer with the faculty of Bio-Engineering at the University of Ghent.
