How to Use Data Tables in R - dummies

By Andrie de Vries, Joris Meys

A first step in every analysis, whether or not you use R, is to calculate descriptive statistics for your dataset. You have to get to know the data you received before you can decide which models to try on it.

You need to know something about the range of the values in your data, how these values are distributed in the range, and how values in different variables relate to each other. Much of what you do and how you do it depends on the type of data.

Whenever a variable has a limited number of different values, you can get a quick summary of the data by calculating a frequency table. A frequency table lists the number of occurrences of every unique value in the variable. In R, you use the table() function for that.

How to create a data table in R

You can tabulate, for example, the number of cars with a manual gearbox and the number with an automatic gearbox using the following command:

> amtable <- table(cars$am)
> amtable
  auto manual
    13     19

This outcome tells you that, in your data, there are 13 cars with an automatic gearbox and 19 with a manual gearbox.
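As a minimal, self-contained sketch of the same idea, the following builds a small made-up factor and tabulates it. The gearbox vector and its eight values are hypothetical and are not the article's cars data; prop.table() is a base R function that converts the counts to proportions.

```r
# Hypothetical data: gearbox type for eight imaginary cars
# (an illustration only, not the article's dataset)
gearbox <- factor(c("manual", "auto", "manual", "manual",
                    "auto", "manual", "auto", "manual"),
                  levels = c("auto", "manual"))

# Count how often each level occurs
gear_counts <- table(gearbox)
print(gear_counts)   # auto: 3, manual: 5

# Convert the counts to proportions of the total
gear_props <- prop.table(gear_counts)
print(gear_props)    # auto: 0.375, manual: 0.625
```

Fixing the factor levels up front keeps the column order predictable and would also report a count of zero for any level that does not occur in the data.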

How to work with data tables in R

As with most functions, you can save the output of table() in a new object (in this case, called amtable). At first sight, the output of table() looks like a named vector, but is it?

> class(amtable)
[1] "table"

The table() function generates an object of class table. These objects have the same structure as an array. Arrays can have an arbitrary number of dimensions and dimension names. You can therefore index a table as you would an array, to select values or to retrieve dimension names.
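To make the array-like behavior concrete, here is a short sketch using made-up data. The gearbox and cylinders vectors are hypothetical, not the article's dataset. It shows selecting counts by name and by position, and indexing a two-dimensional table like a matrix.

```r
# Hypothetical data (not the article's dataset)
gearbox   <- c("manual", "auto", "manual", "manual", "auto", "manual")
cylinders <- c(4, 8, 4, 6, 8, 4)

# One-dimensional table: select a count by name or by position
gear_counts  <- table(gearbox)
manual_count <- gear_counts["manual"]   # count for the value "manual"
first_count  <- gear_counts[1]          # count for the first value, "auto"

# names() gives the unique values; dim() gives the size of each dimension
gear_levels <- names(gear_counts)
gear_dim    <- dim(gear_counts)

# Two-dimensional table: index with [row, column], as with a matrix
cross          <- table(gearbox, cylinders)
manual_four    <- cross["manual", "4"]  # manual cars with 4 cylinders
cross_dimnames <- dimnames(cross)       # list with one element per dimension

print(cross)
```

Note that the dimension names of a table are always character strings, so the 4-cylinder column is selected with "4" rather than the number 4.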