How to Determine a Data Structure in R

The first decision you have to make before analyzing your data is how to represent that data inside R. If your data has only one dimension, then you already know that vectors represent this type of data very well. However, if your data has more than one dimension, you have the choice of using matrices, lists, or data frames. So, the question is: When do you use which?

Matrices and higher-dimensional arrays are useful when all your data are of a single class — in other words, all your data are numeric or all your data are characters. If you’re a mathematician or statistician, you’re familiar with matrices and likely use this type of object very frequently.
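The single-class rule is easy to check for yourself. The following is a minimal sketch (the object names m and mixed are just illustrative) showing that a matrix holds one type, and that adding a character value coerces every element to character:

```r
# A matrix built from integers stores a single type
m <- matrix(1:6, nrow = 2)
dim(m)       # 2 rows, 3 columns
typeof(m)    # "integer"

# Mixing numbers and text does not fail; R silently coerces
# every element to character so the matrix stays single-typed
mixed <- matrix(c(1, "a", 3, "b"), nrow = 2)
typeof(mixed)  # "character"
```

If you see numbers printed in quotes after building a matrix, this coercion is the likely cause, and a data frame or list is probably a better fit.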

But in many practical situations, you’ll have data that have many different classes — in other words, you’ll have a mixture of numeric and character data. In this case, you need to use either lists or data frames.

If you imagine your data as a single spreadsheet, a data frame is probably a good choice. Remember that a data frame is simply a list of named vectors of the same length, which is conceptually very similar to a spreadsheet with columns and a column heading for each.
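To see the "list of named vectors" idea in practice, here is a small sketch with made-up data (the column names and values are hypothetical):

```r
# Each argument becomes one column; all columns have length 3
df <- data.frame(
  name = c("Ann", "Bob", "Cy"),
  age = c(28, 35, 42),
  stringsAsFactors = FALSE
)

is.list(df)         # TRUE: a data frame is a list underneath
names(df)           # the column headings
sapply(df, class)   # each column keeps its own class
sapply(df, length)  # every column has the same length
```

Unlike a matrix, the name column stays character and the age column stays numeric.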

If you’re familiar with databases, you can think of a data frame as similar to a single table in a database. Data frames are tremendously useful and, in many cases, will be your first choice of objects for storing your data.

If your data consists of a collection of objects but you can’t represent that as an array or a data frame, then a list is your ideal choice. Because lists can contain all kinds of other objects, including other lists or data frames, they’re tremendously flexible. Consequently, R has a wide variety of tools to process lists.
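As a rough illustration, a list can hold a string, a numeric vector, a data frame, and another list side by side. The object study and its contents below are invented for the example:

```r
# A list whose elements have different types and shapes
study <- list(
  title = "Pilot study",
  scores = c(3.5, 4.0, 2.5),
  participants = data.frame(id = 1:3, group = c("a", "b", "a")),
  settings = list(seed = 42, verbose = FALSE)
)

# sapply() is one of the tools R provides for processing lists;
# here it reports the class of each element
classes <- sapply(study, class)
classes

# Nested elements are reached by chaining $
study$settings$seed
```

None of this would fit in a matrix or a single data frame, because the elements differ in both type and length.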

You may find that a data frame is a very suitable choice for most analysis and data-processing tasks. It’s a very convenient way of representing your data, and it’s similar to working with database tables. When you read data from a comma-separated values (CSV) file with read.csv(), or from other delimited text files with read.table(), R returns the result as a data frame.
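You can confirm this with a short, self-contained sketch. It writes a tiny CSV to a temporary file first, so the file contents and the names csv_path and cities are assumptions made only for the example:

```r
# Create a small CSV file in a temporary location
csv_path <- tempfile(fileext = ".csv")
writeLines(c("city,population", "Oslo,700000", "Bergen,290000"), csv_path)

# read.csv() returns a data frame, one column per CSV field
cities <- read.csv(csv_path, stringsAsFactors = FALSE)
class(cities)  # "data.frame"
str(cities)    # shows the type R chose for each column

# Remove the temporary file
unlink(csv_path)
```

With your own data, replace csv_path with the path to your file.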

Object | Description | Comments
--- | --- | ---
vector | The basic data object in R, consisting of one or more values of a single type (for example, character, numeric, or integer). | Think of this as a single column or row in a spreadsheet, or a column in a database table.
matrix or array | A multidimensional object of a single type (known as atomic). A matrix is an array of two dimensions. | When you have to store numbers in many dimensions, use arrays.
list | Lists can contain objects of any type. | Lists are very useful for storing collections of data that belong together. Because lists can contain other lists, they can also represent nested structures.
data.frame | Data frames are a special kind of named list where all the elements have the same length. | Data frames are similar to a single spreadsheet or to a table in a database.