
How to Use read.csv() to Import Data in R

One of the easiest and most reliable ways of getting data into R is to use text files, in particular CSV (comma-separated values) files. The CSV file format uses commas to separate the different elements in a line, and each record sits on its own line in the text file, which makes CSV files ideal for representing tabular data.

An additional benefit of CSV files is that almost any data application can export data to the CSV format. This is certainly the case for most spreadsheet applications, including Microsoft Excel and OpenOffice Calc.

In the following examples, assume that you have a CSV file stored in a convenient folder in your file system. To convert an Excel spreadsheet to CSV format, you need to choose File→Save As, which gives you the option to save your file in a variety of formats.

Keep in mind that a CSV file can represent only a single worksheet of a spreadsheet. Finally, be sure to use the topmost row of your worksheet (row 1) for the column headings.

In R, you use the read.csv() function to import data in CSV format. This function has a number of arguments, but the only essential argument is file, which specifies the location and filename. To read a file called elements.csv located on drive f:, use read.csv() with file.path(), which builds the path for you:

> elements <- read.csv(file.path("f:", "elements.csv"))
> str(elements)
'data.frame': 10 obs. of 9 variables:
 $ Atomic.number: int 1 2 3 4 5 6 7 8 9 10
 $ Name     : Factor w/ 10 levels "Beryllium","Boron",..: 6 5 7 1 2 3 9 10 4 8
 $ Symbol    : Factor w/ 10 levels "B","Be","C","F",..: 5 6 7 2 1 3 8 10 4 9
 $ Group    : int 1 18 1 2 13 14 15 16 17 18
 $ Period    : int 1 1 2 2 2 2 2 2 2 2
 $ Block    : Factor w/ 2 levels "p","s": 2 2 2 2 1 1 1 1 1 1
 $ State.at.STP : Factor w/ 2 levels "Gas","Solid": 1 1 2 2 2 2 1 1 1 1
 $ Occurrence  : Factor w/ 1 level "Primordial": 1 1 1 1 1 1 1 1 1 1
 $ Description : Factor w/ 6 levels "Alkali metal",..: 6 5 1 2 4 6 6 6 3 5

R imports the data into a data frame. As you can see, this example has ten observations of nine variables.
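If you don't have elements.csv at hand, the same steps can be tried on a throwaway file. The following is a minimal sketch: the three sample rows and the file name elements_sample.csv are made up for illustration and are written to R's temporary directory.

```r
# Hypothetical sample data: a header row plus three records
csv_lines <- c(
  "Atomic number,Name,Symbol",
  "1,Hydrogen,H",
  "2,Helium,He",
  "3,Lithium,Li"
)

# Write the lines to a temporary file so the example runs on any machine
csv_path <- file.path(tempdir(), "elements_sample.csv")
writeLines(csv_lines, csv_path)

# file is the only required argument; the first row supplies the column names
elements_sample <- read.csv(csv_path)

dim(elements_sample)    # number of rows, then number of columns
names(elements_sample)  # column names as R stored them
```

Calling str(elements_sample) afterward gives the same kind of summary shown above for the full file.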

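Notice that the default option here is to convert character strings into factors. Thus, the columns Name, Symbol, Block, State.at.STP, Occurrence, and Description all have been converted to factors. This is the default in versions of R before 4.0.0; from R 4.0.0 onward, read.csv() leaves strings as character vectors unless you ask otherwise. Also, notice that R converts spaces in the column names to periods (for example, in the column State.at.STP).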
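The renaming of column headings is controlled by the check.names argument of read.csv(). As a small sketch, the text argument is used below to read from an inline string rather than a file; the two-column data is invented for illustration.

```r
# Hypothetical two-column CSV with spaces in the headings
csv_text <- "Atomic number,State at STP\n1,Gas\n3,Solid"

# Default (check.names = TRUE): headings become syntactically valid R names
fixed <- read.csv(text = csv_text)
names(fixed)

# check.names = FALSE keeps the headings exactly as written in the file
raw <- read.csv(text = csv_text, check.names = FALSE)
names(raw)

# A column whose name contains spaces is then accessed with [[ ]] or backticks
raw[["State at STP"]]
```

Keeping the default is usually more convenient, because names with periods work with the $ operator without any quoting.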

This default option of converting strings to factors when you use read.csv() or read.table() can be a source of great confusion. You're often better off importing data that contains strings in such a way that the strings aren't converted to factors, but remain character vectors. To do so, pass the argument stringsAsFactors=FALSE to read.csv() or read.table():

> elements <- read.csv(file.path("f:", "elements.csv"), stringsAsFactors=FALSE)
> str(elements)
'data.frame': 10 obs. of 9 variables:
 $ Atomic.number: int 1 2 3 4 5 6 7 8 9 10
 $ Name     : chr "Hydrogen" "Helium" "Lithium" "Beryllium" ...
 $ Symbol    : chr "H" "He" "Li" "Be" ...
 $ Group    : int 1 18 1 2 13 14 15 16 17 18
 $ Period    : int 1 1 2 2 2 2 2 2 2 2
 $ Block    : chr "s" "s" "s" "s" ...
 $ State.at.STP : chr "Gas" "Gas" "Solid" "Solid" ...
 $ Occurrence  : chr "Primordial" "Primordial" "Primordial" "Primordial" ...
 $ Description : chr "Non-metal" "Noble gas" "Alkali metal" "Alkaline earth metal" ...
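Setting stringsAsFactors explicitly makes the result independent of the default in your version of R. The sketch below uses a small invented table read from an inline string, and also shows one possible way to turn a single column into a factor after importing everything as character.

```r
# Hypothetical sample data, read from a string instead of a file
csv_text <- "Name,Block\nHydrogen,s\nHelium,s\nBoron,p"

# Strings stay character vectors
as_chr <- read.csv(text = csv_text, stringsAsFactors = FALSE)
class(as_chr$Name)

# Strings become factors
as_fct <- read.csv(text = csv_text, stringsAsFactors = TRUE)
class(as_fct$Block)
levels(as_fct$Block)

# Convert only the column that really is categorical
as_chr$Block <- factor(as_chr$Block)
str(as_chr)
```

Here Name remains a character vector while Block becomes a factor, which is often what you want for a column with only a few distinct values.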

If you have a file in the format common in many European countries (where commas are used as decimal separators and semicolons are used as field separators), you need to import it into R using the read.csv2() function.
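As a brief sketch of the difference, the following reads a semicolon-separated string with decimal commas. The item names and prices are made-up values used only to show the parsing.

```r
# Hypothetical data: semicolons separate fields, commas mark decimals
eu_text <- "Item;Price\nA;1,5\nB;2,25"

# read.csv2() assumes sep = ";" and dec = ","
eu <- read.csv2(text = eu_text)
str(eu)

# The same result with read.csv() by setting the separators yourself
eu_manual <- read.csv(text = eu_text, sep = ";", dec = ",")

# Price is parsed as a number in both cases
eu$Price
eu_manual$Price
```

Reading the same text with plain read.csv() and no extra arguments would not split the fields correctly, because it expects commas between them.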
