How to Work with Non-CSV Data Files in R

By Andrie de Vries, Joris Meys

Although CSV (comma-separated values) files are widely used to import and export data in R, they aren’t always the most appropriate format. Some data formats can hold data that isn’t tabular in nature. Others describe the data using metadata (data that describes data).

The standard distribution of R includes a package called foreign that contains functions to import data files from a number of other statistical systems, including SPSS, Stata, SAS, Octave, and Minitab.

To use these functions, you first have to load the foreign package:

> library(foreign)
> read.spss(file="location/of/myfile")

System     Function to Import to R
SPSS       read.spss
SAS        read.xport or read.ssd
Stata      read.dta
Minitab    read.mtp

Read the Help documentation on these functions carefully. Because data frames in R may be structured quite differently from datasets in the statistical packages, you have to pay special attention to how these functions treat value labels and variable labels. Also check how they treat special missing values.
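
To see what label handling looks like in practice, here is a minimal sketch that reads the same SPSS file twice with different settings. It assumes the small example file electric.sav that ships with the foreign package is present in your installation; the object names labelled and coded are just illustrative.

```r
library(foreign)

# electric.sav is an example file bundled with the foreign package
# (assumption: it is present in your installation)
sav <- system.file("files", "electric.sav", package = "foreign")

# Value labels become factor levels (the default behavior)
labelled <- read.spss(sav, to.data.frame = TRUE, use.value.labels = TRUE)

# Value labels are not applied, so the underlying codes are kept
coded <- read.spss(sav, to.data.frame = TRUE, use.value.labels = FALSE)

# Compare how the same columns are represented in each version
str(labelled)
str(coded)

# Variable labels, if the file has any, are stored as an attribute
attr(labelled, "variable.labels")
```

Comparing the two str() outputs side by side is a quick way to decide which representation suits your analysis before you commit to one.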

These functions need a specific file format. The function read.xport() only works with the XPORT format of SAS. For read.mtp(), the file must be in the Minitab portable worksheet (.mtp) format.
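
Because each function expects one particular format, it can help to pick the reader from the file extension. The following is only a sketch: read_stat_file() is a hypothetical helper, and the mapping from extensions to functions is an assumption based on common naming conventions, not something the foreign package enforces.

```r
library(foreign)

# Hypothetical helper: choose an import function from the file extension.
# The extension-to-function mapping below is an assumption for illustration.
read_stat_file <- function(path, ...) {
  ext <- tolower(tools::file_ext(path))
  reader <- switch(ext,
    sav = function(p, ...) read.spss(p, to.data.frame = TRUE, ...),
    dta = read.dta,
    xpt = read.xport,
    mtp = read.mtp,
    stop("No reader known for extension '", ext, "'")
  )
  reader(path, ...)
}

# Usage: write a small data frame to a temporary Stata file, then read it back
df <- data.frame(id = 1:3, score = c(2.5, 3.5, 4.5))
tmp <- tempfile(fileext = ".dta")
write.dta(df, tmp)
back <- read_stat_file(tmp)
print(back)
```

An unknown extension stops with an error rather than guessing, which is usually the safer choice when the formats are this strict.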

Note that some of these functions are rather old. The newest versions of the statistical packages mentioned here may use different file format specifications, so the functions aren’t always guaranteed to work. For example, read.dta() reads only files written in the format of Stata versions 5 through 12.
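
Since an import can fail on a file format the function doesn't recognize, you may want to catch the error instead of letting your script stop. Here is a minimal sketch using tryCatch(); try_read_dta() is a hypothetical helper, and the unreadable file is a stand-in created only for the demonstration.

```r
library(foreign)

# Hypothetical helper: try to read a Stata file; on failure, report the
# problem and return NULL instead of stopping the script.
try_read_dta <- function(path) {
  tryCatch(
    read.dta(path),
    error = function(e) {
      message("Could not read '", path, "': ", conditionMessage(e))
      NULL
    }
  )
}

# Stand-in for an unsupported file: plain text saved with a .dta extension
bad <- tempfile(fileext = ".dta")
writeLines("PLACEHOLDER, not a real data file", bad)
result <- try_read_dta(bad)

# A file written by write.dta() is read without trouble
good <- tempfile(fileext = ".dta")
write.dta(data.frame(x = c(1.5, 2.5)), good)
ok <- try_read_dta(good)
```

Returning NULL lets the calling code decide what to do next, such as asking for a CSV export of the same data.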

Some of these functions also require the statistical package itself to be installed on your computer. The read.ssd() function, for example, works only if you have SAS installed.

The bottom line: If you can transfer data using CSV files, you’ll save yourself a lot of trouble.
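
For comparison, a CSV round trip needs nothing beyond base R. This sketch writes a small, made-up data frame to a temporary file and reads it back; the data and object names are purely illustrative.

```r
# A small made-up data frame, including a missing value
df <- data.frame(
  id = 1:3,
  group = c("a", "b", "c"),
  score = c(2.5, NA, 4.5),
  stringsAsFactors = FALSE
)

# Write to a temporary CSV file without row names
csv <- tempfile(fileext = ".csv")
write.csv(df, csv, row.names = FALSE)

# Read it back, keeping text columns as character vectors
back <- read.csv(csv, stringsAsFactors = FALSE)

# Check whether the round trip preserved the data
all.equal(df, back)
```

Keep in mind that CSV carries no metadata, so value labels and variable labels from the other package don't survive the trip.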

Finally, if you need to connect R to a database, odds are that a package exists that can connect to your database of choice. See the nearby sidebar, “Working with databases in R,” for some pointers.