R For Dummies (UK Edition)
R is more than just a statistical programming language. It’s also a powerful tool for all kinds of data processing and manipulation, used by a community of programmers and users, academics and practitioners. But in order to get the most out of R, you need to know how to access the R Help files and find help from other sources. To represent data in R, you need to be able to succinctly and correctly specify subsets of your data. Finally, R has many functions that allow you to import data from other applications.
Getting Help with R
Even with good introductory books on R, you’ll need to use the R Help files. The R Help files provide detailed information about the use of different functions and their peculiarities. R has excellent built-in help for every function that explains how to use that function. Just about every Help page has some examples that demonstrate how to use that function.
To search through the Help files, you’ll use one of the following functions:
?: Displays the Help file for a specific function. For example, ?data.frame displays the Help file for the data.frame() function.
??: Searches for a word (or pattern) in the Help files. For example, ??list returns the names of functions that contain the word list in either the function names or their descriptions.
RSiteSearch(): Performs an online search of RSiteSearch. This search engine allows you to perform a search of the R functions, package vignettes and the R-help mail archives. For example, RSiteSearch('linear models') does a search at this website for the search term linear models.
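The lookups above can be tried from the console. The following is a minimal sketch: help() is the function form of ?, and the interactive() guard is an addition made here so that the script does not try to open a help viewer or web browser when run non-interactively.

```r
# Fetch the help entry for data.frame() without opening a viewer;
# help("data.frame") is the function form of ?data.frame.
h <- help("data.frame")
class(h)

# The calls below open a help viewer or a web browser, so this sketch
# only runs them in an interactive session.
if (interactive()) {
  ?data.frame                     # Help page for one function
  ??list                          # search the installed Help files for "list"
  example("data.frame")           # run the examples from the Help page
  RSiteSearch("linear models")    # online search; needs an internet connection
}
```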
You aren’t limited to the R Help files if you’re looking for help with R. The add-on package sos, available for download from CRAN, has some neat functions to search all the Help files on RSiteSearch. It displays results in a web browser window, making it easy to work with.
To use the package sos, you need to install the package by typing install.packages("sos") in your R console, and then load the package with library(sos).
Then you can use the findFn() function to do your search. For example, by typing findFn("regression") into your R console, you get a web page with the names, descriptions and links to several hundred functions that contain the word regression in the function name or Help text description.
Subsetting R Objects
Vectors, lists, and data frames play an important role in representing data in R, so being able to succinctly and correctly specify a subset of your data is important.
You can use three operators to subset your data:
$: Extracts a single element by name from a list or data frame. For example, iris$Sepal.Length extracts the column Sepal.Length from the data frame iris.
[[: Extracts a single element by name or position from a list or data frame. For example, iris[["Sepal.Length"]] extracts the column Sepal.Length from the data frame iris; iris[[2]] extracts the second element (the column Sepal.Width) from iris.
[: Extracts multiple elements from a vector, array, list, or data frame. For example, iris[, c("Sepal.Length", "Species")] extracts the columns Sepal.Length and Species from iris; iris[1:10, ] extracts the first ten rows from iris; and iris[1:10, "Species"] extracts the first ten elements of the column Species from iris.
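As a quick illustration, the three operators can be compared side by side on the built-in iris data frame. This is only a sketch; the variable names are chosen here for illustration.

```r
# $ and [[ both return a single column as a plain vector
sl_dollar <- iris$Sepal.Length
sl_double <- iris[["Sepal.Length"]]
identical(sl_dollar, sl_double)

# [[ also accepts a position: the second column of iris is Sepal.Width
second_col <- iris[[2]]

# [ can return several elements at once
two_cols    <- iris[, c("Sepal.Length", "Species")]  # data frame with 2 columns
first_ten   <- iris[1:10, ]                          # data frame with 10 rows
species_ten <- iris[1:10, "Species"]                 # factor of length 10
```

Note that selecting a single column with [ and a comma, as in the last line, drops the result to a vector (here a factor) rather than a data frame.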
The bracket subset operator, [, allows you to return multiple elements. You specify the subset in one of five ways:
Blank: Returns everything. For example, iris[] returns all of iris.
Positive numeral: Includes only these elements. For example, iris[1:100, 5] extracts the first hundred elements of the fifth column of iris.
Negative numeral: Excludes these elements. For example, iris[-(1:100), ] excludes the first hundred rows from iris, while iris[, -5] excludes the fifth column from iris.
Logical: Includes if TRUE; excludes if FALSE. For example, iris[iris$Species=="setosa", ] extracts only those rows from iris where the Species value is "setosa".
Name: Includes all names that match. For example, iris[, c("Species", "Petal.Width")] extracts the columns Species and Petal.Width from iris.
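The five ways of specifying a subset can be sketched on iris, which has 150 rows and 5 columns. The object names below are illustrative only.

```r
everything <- iris[]                              # blank: all rows and columns
first_100  <- iris[1:100, 5]                      # positive: rows 1-100 of column 5
last_rows  <- iris[-(1:100), ]                    # negative: drop the first 100 rows
no_species <- iris[, -5]                          # negative: drop the fifth column
setosa     <- iris[iris$Species == "setosa", ]    # logical: keep rows where TRUE
by_name    <- iris[, c("Species", "Petal.Width")] # name: keep the matching columns

# Check the size of each result
dim(last_rows)
dim(no_species)
dim(setosa)
```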
Importing Data into R
R has many functions that allow you to import data from other applications. The following table lists some of the useful text import functions, what they do, and examples of how to use them.
|Function||What It Does||Example|
|read.table()||Reads any tabular data where the columns are separated by a delimiter. You can specify the separator (for example, commas or tabs), as well as other arguments to precisely describe your data.||read.table(file="myfile", sep="\t", header=TRUE)|
|read.csv()||A simplified version of read.table() with all the arguments preset to read CSV files, such as those exported from Microsoft Excel.||read.csv(file="myfile")|
|read.csv2()||A version of read.csv() configured for data with a comma as the decimal point and a semicolon as the field separator.||read.csv2(file="myfile", header=TRUE)|
|read.delim()||Useful for reading delimited files, with tabs as the default separator.||read.delim(file="myfile", header=TRUE)|
|scan()||Allows you finer control over the read process when your data isn’t tabular.||scan("myfile", skip = 1, nmax=100)|
|readLines()||Reads text from a text file one line at a time.||readLines("myfile")|
|read.fwf()||Reads a file with data in fixed-width format. In other words, each column in the data has a fixed number of characters.||read.fwf("myfile", widths=c(1,2,3))|
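To try several of these functions without an existing data file, one option is to write small temporary files and read them back. The file contents and object names below are made up for illustration.

```r
# A small tab-separated file with a header row
tab_file <- tempfile(fileext = ".txt")
writeLines(c("name\tscore", "ann\t1.5", "bob\t2.5"), tab_file)

tab   <- read.table(file = tab_file, sep = "\t", header = TRUE)
delim <- read.delim(file = tab_file)     # tab separator and header by default
lines <- readLines(tab_file)             # one character string per line
items <- scan(tab_file, what = character(), skip = 1, nmax = 100, quiet = TRUE)

# Semicolon-separated fields with a comma as the decimal point
csv2_file <- tempfile(fileext = ".csv")
writeLines(c("x;y", "1,5;2,5"), csv2_file)
csv2 <- read.csv2(file = csv2_file, header = TRUE)

# Fixed-width fields of 1, 2, and 3 characters
fwf_file <- tempfile(fileext = ".txt")
writeLines(c("122333", "455666"), fwf_file)
fwf <- read.fwf(fwf_file, widths = c(1, 2, 3))
```

Here scan() skips the header line and returns the remaining whitespace-separated fields as a character vector, while read.fwf() assigns the default column names V1, V2, and V3.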
In addition to these options to read text data, the package foreign allows you to read data from other popular statistical formats, such as SPSS. To use these functions, you first have to load the built-in foreign package with the command library(foreign).
The following table lists the functions to import data from SPSS, Stata, and SAS.
|Function||What It Does||Example|
|read.spss()||Reads SPSS data file||read.spss("myfile")|
|read.dta()||Reads Stata binary file||read.dta("myfile")|
|read.xport()||Reads SAS export file||read.xport("myfile")|
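If you have no Stata file to hand, a round trip through write.dta(), which is also part of foreign, gives a quick way to see read.dta() at work. This is a sketch under assumptions: the scores data frame and the temporary file are invented for illustration, and the requireNamespace() guard is added in case foreign is not installed.

```r
if (requireNamespace("foreign", quietly = TRUE)) {
  library(foreign)

  # A small made-up data frame to export
  scores <- data.frame(id = 1:3, score = c(1.5, 2.5, 3.5))

  # Write a Stata binary file, then read it back in
  dta_file <- tempfile(fileext = ".dta")
  write.dta(scores, dta_file)
  scores_back <- read.dta(dta_file)

  print(scores_back)
}
```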