Andrie de Vries

Andrie de Vries is a leading R expert and Business Services Director for Revolution Analytics. With over 20 years of experience, he provides consulting and training services in the use of R. Joris Meys is a statistician, R programmer and R lecturer with the faculty of Bio-Engineering at the University of Ghent.

Articles From Andrie de Vries

41 results
R For Dummies Cheat Sheet

Cheat Sheet / Updated 07-29-2022

R is more than just a statistical programming language. It’s also a powerful tool for all kinds of data processing and manipulation, used by a community of programmers and users, academics, and practitioners. To get the most out of R, you need to know how to access the R Help files and find help from other sources. To represent data in R, you need to be able to succinctly and correctly specify subsets of your data. Finally, R has many functions that allow you to import data from other applications.

How to Name Matrix Rows and Columns in R programming

Article / Updated 11-04-2021

The rbind() function in the R programming language conveniently adds the names of the vectors to the rows of the matrix. You name the values in a vector, and you can do something very similar with rows and columns in a matrix. For that, you have the functions rownames() and colnames(). Guess which one does what? Both functions work much like the names() function you use when naming vector values.

Changing the row and column names

The matrix baskets.team already has some row names. It would be better if the names of the rows just read "Granny" and "Geraldine". You can easily change these row names like this:

> rownames(baskets.team) <- c("Granny", "Geraldine")

You can look at the matrix to check whether this did what it's supposed to do, or you can take a look at the row names themselves like this:

> rownames(baskets.team)
[1] "Granny"    "Geraldine"

The colnames() function works exactly the same way. You can, for example, add the number of the game as a column name using the following code:

> colnames(baskets.team) <- c("1st", "2nd", "3th", "4th", "5th", "6th")

This gives you the following matrix:

> baskets.team
          1st 2nd 3th 4th 5th 6th
Granny     12   4   5   6   9   3
Geraldine   5   4   2   4  12   9

This is almost what you want, but the third column name contains an annoying writing mistake. No problem there; R allows you to easily correct that mistake. Just as with the names() function, you can use indices to extract or to change a specific row or column name. You can correct the mistake in the column names like this:

> colnames(baskets.team)[3] <- "3rd"

If you want to get rid of either column names or row names, the only thing you need to do is set their value to NULL. This also works for vector names, by the way. You can try that out yourself on a copy of the matrix baskets.team like this:

> baskets.copy <- baskets.team
> colnames(baskets.copy) <- NULL
> baskets.copy
          [,1] [,2] [,3] [,4] [,5] [,6]
Granny      12    4    5    6    9    3
Geraldine    5    4    2    4   12    9

R stores the row and column names in an attribute called dimnames. Use the dimnames() function to extract or set those values.

Using names as indices

These row and column names can be used just like you use names for values in a vector. You can use these names instead of the index number to select values from a vector. This works for matrices as well, using the row and column names. Say you want to select the second and the fifth game for both ladies; try:

> baskets.team[, c("2nd", "5th")]
          2nd 5th
Granny      4   9
Geraldine   4  12

Exactly as before, you get all rows if you don't specify which ones you want. Alternatively, you can extract all the results for Granny like this:

> baskets.team["Granny", ]
1st 2nd 3rd 4th 5th 6th
 12   4   5   6   9   3

That's the result, indeed, but the row name is gone now. R tries to simplify the matrix to a vector, if that's possible. In this case, a single row is returned, so by default the result is transformed to a vector.

If a one-row matrix is simplified to a vector, the column names are used as names for the values. If a one-column matrix is simplified to a vector, the row names are used as names for the vector. If you want to keep all names, you must set the argument drop to FALSE to avoid conversion to a vector.
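The effect of the drop argument and of dimnames() is easier to see side by side. The following is a minimal sketch; the two input vectors are reconstructed from the matrix printed above, and the variable names granny.vector and granny.matrix are introduced here only for illustration.

```r
# Rebuild the example matrix (values taken from the printed output above)
baskets.of.Granny <- c(12, 4, 5, 6, 9, 3)
baskets.of.Geraldine <- c(5, 4, 2, 4, 12, 9)
baskets.team <- rbind(baskets.of.Granny, baskets.of.Geraldine)
rownames(baskets.team) <- c("Granny", "Geraldine")
colnames(baskets.team) <- c("1st", "2nd", "3rd", "4th", "5th", "6th")

# Default behavior: a single row is simplified to a named vector
granny.vector <- baskets.team["Granny", ]
names(granny.vector)   # the column names survive as vector names
dim(granny.vector)     # NULL, because this is no longer a matrix

# With drop = FALSE the result stays a one-row matrix and keeps its row name
granny.matrix <- baskets.team["Granny", , drop = FALSE]
dim(granny.matrix)
rownames(granny.matrix)

# dimnames() returns both sets of names as a list: rows first, then columns
dimnames(baskets.team)
```

Setting drop = FALSE is mostly useful inside functions, where code that follows expects a matrix no matter how many rows were selected.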

How to Create a Data Frame from Scratch in R

Article / Updated 10-28-2021

In the R programming language, a conversion from a matrix to a data frame can't be used to construct a data frame with different types of values. If you combine both numeric and character data in a matrix, for example, everything will be converted to character. You can construct a data frame from scratch, though, using the data.frame() function. Once a data frame is created, you can add observations to it.

Make a data frame from vectors in R

So, let's make a little data frame with the names, salaries, and starting dates of a few imaginary co-workers. First, you create three vectors that contain the necessary information like this:

> employee <- c('John Doe','Peter Gynn','Jolie Hope')
> salary <- c(21000, 23400, 26800)
> startdate <- as.Date(c('2010-11-1','2008-3-25','2007-3-14'))

Now you have three different vectors in your workspace:

A character vector called employee, containing the names
A numeric vector called salary, containing the yearly salaries
A date vector called startdate, containing the dates on which the co-workers started

Next, you combine the three vectors into a data frame using the following code:

> employ.data <- data.frame(employee, salary, startdate)

The result of this is a data frame, employ.data, with the following structure (shown here as printed by versions of R before 4.0.0):

> str(employ.data)
'data.frame': 3 obs. of 3 variables:
 $ employee : Factor w/ 3 levels "John Doe","Jolie Hope",..: 1 3 2
 $ salary   : num 21000 23400 26800
 $ startdate: Date, format: "2010-11-01" "2008-03-25" ...

To combine a number of vectors into a data frame, you simply add all vectors as arguments to the data.frame() function, separated by commas. R will create a data frame with variables that are named the same as the vectors used.

Keep characters as characters in R

You may have noticed something odd when looking at the structure of employ.data. Whereas the vector employee is a character vector, R made the variable employee in the data frame a factor.

Versions of R before 4.0.0 do this by default, but the data.frame() function has an extra argument that controls this behavior: stringsAsFactors. Since R 4.0.0, the default value of this argument is FALSE, so character vectors stay character vectors unless you ask otherwise. In the employ.data example, you can prevent the transformation of the employee variable to a factor by using the following code:

> employ.data <- data.frame(employee, salary, startdate, stringsAsFactors=FALSE)

If you look at the structure of the data frame now, you see that the variable employee is a character vector, as shown in the following output:

> str(employ.data)
'data.frame': 3 obs. of 3 variables:
 $ employee : chr "John Doe" "Peter Gynn" "Jolie Hope"
 $ salary   : num 21000 23400 26800
 $ startdate: Date, format: "2010-11-01" "2008-03-25" ...

Before R 4.0.0, R transformed character vectors to factors by default when creating a data frame with character vectors or converting a character matrix to a data frame. This can be a nasty cause of errors in your code if you're not aware of it. If you make it a habit to always specify the stringsAsFactors argument, your code behaves the same in every version of R and you can avoid a lot of frustration.
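To see what the stringsAsFactors argument changes, it can help to build the same data frame twice and compare the column classes. This is a small sketch using the vectors from the example above; the names employ.chr and employ.fct are made up for this illustration.

```r
# The three example vectors from the article
employee <- c("John Doe", "Peter Gynn", "Jolie Hope")
salary <- c(21000, 23400, 26800)
startdate <- as.Date(c("2010-11-01", "2008-03-25", "2007-03-14"))

# Spelling out stringsAsFactors makes the result independent of the R version
employ.chr <- data.frame(employee, salary, startdate, stringsAsFactors = FALSE)
employ.fct <- data.frame(employee, salary, startdate, stringsAsFactors = TRUE)

# Compare the class of each column in both data frames
sapply(employ.chr, class)
sapply(employ.fct, class)

# A factor stores integer codes plus levels, sorted alphabetically
levels(employ.fct$employee)
```

Only the employee column differs between the two data frames; numeric and Date columns are not affected by the argument.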

Importing Data into R

Article / Updated 03-26-2016

R has many functions that allow you to import data from other applications. The following table lists some of the useful text import functions, what they do, and examples of how to use them.

| Function | What It Does | Example |
|---|---|---|
| read.table() | Reads any tabular data where the columns are separated (for example, by commas or tabs). You can specify the separator, as well as other arguments to precisely describe your data. | read.table(file="myfile", sep="\t", header=TRUE) |
| read.csv() | A simplified version of read.table() with all the arguments preset to read CSV files, such as those exported from Microsoft Excel. | read.csv(file="myfile") |
| read.csv2() | A version of read.csv() configured for data with a comma as the decimal point and a semicolon as the field separator. | read.csv2(file="myfile", header=TRUE) |
| read.delim() | Useful for reading delimited files, with tabs as the default separator. | read.delim(file="myfile", header=TRUE) |
| scan() | Allows you finer control over the read process when your data isn't tabular. | scan("myfile", skip=1, nmax=100) |
| readLines() | Reads text from a text file one line at a time. | readLines("myfile") |
| read.fwf() | Reads a file with data in fixed-width format. In other words, each column in the data has a fixed number of characters. | read.fwf("myfile", widths=c(1,2,3)) |

In addition to these options to read text data, the package foreign allows you to read data from other popular statistical formats, such as SPSS. To use these functions, you first have to load the built-in foreign package, with the following command:

> library("foreign")

The following table lists the functions to import data from SPSS, Stata, and SAS.

| Function | What It Does | Example |
|---|---|---|
| read.spss() | Reads SPSS data file | read.spss("myfile") |
| read.dta() | Reads Stata binary file | read.dta("myfile") |
| read.xport() | Reads SAS export file | read.xport("myfile") |
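The text import functions can be tried without any existing data by writing a small temporary file first. The sketch below does that; the file contents and the names tab.file, csv2.file, scores, and points are invented for this illustration.

```r
# Write a small tab-separated file to a temporary location
tab.file <- tempfile(fileext = ".txt")
writeLines(c("name\tscore", "Granny\t12", "Geraldine\t5"), tab.file)

# read.table() needs the separator and header spelled out
scores <- read.table(file = tab.file, sep = "\t", header = TRUE)
str(scores)

# read.delim() presets the tab separator and header = TRUE
scores.delim <- read.delim(file = tab.file)

# readLines() returns the raw text, one element per line
raw.lines <- readLines(tab.file)

# read.csv2() expects semicolons between fields and a decimal comma
csv2.file <- tempfile(fileext = ".csv")
writeLines(c("name;points", "Granny;12,5", "Geraldine;5,25"), csv2.file)
points <- read.csv2(file = csv2.file)
points$points
```

Looking at the file with readLines() before importing is a quick way to find out which separator and decimal mark it uses.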

10 Online Resources for R Programming

Article / Updated 03-26-2016

In addition to R For Dummies, there are many online resources for the aspiring or experienced R programmer. They'll help you find answers or learn valuable tips and tricks. (Or, they're just fun!)

Cookbook for R: The goal of the Cookbook for R is to provide solutions to common tasks and problems in analyzing data.

Mailing lists: Specialized mailing lists can help you find answers for your field of interest.

R podcast: Enjoy some R talk while you drive to work with the R podcast!

R Seek: R Seek is a simple search tool that accesses several online resources with one command.

The R Inferno: Patrick Burns has created a tutorial, The R Inferno, in the style of Dante's Inferno: "Abandon all hope, ye who enter here!" The PDF file can be found here: http://www.burns-stat.com/pages/Tutor/R_inferno.pdf.

The R Journal: The R Journal is the open access, refereed journal of the R project for statistical computing.

R-bloggers: Hundreds of blogs on R are linked and searchable on the R-bloggers web page.

RTips: In RTips, dozens of tweaks, tips, and timesavers are lovingly compiled from years of experience on the R mailing list.

Twitter: R is a very short name, so that leaves 139 characters on Twitter for the latest nugget (or link to cat videos).

Revolutions: The Revolutions blog captures the latest trends in R programming. It's sponsored by Revolution Analytics.

Subsetting R Objects

Article / Updated 03-26-2016

Vectors, lists, and data frames play an important role in representing data in R, so being able to succinctly and correctly specify a subset of your data is important. There are three main operators that you can use to subset your data:

$: Extracts a single element by name from a list or data frame. For example, iris$Sepal.Length extracts the column Sepal.Length from the data frame iris.

[[: Extracts a single element by name or position from a list or data frame. For example, iris[["Sepal.Length"]] extracts the column Sepal.Length from the data frame iris; iris[[2]] extracts the second element from iris.

[: Extracts multiple elements from a vector, array, list, or data frame. For example, iris[, c("Sepal.Length", "Species")] extracts the columns Sepal.Length and Species from iris; iris[1:10, ] extracts the first ten rows from iris; and iris[1:10, "Species"] extracts the first ten elements of the column Species from iris.

The bracket subset operator, [, allows you to return multiple elements. You specify the subset in one of five ways:

Blank: Returns everything. For example, iris[] returns all of iris.

Positive numeral: Includes only these elements. For example, iris[1:100, 5] extracts the first hundred elements of the fifth column of iris.

Negative numeral: Excludes these elements. For example, iris[-(1:100), ] excludes the first hundred rows from iris, while iris[, -5] excludes the fifth column from iris.

Logical: Includes if TRUE; excludes if FALSE. For example, iris[iris$Species=="setosa", ] extracts only those rows from iris where the Species value is "setosa".

Name: Includes all names that match. For example, iris[, c("Species", "Petal.Width")] extracts the columns Species and Petal.Width from iris.
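The operators and index types described above can be checked against each other on the built-in iris data frame. The following sketch stores each subset in a variable whose name is chosen here for illustration only.

```r
# $ and [[ both extract a single column as a vector
by.dollar <- iris$Sepal.Length
by.name <- iris[["Sepal.Length"]]
by.position <- iris[[2]]            # second column, Sepal.Width

# [ can return several rows and columns at once
first.ten <- iris[1:10, ]                           # positive numerals
without.first.hundred <- iris[-(1:100), ]           # negative numerals
without.species <- iris[, -5]                       # drop the fifth column
only.setosa <- iris[iris$Species == "setosa", ]     # logical index
two.columns <- iris[, c("Species", "Petal.Width")]  # names

# Inspect the dimensions of each result
dim(first.ten)
dim(without.first.hundred)
dim(only.setosa)
names(two.columns)
```

Note that the columns in two.columns come back in the order in which the names were given, not in the order they have in iris.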

Getting Help with R

Article / Updated 03-26-2016

Even with good introductory books on R, you'll need to use the R Help files. The R Help files provide detailed information about the use of different functions and their peculiarities. R has excellent built-in help for every function that explains how to use that function. Just about every Help page has some examples that demonstrate how to use that function.

To search through the Help files, you'll use one of the following functions:

?: Displays the Help file for a specific function. For example, ?data.frame displays the Help file for the data.frame() function.

??: Searches for a word (or pattern) in the Help files. For example, ??list returns the names of functions that contain the word list in either the function names or their descriptions.

RSiteSearch(): Performs an online search of RSiteSearch. This search engine allows you to perform a search of the R functions, package vignettes, and the R-help mail archives. For example, RSiteSearch("linear models") does a search at this website for the search term "linear models."

You aren't limited to the R Help files if you're looking for help with R. The add-on package sos, available from CRAN, has some neat functions to search all the Help files on RSiteSearch. It displays results in a web browser window, making it easy to work with.

To use the package sos, you need to install the package by typing install.packages("sos") in your R console, and then load the package with library("sos"). Then you can use the findFn() function to do your search. For example, by typing findFn("regression") into your R console, you get a web page with the names, descriptions, and links to several hundred functions that contain the word regression in the function name or Help text description.
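The ? and ?? operators are shorthand for the functions help() and help.search(), which you can also call directly, for example from a script. The sketch below assumes a standard R installation with the base Help files available; the variable names topic and hits are made up for this illustration.

```r
# ?data.frame is shorthand for help("data.frame")
topic <- help("data.frame")
class(topic)

# ??"data frame" is shorthand for help.search("data frame");
# restricting the search to the base package keeps it quick
hits <- help.search("data frame", package = "base")
class(hits)

# Printing either object at the console displays the Help page or the
# list of matching topics, just as ? and ?? do interactively
```

Calling the functions directly is handy when the search term is stored in a variable, because ? and ?? take what you type literally.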

How to Combine Vectors in R

Article / Updated 03-26-2016

To dive a bit deeper into how you can use vectors in R, let's consider this All-Star Grannies example. You have two vectors that contain the number of baskets that Granny and her friend Geraldine scored in the six games of this basketball season:

> baskets.of.Granny <- c(12, 4, 4, 6, 9, 3)
> baskets.of.Geraldine <- c(5, 3, 2, 2, 12, 9)

The c() function stands for combine. It doesn't create vectors; it just combines them.

You give six values as arguments to the c() function and get one combined vector in return. As you know, R considers each value a vector with one element. You also can use the c() function to combine vectors with more than one value, as in the following example:

> all.baskets <- c(baskets.of.Granny, baskets.of.Geraldine)
> all.baskets
 [1] 12  4  4  6  9  3  5  3  2  2 12  9

The result of this code is a vector with all 12 values.

In this code, the c() function maintains the order of the numbers. This example illustrates a second important feature of vectors: Vectors have an order. This order turns out to be very useful when you need to manipulate the individual values in the vector.
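Because c() keeps the order of its arguments, positions in the combined vector are predictable. The following sketch uses the two vectors from the example above; the variable first.games is a name introduced here for illustration.

```r
# The two example vectors
baskets.of.Granny <- c(12, 4, 4, 6, 9, 3)
baskets.of.Geraldine <- c(5, 3, 2, 2, 12, 9)

# Combining keeps the order: Granny's six games, then Geraldine's six
all.baskets <- c(baskets.of.Granny, baskets.of.Geraldine)
length(all.baskets)

# Positions 1 and 7 therefore hold each player's first game
first.games <- all.baskets[c(1, 7)]
first.games

# Swapping the arguments changes the order of the result
c(baskets.of.Geraldine, baskets.of.Granny)
```

Since the order is fixed, Geraldine's results always sit in positions 7 through 12 of all.baskets.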

How to Search Text by Pattern in R

Article / Updated 03-26-2016

Like any programming language, R makes it easy to compile lists of sorted and ordered data. To find substrings, you can use the grep() function, which takes two essential arguments:

pattern: The pattern you want to find.

x: The character vector you want to search.

Suppose you want to find all the states that contain the pattern New. Do it like this:

> grep("New", state.name)
[1] 29 30 31 32

The result of grep() is a numeric vector with the positions of each of the components that contain the matching pattern. In other words, the 29th component of state.name contains the word New.

> state.name[29]
[1] "New Hampshire"

Phew, that worked! But typing in the position of each matching text is going to be a lot of work. Fortunately, you can use the results of grep() directly to subset the original vector, or you can have grep() return the matching text itself by adding the argument value = TRUE. Try this:

> grep("New", state.name, value = TRUE)
[1] "New Hampshire" "New Jersey"
[3] "New Mexico"    "New York"

The grep() function is case sensitive: it only matches text in the same case (uppercase or lowercase) as your search pattern. If you search for the pattern "new" in lowercase, your search results are empty:

> grep("new", state.name, value = TRUE)
character(0)
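The two ways of getting the matching text, plus a case-insensitive search, can be compared in a few lines. This sketch uses the built-in state.name vector; the ignore.case argument is a standard grep() argument that the text above does not discuss, and the variable names are chosen for illustration.

```r
# Positions of the states whose name contains "New"
positions <- grep("New", state.name)

# Subsetting with those positions gives the same result as value = TRUE
by.subset <- state.name[positions]
by.value <- grep("New", state.name, value = TRUE)
identical(by.subset, by.value)

# ignore.case = TRUE makes the lowercase pattern match as well
any.case <- grep("new", state.name, value = TRUE, ignore.case = TRUE)
any.case
```

Using ignore.case = TRUE is usually simpler than converting both the pattern and the data to the same case first.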

Tips for More Date and Time Functionality in R

Article / Updated 03-26-2016

Once you have a really good grip on using date and time, you may want to explore additional functionality available in R and add-on packages by looking at the following:

chron: The add-on package chron provides the simpler chron class for datetime objects that don't have a time zone. To investigate this class, first install the package from CRAN, load it with library("chron"), and then read the Help file ?chron.

lubridate: This add-on package provides many functions to make it easier to work with dates. You can download it and find more information at CRAN.

R also has very good support for objects that represent time series data. Time series data usually refers to information that was recorded at fixed intervals, such as days, months, or years:

ts: In R, you use the ts() function to create time series objects. These are vector or matrix objects that contain information about the observations, together with information about the start, frequency, and end of each observation period. With ts class data you can use powerful R functions to do modeling and forecasting. For example, arima() fits a general class of models for time series data.

zoo and xts: The add-on package zoo extends time series objects by allowing observations that don't have such strictly fixed intervals. You can download it from CRAN. The add-on package xts provides additional extensions to time series data and builds on the functionality of ts as well as zoo objects. You can also download xts from CRAN.
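Of the options above, ts() is part of base R and can be tried without installing anything. The following is a minimal sketch; the twelve monthly values and the starting date of January 2012 are invented for this illustration.

```r
# Twelve hypothetical monthly observations, starting in January 2012
monthly.baskets <- ts(c(12, 4, 5, 6, 9, 3, 5, 4, 2, 4, 12, 9),
                      start = c(2012, 1), frequency = 12)

# A ts object carries its own start, end, and frequency
start(monthly.baskets)
end(monthly.baskets)
frequency(monthly.baskets)

# window() extracts a sub-period using the same (year, month) notation
second.half <- window(monthly.baskets, start = c(2012, 7))
second.half
```

With frequency = 12, R treats the second element of start and end as the month within the year; with frequency = 4 it would be the quarter.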
