By Andrie de Vries, Joris Meys

R is more than just a domain-specific programming language aimed at data analysis. It has some unique features that make it very powerful, the most important one arguably being the notion of vectors. These vectors allow you to perform sometimes complex operations on a set of values in a single command.

Performing multiple calculations with vectors

R is a vector-based language. You can think of a vector as a row or column of numbers or text. The list of numbers {1,2,3,4,5}, for example, could be a vector. Unlike most other programming languages, R allows you to apply functions to the whole vector in a single operation without the need for an explicit loop.

It is time to illustrate vectors with some real R code. First, assign the values 1:5 to a vector called x:

> x <- 1:5
> x
[1] 1 2 3 4 5

Next, add the value 2 to each element in the vector x:

> x + 2
[1] 3 4 5 6 7

You can also add one vector to another. To add the values 6:10 element-wise to x, you do the following:

> x + 6:10
[1]  7  9 11 13 15

To do this in most other programming languages would require an explicit loop to run through each value of x. However, R is designed to perform many operations in a single step. This functionality is one of the features that make R so useful, and so powerful, for data analysis.
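For comparison, here is a minimal sketch of what the explicit loop would look like if you wrote it out in R yourself. The names y and result are illustrative and do not appear in the text above.

```r
x <- 1:5
y <- 6:10

# Explicit loop: visit each position and add the matching elements.
result <- integer(length(x))
for (i in seq_along(x)) {
  result[i] <- x[i] + y[i]
}

# Vectorized form: a single expression that produces the same values.
identical(result, x + y)   # TRUE
```

Both approaches give the same answer; the vectorized form simply leaves the looping to R.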

Processing more than just statistics

R was developed by statisticians to make statistical data analysis easier. This heritage continues, making R a very powerful tool for performing virtually any statistical computation.

As R has expanded beyond its origins in statistics, many people who would describe themselves as programmers rather than statisticians have become involved with it. The result is that R is now eminently suitable for a wide variety of nonstatistical tasks, including data processing, graphical visualization, and analysis of all sorts. R is being used in the fields of finance, natural language processing, genetics, biology, and market research, to name just a few.

R is Turing complete, which means that, in principle, you can use R alone to program any computation you want. (Not every task is easy to program in R, though.)

Running code without a compiler

R is an interpreted language, which means that, unlike with compiled languages like C and Java, you don’t need a compiler to first create a program from your code before you can use it. R interprets the code you provide directly and converts it into lower-level calls to pre-compiled functions.

In practice, this means that you simply write your code and send it to R, and the code runs, which makes the development cycle easy. This ease of development comes at the cost of execution speed, however: interpreted code usually runs slower than the equivalent compiled code.
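The hand-off from interpreted code to pre-compiled functions can be observed from within R. The sketch below uses the base function is.primitive(), which reports whether a function is implemented internally in compiled code; add_two is a hypothetical function made up for this illustration.

```r
# sum() is a primitive: it is implemented in compiled code inside R itself.
is.primitive(sum)        # TRUE

# A function written in R is not primitive; R interprets its body.
add_two <- function(v) v + 2
is.primitive(add_two)    # FALSE

# The `+` operator that add_two relies on is again a primitive.
is.primitive(`+`)        # TRUE
```

So even a function you write yourself ends up calling compiled building blocks for the actual arithmetic.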

If you have experience in other languages, be aware that R is not C or Java. Although you can use R as a procedural language, like C, or as an object-oriented language, like Java, R is mostly based on the functional programming paradigm. This characteristic requires a bit of a different mindset. Forget what you know about other languages, and prepare for something completely different.
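As a small taste of that functional style, the sketch below treats functions as ordinary values that can be stored in variables and passed to other functions. It uses the base functions sapply() and Filter(); the names square, squares, and evens are illustrative only.

```r
# A function is an ordinary value: it can be stored in a variable ...
square <- function(v) v^2

# ... and passed to another function. sapply() calls square once per element.
x <- 1:5
squares <- sapply(x, square)

# Functions can also be created on the fly, without a name.
# Filter() keeps the elements for which the function returns TRUE.
evens <- Filter(function(v) v %% 2 == 0, x)
```

Here squares holds the values 1, 4, 9, 16, and 25, and evens holds 2 and 4.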