How to Use the Formula Interface in R - dummies

By Andrie de Vries, Joris Meys

Another very important idea in R is the formula interface. The formula interface lets you specify concisely which columns to use when fitting a model, as well as how those columns enter the model (for example, whether their interaction is included).

Keep in mind that formula notation describes statistical formulae, not mathematical formulae. For example, the formula operator + means to include a column, not to add two columns together arithmetically. You need these operators as soon as you start building models.
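To see the difference, the sketch below fits two linear models to R's built-in mtcars data set (the data set and the columns wt and hp are our choice, for illustration only). In the first formula, + includes wt and hp as two separate predictors. In the second, the arithmetic is wrapped in I(), which tells R to compute wt + hp first and use that sum as a single predictor.

```r
# Illustrative data: the built-in mtcars data set (not part of the original example)
fit_separate <- lm(mpg ~ wt + hp, data = mtcars)     # '+' includes two predictors
fit_summed   <- lm(mpg ~ I(wt + hp), data = mtcars)  # I() forces arithmetic: one predictor

names(coef(fit_separate))  # "(Intercept)", "wt", "hp"
names(coef(fit_summed))    # "(Intercept)", "I(wt + hp)"
```

The first model estimates one slope per column; the second estimates a single slope for the summed column.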

Be aware that the meaning of these operators can differ depending on the modeling function you use.

Operator   Example      Meaning
~          y ~ x        Model y as a function of x
+          y ~ a + b    Include columns a as well as b
-          y ~ a - b    Include a but exclude b
:          y ~ a : b    Estimate the interaction of a and b
*          y ~ a * b    Include columns a and b as well as their interaction (that is, y ~ a + b + a:b)
|          y ~ a | b    Estimate y as a function of a conditional on b
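One way to check how R reads these operators is to ask terms() for the term labels it produces from a formula. The sketch below uses the placeholder names y, a and b from the table; they need not exist as data, because terms() only inspects the formula itself. The helper term_labels() is hypothetical, defined here for convenience. The sketch covers +, -, : and *, plus one extra case not in the table (- 1, which removes the intercept). The | operator is left out: its meaning is supplied by the individual functions that support it.

```r
# Hypothetical helper: return the model terms R derives from a formula
term_labels <- function(f) attr(terms(f), "term.labels")

term_labels(y ~ a + b)   # "a" "b"
term_labels(y ~ a - b)   # "a"          (b is excluded)
term_labels(y ~ a : b)   # "a:b"        (interaction only)
term_labels(y ~ a * b)   # "a" "b" "a:b"

# Extra case: "- 1" removes the intercept (the attribute becomes 0)
attr(terms(y ~ a - 1), "intercept")  # 0
```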

Many R functions allow you to use the formula interface, often in addition to other ways of working with that function. For example, the aggregate() function also allows you to use formulae:

> aggregate(mpg ~ gear + am, data=cars, mean)
  gear        am      mpg
1    3 Automatic 16.10667
2    4 Automatic 21.05000
3    4    Manual 26.27500
4    5    Manual 21.38000

Notice that the first argument is a formula, the second argument is the source data frame, and the third argument is the function to apply. In this case, you tell aggregate() to split mpg into groups defined by each combination of gear and am, and to calculate the mean of each group. This is the same calculation as in the earlier example, but the formula interface makes the call much easier to read.
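The call above relies on a data frame named cars with gear and am columns, which this excerpt does not define; R's built-in cars data set has only speed and dist columns, so the call fails on it. The sketch below assumes cars is a copy of the built-in mtcars data with am (0/1) recoded as a factor labeled Automatic and Manual, which reproduces the group means shown above. It then computes the same means with the data-frame method of aggregate() (arguments x, by and FUN) to show that the formula is simply a more readable way to express the grouping.

```r
# Assumption: 'cars' is derived from the built-in mtcars data, with the
# transmission column 'am' (0 = automatic, 1 = manual) recoded as a factor.
# This masks R's built-in 'cars' data set (speed, dist) in the workspace.
cars <- mtcars
cars$am <- factor(cars$am, levels = c(0, 1),
                  labels = c("Automatic", "Manual"))

# Formula method: response on the left, grouping variables on the right
by_formula <- aggregate(mpg ~ gear + am, data = cars, FUN = mean)

# Data-frame method: the same grouping spelled out with x, by and FUN
by_columns <- aggregate(x = cars["mpg"], by = cars[c("gear", "am")], FUN = mean)

print(by_formula)
all.equal(by_formula$mpg, by_columns$mpg)  # TRUE: same group means
```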

The Help file for a function usually makes it clear whether you can use a formula with that function. For example, open the Help page with ?aggregate. In the Usage section of this page, you find the following text:

## S3 method for class 'data.frame'
aggregate(x, by, FUN, ..., simplify = TRUE)
## S3 method for class 'formula'
aggregate(formula, data, FUN, ...,
     subset, na.action = na.omit)

This page lists a method for class data.frame as well as a method for class formula, which tells you that aggregate() accepts either form: a data frame with by and FUN, or a formula with data and FUN.
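You can also check for a formula method from the console instead of reading the Help page. The sketch below lists the registered methods of aggregate() and then uses getS3method() with optional = TRUE, which returns NULL instead of an error when no method exists. The helper has_formula_method() is hypothetical, defined only for this illustration.

```r
# List the S3 methods registered for the generic aggregate()
methods(aggregate)

# Hypothetical helper: TRUE if the S3 generic has a method for class "formula"
has_formula_method <- function(generic) {
  !is.null(getS3method(generic, "formula", optional = TRUE))
}

has_formula_method("aggregate")  # TRUE
has_formula_method("t.test")     # TRUE
has_formula_method("mean")       # FALSE
```

This check only works for S3 generics. A function such as lm() is not generic; it takes a formula as an ordinary argument, so its Help page remains the place to look.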

You can find more technical information about formulae on their own Help page, ?formula.