How to Analyze Data Variances in Models with R

By Andrie de Vries, Joris Meys

An analysis of variance (ANOVA) is a very common technique used with R to compare the means of different groups of data. To illustrate this, take a look at the dataset InsectSprays:

> str(InsectSprays)
'data.frame': 72 obs. of 2 variables:
 $ count: num 10 7 20 14 14 12 10 23 17 20 ...
 $ spray: Factor w/ 6 levels "A","B","C","D",..: 1 1 1 1 1 1 1 1 1 1 ...

This dataset contains the results of an agricultural experiment. Six insecticides were tested on 12 fields each, and the researchers counted the number of pesky bugs that remained on each field. Now the farmers need to know whether the insecticides make any difference and, if so, which one is best to use. You answer this question by using the aov() function to perform an ANOVA.
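Before fitting anything, it can help to look at the group means that the ANOVA will compare. The following is a minimal sketch using base R only; the variable name group_means is just an illustrative choice.

```r
# InsectSprays ships with base R (package datasets), so no import is needed.

# Mean bug count per spray: these are the quantities the ANOVA compares
group_means <- tapply(InsectSprays$count, InsectSprays$spray, mean)

# Round for readability and sort from fewest to most remaining bugs
print(sort(round(group_means, 2)))
```

Large gaps between these means hint at a spray effect, but only the ANOVA tells you whether the gaps are bigger than you would expect from field-to-field variation alone.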

How to build the model

For this simple example, building the model is a piece of cake. You essentially want to model the means for the variable count as a function of the variable spray. You translate that to R like this:

> AOVModel <- aov(count ~ spray, data=InsectSprays)

You pass two arguments to the aov() function in this line of code:

  • The formula count ~ spray, which reads as “count as a function of spray”

  • The argument data, where you specify the data frame in which the variables in the formula can be found

Every modeling function returns a model object with a lot of information about the fitted model. Always put this model object in a variable. This way you don’t have to refit the model when you need to perform extra calculations.
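As a small sketch of why storing the model pays off, the code below fits the model once and then reuses the stored object for several follow-up calculations. The names anova_table and spray_coefs are illustrative, not part of the original example.

```r
# Fit the model once and keep the result
AOVModel <- aov(count ~ spray, data = InsectSprays)

# Each of these calls reuses the stored fit; nothing is refitted
anova_table <- summary(AOVModel)   # ANOVA table with the F test
spray_coefs <- coef(AOVModel)      # estimated coefficients
fit_values  <- fitted(AOVModel)    # fitted value for every observation

print(anova_table)
print(spray_coefs)
```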

How to look at the model object

As with every object, you can look at a model object just by typing its name in the console. If you do that for the object AOVModel that you created, you see the following output:

> AOVModel
Call:
   aov(formula = count ~ spray, data = InsectSprays)

Terms:
                   spray Residuals
Sum of Squares  2668.833  1015.167
Deg. of Freedom        5        66

Residual standard error: 3.921902
Estimated effects may be unbalanced

This doesn’t tell you that much, apart from the command (or the call) you used to build the model and some basic information on the fitting result.
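The printed summary is only a small part of what the object holds. As a rough illustration, you can pull individual pieces out of the model object with standard extractor functions; the variable names used here are hypothetical.

```r
AOVModel <- aov(count ~ spray, data = InsectSprays)

# The call that built the model, as shown at the top of the printed output
model_call <- AOVModel$call
print(model_call)

# The model formula and the residual degrees of freedom
model_formula <- formula(AOVModel)
resid_df <- df.residual(AOVModel)
print(model_formula)
print(resid_df)

# Names of all components stored in the model object
print(names(AOVModel))
```

An aov object also inherits from class "lm", so many functions written for linear models work on it as well.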

In the output, you also read that the estimated effects may be unbalanced. This isn’t a warning — it’s a message that’s built in by the author of the aov() function. This one can pop up in two situations:

  • You don’t have the same number of cases in every group.

  • You didn’t set orthogonal contrasts.

In this case, it’s the second reason: each spray was tested on 12 fields, so every group has the same number of cases.
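You can check both situations yourself. The sketch below counts the cases per group and then compares R's default treatment contrasts with Helmert contrasts. It assumes that the property of interest is that contrast columns sum to zero and are mutually orthogonal, and it uses contr.helmert() purely as one example of an orthogonal coding.

```r
# Situation 1: is the number of cases the same in every group?
group_sizes <- table(InsectSprays$spray)
print(group_sizes)

# Situation 2: which contrasts are in use?
# In a fresh R session the default for unordered factors is contr.treatment
print(getOption("contrasts"))

# Treatment contrasts: the columns do not sum to zero
treatment_sums <- colSums(contr.treatment(6))
print(treatment_sums)

# Helmert contrasts: the columns sum to zero and are mutually orthogonal,
# so the cross-product matrix has zeros everywhere off the diagonal
helmert <- contr.helmert(6)
helmert_sums <- colSums(helmert)
helmert_cross <- crossprod(helmert)
print(helmert_sums)
print(helmert_cross)
```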