How to Evaluate Linear Data with R
Naturally, R provides a whole set of tests and measures to evaluate how well your model fits your data and to check the model assumptions. Again, the overview presented here is far from complete, but it gives you an idea of what’s possible and a starting point to look deeper into the issue.
How to summarize the model
For models constructed with aov(), the summary() function returns the F test right away. For lm() models, the output is somewhat different. Take a look at the summary of Model, the regression of mpg on wt in the mtcars data:
> Model.summary <- summary(Model)
> Model.summary

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10
That’s a whole lot of useful information. Here you see the following:
The distribution of the residuals, which gives you a first idea about how well the assumptions of a linear model hold
The coefficients accompanied by a t-test, telling you whether each coefficient differs significantly from zero
The goodness-of-fit measures R² and the adjusted R²
The F-test that gives you an idea about whether your model explains a significant portion of the variance in your data
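Each of these pieces can also be pulled out of the summary object by name. The following is a minimal sketch, assuming Model was fit as the Call line in the output indicates; the variable names r2, adj.r2, sigma.hat, and f are only illustrative.

```r
# Assumption: Model is the fit shown in the Call line of the summary output
Model <- lm(mpg ~ wt, data = mtcars)
Model.summary <- summary(Model)

# Goodness-of-fit measures
r2     <- Model.summary$r.squared
adj.r2 <- Model.summary$adj.r.squared

# Residual standard error
sigma.hat <- Model.summary$sigma

# F statistic: a named vector with the elements value, numdf, and dendf
f <- Model.summary$fstatistic

# Five-number view of the residuals, as in the "Residuals" block
res.quantiles <- quantile(residuals(Model))
```

Calling names(Model.summary) lists everything else the summary object stores.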
You can use the coef() function to extract a matrix with the estimates, standard errors, and t-value and p-value for the coefficients from the summary object like this:
> coef(Model.summary)
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 37.285126   1.877627 19.857575 8.241799e-19
wt          -5.344472   0.559101 -9.559044 1.293959e-10
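Because the result is an ordinary matrix with row and column names, single values can be picked out by name. A short sketch, again assuming Model is the mpg ~ wt fit on mtcars; coefs, wt.estimate, and wt.pvalue are illustrative names:

```r
# Assumption: same model as in the output above
Model <- lm(mpg ~ wt, data = mtcars)
coefs <- coef(summary(Model))

# Index by row name (the term) and column name (the statistic)
wt.estimate <- coefs["wt", "Estimate"]
wt.pvalue   <- coefs["wt", "Pr(>|t|)"]

# A whole column comes back as a named vector
std.errors <- coefs[, "Std. Error"]
```

Indexing by name rather than by position keeps the code readable and still works if terms are added to the model later.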
If these terms are unfamiliar, look them up in a good source about modeling. For an extensive introduction to applying and interpreting linear models correctly, check out Applied Linear Statistical Models, 5th Edition, by Michael Kutner et al. (McGraw-Hill/Irwin).
How to test the impact of model terms
To get an analysis of variance table like the one the summary() function produces for an aov() model, you simply pass the lm() model object to the anova() function, like this:
> Model.anova <- anova(Model)
> Model.anova
Analysis of Variance Table

Response: mpg
          Df Sum Sq Mean Sq F value    Pr(>F)
wt         1 847.73  847.73  91.375 1.294e-10 ***
Residuals 30 278.32    9.28
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here, the resulting object is a data frame that allows you to extract any value from that table using the subsetting and indexing tools. For example, to get the p-value, you can do the following:
> Model.anova['wt','Pr(>F)']
[1] 1.293959e-10
You can interpret this value as the probability of seeing an F value at least this large if adding the variable wt to the model made no difference. The low p-value here indicates that the weight of a car (wt) explains a significant portion of the difference in mileage (mpg) between cars. This shouldn’t come as a surprise; a heavier car does, indeed, need more power to drag its own weight around.
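To make the meaning of the p-value concrete, it can be recomputed by hand as the upper tail of the F distribution. This is only a sketch, assuming Model is the mpg ~ wt fit on mtcars; f.value, df.wt, df.res, and p.manual are illustrative names.

```r
# Assumption: same model as in the ANOVA table above
Model <- lm(mpg ~ wt, data = mtcars)
Model.anova <- anova(Model)

# Pull the F value and both degrees of freedom from the table
f.value <- Model.anova["wt", "F value"]
df.wt   <- Model.anova["wt", "Df"]
df.res  <- Model.anova["Residuals", "Df"]

# Probability of an F value at least this large when wt has no effect
p.manual <- pf(f.value, df.wt, df.res, lower.tail = FALSE)

# Compare with the value stored in the table
all.equal(p.manual, Model.anova["wt", "Pr(>F)"])
```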
You can use the anova() function to compare different models as well, and many modeling packages provide that functionality. You can find examples of this on most of the related Help pages, like ?anova.lm and ?anova.glm.
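As a rough illustration of such a comparison, the sketch below tests whether a second predictor improves on the weight-only model. Model2 and its extra term hp (horsepower, another column in mtcars) are not part of the example above; they are introduced here purely for illustration.

```r
# Assumption: Model is the mpg ~ wt fit used throughout this section
Model <- lm(mpg ~ wt, data = mtcars)

# Hypothetical larger model for illustration: adds horsepower (hp)
Model2 <- lm(mpg ~ wt + hp, data = mtcars)

# The models must be nested and fit to the same data;
# each row after the first is tested against the row before it
comparison <- anova(Model, Model2)
comparison

# p-value for the improvement gained by adding hp
p.hp <- comparison[2, "Pr(>F)"]
```

A small p-value in the second row suggests that the extra term explains variance that the smaller model leaves unexplained.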