How to Evaluate Linear Data with R

Naturally, R provides a whole set of tests and measures to evaluate how well your model fits your data, as well as to check the model assumptions. Again, the overview presented here is far from complete, but it gives you an idea of what's possible and a starting point for looking deeper into the subject.
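
All the examples in this section use a simple regression of gas mileage (mpg) on car weight (wt) from the built-in mtcars data set, as the Call line in the output below shows. If you want to follow along, you can reconstruct the Model object like this:

> Model <- lm(mpg ~ wt, data = mtcars)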

How to summarize the model

The summary() function immediately gives you the F test for models constructed with aov(). For models built with lm(), the output is slightly different. Take a look:

> Model.summary <- summary(Model)
> Model.summary

Call:
lm(formula = mpg ~ wt, data = mtcars)

Residuals:
    Min      1Q  Median      3Q     Max
-4.5432 -2.3647 -0.1252  1.4096  6.8727

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
wt           -5.3445     0.5591  -9.559 1.29e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 3.046 on 30 degrees of freedom
Multiple R-squared:  0.7528,    Adjusted R-squared:  0.7446
F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

That's a whole lot of useful information. Here you see the following (you can also extract each of these pieces from the summary object directly, as sketched after this list):

  • The distribution of the residuals, which gives you a first idea about how well the assumptions of a linear model hold

  • The coefficients, each accompanied by a t-test that tells you whether that coefficient differs significantly from zero

  • The goodness-of-fit measures R² and the adjusted R²

  • The F-test that gives you an idea about whether your model explains a significant portion of the variance in your data
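
You don't have to read these numbers off the printed output. The summary object is a list, so you can pull out each of these pieces by name. Here's a minimal sketch; the values in the comments come from the printed summary above:

> Model.summary$r.squared      # multiple R-squared (0.7528)
> Model.summary$adj.r.squared  # adjusted R-squared (0.7446)
> Model.summary$sigma          # residual standard error (3.046)
> Model.summary$fstatistic     # F statistic with its degrees of freedom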

You can use the coef() function to extract a matrix with the estimates, standard errors, t-values, and p-values for the coefficients from the summary object, like this:

> coef(Model.summary)
             Estimate Std. Error   t value     Pr(>|t|)
(Intercept) 37.285126   1.877627 19.857575 8.241799e-19
wt          -5.344472   0.559101 -9.559044 1.293959e-10
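
Because the result is an ordinary matrix, you can also index it by row and column name to grab a single number. For example, to get the p-value for wt (a small sketch using the same matrix):

> coef(Model.summary)['wt', 'Pr(>|t|)']
[1] 1.293959e-10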

If these terms don't mean anything to you, look them up in a good source about modeling. For an extensive introduction to applying and interpreting linear models correctly, check out Applied Linear Statistical Models, 5th Edition, by Michael Kutner et al. (McGraw-Hill/Irwin).

How to test the impact of model terms

To get an analysis of variance table, like the one the summary() function produces for an ANOVA model, you simply use the anova() function and pass it the lm() model object as an argument, like this:

> Model.anova <- anova(Model)
> Model.anova
Analysis of Variance Table

Response: mpg
          Df Sum Sq Mean Sq F value    Pr(>F)
wt         1 847.73  847.73  91.375 1.294e-10 ***
Residuals 30 278.32    9.28
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The resulting object is a data frame, so you can extract any value from that table using the standard subsetting and indexing tools. For example, to get the p-value for wt, you can do the following:

> Model.anova['wt','Pr(>F)']
[1] 1.293959e-10

You can interpret this value as the probability that adding the variable wt to the model doesn’t make a difference. The low p-value here indicates that the weight of a car (wt) explains a significant portion of the difference in mileage (mpg) between cars. This shouldn’t come as a surprise; a heavier car does, indeed, need more power to drag its own weight around.

You can use the anova() function to compare different models as well, and many modeling packages provide that functionality. You can find examples of this on most of the related Help pages, such as ?anova.lm and ?anova.glm.
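
As a sketch of how that comparison works, you could fit a second model that also includes horsepower (hp, another column of mtcars that isn't part of the original example) and pass both models to anova(); the resulting table tests whether the extra term improves the fit significantly:

> Model2 <- lm(mpg ~ wt + hp, data = mtcars)
> anova(Model, Model2)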
