How to Model Linear Data Relations with R

By Andrie de Vries, Joris Meys

You can also write an analysis of variance for your data as a linear model in R: you use a factor as the predictor variable to model a response variable.
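As a minimal sketch of that equivalence, the code below fits the same one-way model with aov() and with lm() and compares the coefficients. The choice of cyl as the grouping factor and the names mt, fit_aov and fit_lm are assumptions made for illustration only.

```r
# cyl is stored as a number in mtcars, so convert it to a factor first
# (mt is a hypothetical name for the modified copy)
mt <- transform(mtcars, cyl = factor(cyl))

# The same one-way model, fitted two ways
fit_aov <- aov(mpg ~ cyl, data = mt)
fit_lm  <- lm(mpg ~ cyl, data = mt)

# Both fits estimate the same coefficients
all.equal(coef(fit_aov), coef(fit_lm))

# anova() on the lm fit prints the classic ANOVA table
anova(fit_lm)
```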

Of course, predictor variables can also be continuous. For example, the weight of a car has an influence on its mileage, but it would be nice to know the magnitude of that influence. Essentially, you want to find the equation that represents the trend line. The data you need for checking this is in the built-in dataset mtcars.


How to build a linear model

The lm() function allows you to specify anything from the simplest linear model to complex interaction models.

To model the mileage as a function of the weight of a car, you use the lm() function, like this:

> Model <- lm(mpg ~ wt, data=mtcars)

You supply two arguments:

  • A formula that describes the model: Here, you model the variable mpg as a function of the variable wt.

  • A data frame that contains the variables in the formula: Here, you use the data frame mtcars.

Once you know your way around the formula interface, you can use it to specify much more complex models.
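To give an idea of what such formulas can look like, here is a short sketch with a few common patterns. The predictors hp and cyl and the model names m_add, m_int and m_fac are chosen only for illustration; they are not part of the example above.

```r
# Two continuous predictors with additive effects
m_add <- lm(mpg ~ wt + hp, data = mtcars)

# Main effects plus their interaction: wt * hp expands to wt + hp + wt:hp
m_int <- lm(mpg ~ wt * hp, data = mtcars)

# A factor predictor next to a continuous one
m_fac <- lm(mpg ~ wt + factor(cyl), data = mtcars)

# The coefficient names show which terms the formula generated
names(coef(m_int))
```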

The resulting object is a list with a very complex structure, but in most cases you don’t need to worry about that. The model object contains a lot of information that’s needed for the calculations of diagnostics and new predictions.
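If you are curious about that structure anyway, a quick look could go like this. The sketch refits the model so that it runs on its own.

```r
Model <- lm(mpg ~ wt, data = mtcars)

class(Model)     # the object has class "lm"
is.list(Model)   # underneath, it is an ordinary list

# The names of the list elements, such as coefficients and residuals
names(Model)

# A compact overview of the top level of the list
str(Model, max.level = 1)
```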

How to extract information from the model

Instead of diving into the model object itself and finding the information somewhere in the list object, you can use some functions that help you to get the necessary information from the model. For example, you can extract a named vector with the coefficients from the model using the coef() function, like this:

> coef.Model <- coef(Model)
> coef.Model
(Intercept)          wt
  37.285126   -5.344472

These coefficients represent the intercept and the slope of the trend line. You can use them to plot the trend line on a scatterplot of the data. You do this in two steps:

  1. You plot the scatterplot with the data.

    You use the plot() function for that.

  2. You use the abline() function to draw the trend line based on the coefficients.

The following code gives you the plot:

> plot(mpg ~ wt, data = mtcars)
> abline(a=coef.Model[1], b=coef.Model[2])

The abline() argument a represents the intercept, and b represents the slope of the trend line you want to plot. To plot a vertical line instead, you set the argument v to the intercept with the x-axis. To plot a horizontal line, you set the argument h to the intercept with the y-axis.
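The following sketch puts these arguments together. It computes a prediction by hand from the intercept and slope, then marks that point with a vertical and a horizontal line. The weight of 3 (mtcars records wt in units of 1,000 lbs) and the names b, wt_new and mpg_hat are assumptions made for this example.

```r
Model <- lm(mpg ~ wt, data = mtcars)
b <- coef(Model)

# Prediction by hand from the trend line: intercept + slope * weight
wt_new  <- 3   # hypothetical weight, in units of 1,000 lbs
mpg_hat <- b[["(Intercept)"]] + b[["wt"]] * wt_new

plot(mpg ~ wt, data = mtcars)
abline(Model)                  # abline() also accepts the fitted model directly
abline(v = wt_new, lty = 2)    # vertical line at the chosen weight
abline(h = mpg_hat, lty = 2)   # horizontal line at the predicted mileage
```

The two dashed lines should cross on the trend line, because mpg_hat is the value of the fitted equation at wt_new.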

Below is an overview of functions to extract information from the model object itself. These functions work with different model objects, including those built by aov() and lm().

Many package authors also provide the same functions for the models built by the functions in their package. So, you can always try to use these extraction functions in combination with other model functions as well.

Function      What It Does
coef()        Returns a vector with the coefficients from the model
confint()     Returns a matrix with the upper and lower limits of the confidence interval for each coefficient of the model
fitted()      Returns a vector with the fitted values for every observation
residuals()   Returns a vector with the residuals for every observation
vcov()        Returns the variance-covariance matrix for the coefficients
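
As a brief sketch of how these functions can be used, the code below applies them to the model of mpg against wt, and then tries two of them on a model built by glm(). The logistic model of am on wt and the name LogitModel are assumptions chosen only to illustrate that other model functions can work with the same extractors.

```r
Model <- lm(mpg ~ wt, data = mtcars)

confint(Model)                 # 95% limits by default, one row per coefficient
confint(Model, level = 0.90)   # a different confidence level

head(fitted(Model))            # fitted values for the first observations
head(residuals(Model))         # the matching residuals

# Fitted values plus residuals reproduce the observed response
all.equal(unname(fitted(Model) + residuals(Model)), mtcars$mpg)

# Standard errors are the square roots of the diagonal of vcov()
sqrt(diag(vcov(Model)))

# The same extractors applied to a model from another function
# (LogitModel is a hypothetical example, not part of the text above)
LogitModel <- glm(am ~ wt, data = mtcars, family = binomial)
coef(LogitModel)
vcov(LogitModel)
```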