R Projects For Dummies
R formulas are useful for multiple reasons. Suppose you’re interested in how the temperature varies with the month. Having lived through many Mays through Septembers in one place, you might guess that the temperature generally increases from month to month in the airquality data frame. Is that the case?

This gets into the area of statistical analysis, and at a fairly esoteric level. So let’s take a look at an R capability — the formula.

In this example, let’s say that Temperature depends on Month. Another way to say this is that Temperature is the dependent variable and Month is the independent variable.

An R formula incorporates these concepts and serves as the basis for many of R’s statistical functions and graphing functions. This is the basic structure of an R formula:

function(dependent_var ~ independent_var, data = data.frame)

Read the tilde operator (~) as “depends on.”
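As a minimal sketch of that structure, the snippet below stores a formula in a variable and hands it to aggregate(), one of the base R functions that accepts a formula. The variable names f and monthly_means are made up for this illustration, and the example assumes the built-in airquality data frame.

```r
# A formula is an ordinary R object that can be stored and inspected.
f <- Temp ~ Month
class(f)      # "formula"
all.vars(f)   # "Temp" "Month"

# Pass the formula to a function: mean Temp for each value of Month.
monthly_means <- aggregate(f, data = airquality, FUN = mean)
monthly_means
```

Printing monthly_means gives one row per month, which is a quick way to check the month-to-month guess before fitting any model.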

Here’s how you can address the relationship between Temp and Month:

> analysis <- lm(Temp ~ Month, data=airquality)

The name of the function lm() is an abbreviation for linear model. This means that you expect the temperature to increase linearly (at a constant rate) from month to month. To see the results of the analysis, you can use summary():

> summary(analysis)

Call:
lm(formula = Temp ~ Month, data = airquality)

Residuals:
     Min       1Q   Median       3Q      Max
-20.5263  -6.2752   0.9121   6.2865  17.9121

Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)  58.2112     3.5191  16.541  < 2e-16 ***
Month         2.8128     0.4933   5.703 6.03e-08 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.614 on 151 degrees of freedom
Multiple R-squared:  0.1772,  Adjusted R-squared:  0.1717
F-statistic: 32.52 on 1 and 151 DF,  p-value: 6.026e-08

The Estimate for Month indicates that, on average, temperature increases at a rate of 2.8128 degrees Fahrenheit per month between May and September. Along with the Estimate for (Intercept), you can summarize the relationship between Temp and Month as

Temp=58.2112+(2.8128×Month)

where Month is a number from 5 to 9.
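To see the equation at work, here is a small sketch that plugs each month into it by hand and compares the result with R's predict() function. The names intercept, slope, months, by_hand, and by_predict are illustrative only.

```r
analysis <- lm(Temp ~ Month, data = airquality)

# coef() returns the fitted coefficients as a named vector.
b <- coef(analysis)
intercept <- b[["(Intercept)"]]
slope <- b[["Month"]]

# Apply Temp = intercept + slope * Month for May (5) through September (9).
months <- 5:9
by_hand <- intercept + slope * months

# predict() evaluates the same fitted line on new data.
by_predict <- predict(analysis, newdata = data.frame(Month = months))

by_hand
by_predict
```

The two results should agree. For May, the equation gives roughly 58.2112 + (2.8128 × 5), or about 72.3 degrees.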

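You might remember from algebra class that when you graph this kind of equation, you get a straight line, hence the term linear model. Is the linear model a good way to summarize these data? The F-statistic and p-value in the bottom line of the output say that the relationship between Temp and Month is statistically significant, but I won’t go into the details. Note, though, that the Multiple R-squared of 0.1772 means that Month accounts for only about 18 percent of the variation in Temp.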
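One way to look at that straight line is to draw it over the data. The sketch below uses the base R graphics functions plot(), which accepts the same formula, and abline(), which adds the fitted line from an lm() result. It is just one possible way to draw the picture.

```r
analysis <- lm(Temp ~ Month, data = airquality)

# Scatterplot of Temp against Month, specified with the same formula.
plot(Temp ~ Month, data = airquality)

# Overlay the straight line described by the fitted model.
abline(analysis)
```

How widely the points scatter around the line gives a visual sense of how well, or how loosely, a straight line summarizes the data.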

The output of summary() (and other statistical functions in R) is a list. The Estimate for Month sits in row 2, column 1 of that list’s coefficients matrix, so if you want to refer to it, that’s

> s <- summary(analysis)

> s$coefficients[2,1]

[1] 2.812789
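Because the coefficients matrix has row and column names, you can also pull values out by name instead of by position. This is a short sketch of that alternative; the variable names by_position and by_name are made up for illustration.

```r
analysis <- lm(Temp ~ Month, data = airquality)
s <- summary(analysis)

# List the components stored in the summary object.
names(s)

# Same cell, addressed by position and then by row and column name.
by_position <- s$coefficients[2, 1]
by_name <- s$coefficients["Month", "Estimate"]

# Other pieces of the printed summary are available the same way.
s$coefficients["Month", "Pr(>|t|)"]   # p-value for Month
s$r.squared                           # Multiple R-squared
```

Indexing by name is a little more typing, but it keeps working if the order of terms in the model changes.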

About This Article

About the book author:

Joseph Schmuller, PhD, is a veteran of more than 25 years in Information Technology. He is the author of several books, including Statistical Analysis with R For Dummies and four editions of Statistical Analysis with Excel For Dummies. In addition, he has written numerous articles and created online coursework for Lynda.com.