How to Create a Lattice Plot in R
To explore lattice graphics in R, first take a look at the built-in dataset mtcars. This dataset contains 32 observations of motor cars, with information about each car such as number of cylinders, automatic versus manual gearbox, and engine power.
All the built-in datasets of R also have good help information that you can access through the Help mechanism — for example, by typing ?mtcars into the R console.
> str(mtcars)
'data.frame': 32 obs. of 11 variables:
 $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
 $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
 $ disp: num 160 160 108 258 360 ...
 $ hp  : num 110 110 93 110 175 105 245 62 95 123 ...
 $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
 $ wt  : num 2.62 2.88 2.32 3.21 3.44 ...
 $ qsec: num 16.5 17 18.6 19.4 17 ...
 $ vs  : num 0 0 1 1 0 1 0 1 1 1 ...
 $ am  : num 1 1 1 0 0 0 0 0 0 0 ...
 $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
 $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
Say you want to explore the relationship between fuel economy and engine power. The mtcars dataset has two columns with this information:
mpg: Fuel economy measured in miles per gallon (mpg)
hp: Engine power measured in horsepower (hp)
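Before plotting, a quick numeric look at these two columns can be useful. The following is a minimal sketch using only base R; the variable name r is just an illustration.

```r
# Summarise the two columns of interest.
print(summary(mtcars[, c("mpg", "hp")]))

# Pearson correlation between fuel economy and engine power.
# A negative value means mpg tends to fall as hp rises.
r <- cor(mtcars$mpg, mtcars$hp)
print(r)
```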
Although the lattice package forms part of the R distribution, you have to tell R that you plan to use the code in this package. You do this with the library() function, by running library("lattice"). Remember that you need to do this at the start of each clean R session in which you want to use lattice.
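As a minimal sketch, loading the package and confirming that it is attached might look like this. The check with search() is an optional extra, not something the package requires.

```r
# Attach lattice for this session (it ships with R, so no install step is assumed).
library("lattice")

# Optional check: the package should now appear on the search path.
attached <- "package:lattice" %in% search()
print(attached)
```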
Make a lattice scatterplot
The lattice package has a number of different functions to create different types of plot. For example, to create a scatterplot, use the xyplot() function. Notice that this is different from base graphics, where the plot() function creates a variety of different plot types (because of the method dispatch mechanism).
To make a lattice plot, you need to specify at least two arguments:
formula: This is a formula typically of the form y ~ x | z. It means to create a plot of y against x, conditional on z. In other words, create a plot for every unique value of z. Each of the variables in the formula has to be a column in the data frame that you specify in the data argument.
data: A data frame that contains all the columns that you specify in the formula argument.
This example should make it clear:
> xyplot(mpg ~ hp | factor(cyl), data=mtcars)
You can see that:
The variables mpg, hp, and cyl are columns in the data frame mtcars.
Although cyl is a numeric vector, the number of cylinders in a car can take only a few whole-number values (it is a discrete variable, in statistical jargon). By using factor(cyl) in your code, you tell R that cyl is, in fact, a discrete variable. If you forget to do this, R will still create a graphic, but the labels of the strips at the top of each panel will be displayed differently.
Because each of the cars in the data frame has four, six, or eight cylinders, the chart has three panels. You can see that the cars with more cylinders tend to have more power (higher hp) and poorer fuel economy (lower mpg).
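To see what factor() changes, the sketch below builds the plot both ways and lists the cylinder values that define the panels. The object names p_factor, p_numeric, and cyl_levels are illustrative only.

```r
library("lattice")

# Conditioning on a factor: one panel per level of cyl.
p_factor <- xyplot(mpg ~ hp | factor(cyl), data = mtcars)

# Conditioning on the raw numeric column: lattice still builds the plot,
# but treats cyl as a numeric quantity, so the strips look different.
p_numeric <- xyplot(mpg ~ hp | cyl, data = mtcars)

# The distinct cylinder counts that define the panels.
cyl_levels <- levels(factor(mtcars$cyl))
print(cyl_levels)

# Lattice functions return an object; printing it draws the plot.
print(p_factor)
```

Printing p_numeric in the same way lets you compare the two strip styles side by side.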
Add trend lines
When you tell lattice to calculate a line of best fit, it does so for each panel in the plot. This is straightforward using xyplot(), because it’s as simple as adding a type argument. In particular, you want to specify that the type is both points (type="p") and regression (type="r"). You can combine different types with the c() function, like this:
> xyplot(mpg ~ hp | factor(cyl), data=mtcars,
+        type=c("p", "r"))
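To check what "a line for each panel" means numerically, you can fit the same kind of model yourself within each cylinder group. This is a sketch that assumes the regression line in each panel is an ordinary least-squares fit of mpg on hp; the names by_cyl and fits are illustrative.

```r
# Split the data by cylinder count, mirroring the three panels.
by_cyl <- split(mtcars, mtcars$cyl)

# Fit mpg ~ hp separately within each group and keep the coefficients
# (intercept and slope) of each line.
fits <- lapply(by_cyl, function(d) coef(lm(mpg ~ hp, data = d)))
print(fits)
```

Each element of fits holds the intercept and slope for one group, which you can compare against the line drawn in the corresponding panel.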