How to Use Data in Tall Format in Lattice Plots in R

When your data is in tall format in R, you can easily use lattice graphics to visualize subgroups in that data. This is useful, for instance, when you want to analyze more than one variable simultaneously.

Consider the built-in dataset longley, containing data about employment, unemployment, and other population indicators:

> str(longley)
'data.frame': 16 obs. of 7 variables:
 $ GNP.deflator: num 83 88.5 88.2 89.5 96.2 ...
 $ GNP     : num 234 259 258 285 329 ...
 $ Unemployed : num 236 232 368 335 210 ...
 $ Armed.Forces: num 159 146 162 165 310 ...
 $ Population : num 108 109 110 111 112 ...
 $ Year    : int 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 ...
 $ Employed  : num 60.3 61.1 60.2 61.2 63.2 ...

One way to easily analyze the different variables of a data frame is to first reshape the data frame from wide format to tall format.

A wide data frame contains a column for each variable. A tall data frame contains all the same information, but the data is organized in such a way that one column is reserved for identifying the name of the variable and a second column contains the actual data.
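To make the difference concrete, here is a minimal sketch using a made-up data frame. The names wide, tall, Height, and Weight are hypothetical and are not part of the longley data. It builds the tall version by hand with base R so that you can see where each value ends up.

```r
# Hypothetical toy data (not from longley): two measurements per year.
wide <- data.frame(
  Year   = c(2001, 2002),
  Height = c(10, 12),
  Weight = c(50, 55)
)

# The same four measurements in tall format, built by hand:
# one column names the variable, another holds its value.
tall <- data.frame(
  Year     = rep(wide$Year, times = 2),
  variable = rep(c("Height", "Weight"), each = nrow(wide)),
  value    = c(wide$Height, wide$Weight)
)

print(wide)
print(tall)
```

The wide version has one row per year and one column per measurement. The tall version has one row per measurement, with Year repeated for each variable.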

An easy way to reshape a data frame from wide format to tall format is to use the melt() function in the reshape2 package. Remember: reshape2 is not part of base R. It's an add-on package that is available on CRAN, and you can install it with install.packages("reshape2").

> library("reshape2")
> mlongley <- melt(longley, id.vars="Year")
> str(mlongley)
'data.frame': 96 obs. of 3 variables:
 $ Year  : int 1947 1948 1949 1950 1951 1952 1953 1954 1955 1956 ...
 $ variable: Factor w/ 6 levels "GNP.deflator",..: 1 1 1 1 1 1 1 1 1 1 ...
 $ value  : num 83 88.5 88.2 89.5 96.2 ...
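As a quick sanity check, the following sketch (assuming reshape2 is installed) confirms that the tall data frame holds the same information as the original. The helper names n_measured and gnp are introduced here only for illustration.

```r
library("reshape2")

mlongley <- melt(longley, id.vars = "Year")

# Every non-id column contributes one row per original observation.
n_measured <- ncol(longley) - 1   # all columns except Year
print(nrow(mlongley) == nrow(longley) * n_measured)

# The former column names become the levels of the 'variable' factor.
print(levels(mlongley$variable))

# Rows for a single variable can be pulled back out by subsetting,
# and they match the original wide column.
gnp <- mlongley[mlongley$variable == "GNP", ]
print(all.equal(gnp$value, longley$GNP))
```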

Now you can plot the tall data frame mlongley and use the new columns value and variable in the formula value ~ Year | variable. This formula plots value against Year and draws a separate panel for each level of variable.

> library("lattice")
> xyplot(value ~ Year | variable, data=mlongley,
+   layout=c(6, 1),
+   par.strip.text=list(cex=0.7),
+   scales=list(cex=0.7)
+ )

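The layout argument arranges the six panels in six columns and one row. The additional arguments par.strip.text and scales control the text size (character expansion factor, cex) of the strip labels at the top of each panel and of the axis tick labels, respectively.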

When you create plots with multiple groups, make sure that the resulting plot is meaningful. For example, the unit of GNP (short for Gross National Product) is probably billions of dollars. In contrast, the unit of population is probably millions of people. (The documentation of the longley dataset is not clear on this topic.)
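Because the variables are measured in different units, one option is to give each panel its own y-axis. By default, lattice draws all panels on a shared y-axis, so variables with small values can look flat next to variables with large ones. The following is a sketch rather than the only approach. It assumes that reshape2 is installed, and the 3-by-2 layout and the object name p are choices made here to leave room for the extra axes.

```r
library("lattice")
library("reshape2")

mlongley <- melt(longley, id.vars = "Year")

# relation = "free" lets every panel choose its own y-axis range,
# so variables with different units are not squeezed onto one scale.
p <- xyplot(value ~ Year | variable, data = mlongley,
  layout = c(3, 2),
  par.strip.text = list(cex = 0.7),
  scales = list(cex = 0.7, y = list(relation = "free"))
)
print(p)
```

With free scales, the heights of the panels can no longer be compared directly, so make the units of each panel clear when you present the plot.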

Be very careful when you present plots like this. You don't want to be accused of creating chart junk (misleading or distracting graphics).
