
How to Define the Data Display Mode in R

A ggplot2 geom in R tells the plot how you want to display your data. For example, you use geom_bar() to make a bar chart. In ggplot2, you can use a variety of predefined geoms to make standard types of plot.

A geom defines the layout of a ggplot2 layer. For example, there are geoms to create bar charts, scatterplots, and line diagrams (as well as a variety of other plots).

Each geom has a default stat, and each stat has a default geom. In practice, you have to specify only one of these.

Geom              Description                                                Default Stat
geom_bar()        Bar chart                                                  stat_bin() (stat_count() since ggplot2 2.0.0)
geom_point()      Scatterplot                                                stat_identity()
geom_line()       Line diagram, connecting observations ordered by x-value   stat_identity()
geom_boxplot()    Box-and-whisker plot                                       stat_boxplot()
geom_path()       Line diagram, connecting observations in original order    stat_identity()
geom_smooth()     Smoothed conditional mean                                  stat_smooth()
geom_histogram()  Histogram (geom_bar() with stat_bin())                     stat_bin()
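
The pairings in the table can be checked from R itself. The following is a minimal sketch, assuming ggplot2 2.0.0 or later, where a layer object exposes its geom and stat as fields. The variable names by_geom and by_stat are illustrative only.

```r
library(ggplot2)

# A layer created from a geom: ggplot2 fills in the default stat
by_geom <- geom_point()

# A layer created from a stat: ggplot2 fills in the default geom
by_stat <- stat_identity()

# Inspect which stat and geom each layer ended up with
class(by_geom$stat)[1]
class(by_stat$geom)[1]
```

Both layers end up pairing the point geom with the identity stat, which is why you normally specify only one of the two.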

How to create a bar chart using ggplot2 in R

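To make a bar chart, you use the geom_bar() function. Note, however, that in ggplot2 versions before 2.0.0 the default stat is stat_bin(), which cuts your data into bins, so the default behavior of geom_bar() is to create a histogram. (Since ggplot2 2.0.0, geom_bar() defaults to stat_count() instead, and geom_histogram() does the binning.)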

For example, to create a histogram of the depth of earthquakes in the quakes dataset, you do the following:

> ggplot(quakes, aes(x=depth)) + geom_bar()
> ggplot(quakes, aes(x=depth)) + geom_bar(binwidth=50)

Notice that your mapping defines only the x-axis variable (in this case, quakes$depth). A useful argument to geom_bar() is binwidth, which controls the size of the bins that your data is cut into.
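
In ggplot2 2.0.0 and later, geom_bar() no longer accepts binwidth, so the same histograms are drawn with geom_histogram(). The following is a sketch of that equivalent, assuming a current ggplot2 release. The names p_default, p_binned, and bins are illustrative.

```r
library(ggplot2)

# Histogram with the default number of bins
p_default <- ggplot(quakes, aes(x = depth)) + geom_histogram()

# Histogram with bins that are 50 units of depth wide
p_binned <- ggplot(quakes, aes(x = depth)) + geom_histogram(binwidth = 50)

# layer_data() returns the binned data that ggplot2 computed for the layer
bins <- layer_data(p_binned)
head(bins[, c("xmin", "xmax", "count")])

# print(p_binned) draws the plot
```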

So, if geom_bar() makes a histogram by default, how do you make a bar chart? The answer is that you first have to aggregate your data, and then specify the argument stat="identity" in your call to geom_bar().

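In the next example, you use aggregate() to calculate the number of quakes at different depth strata. The depth is rounded to the nearest 10 km, and the resulting mag column holds the number of quakes in each stratum, not a magnitude: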

> quakes.agg <- aggregate(mag ~ round(depth, -1), data=quakes,
+         FUN=length)
> names(quakes.agg) <- c("depth", "mag")
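
If you want to confirm what the aggregation produced, you can compare it with an independent count from table(). This is only a sanity-check sketch; the name check is illustrative.

```r
# Count quakes per depth stratum (depth rounded to the nearest 10)
quakes.agg <- aggregate(mag ~ round(depth, -1), data = quakes, FUN = length)
names(quakes.agg) <- c("depth", "mag")

# Independent count of the same strata
check <- table(round(quakes$depth, -1))

# First few strata and their counts
head(quakes.agg)

# TRUE when both approaches agree for every stratum
all(quakes.agg$mag == as.vector(check))
```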

Now you can plot the object quakes.agg with geom_bar(stat="identity"):

> ggplot(quakes.agg, aes(x=depth, y=mag)) +
+   geom_bar(stat="identity")
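
In ggplot2 2.2.0 and later, geom_col() is a shorthand for geom_bar(stat="identity"). The following sketch repeats the aggregation so that it runs on its own; the name p_col is illustrative.

```r
library(ggplot2)

# Pre-summarize: number of quakes per depth stratum
quakes.agg <- aggregate(mag ~ round(depth, -1), data = quakes, FUN = length)
names(quakes.agg) <- c("depth", "mag")

# geom_col() plots the y values as they are, without any further summarizing
p_col <- ggplot(quakes.agg, aes(x = depth, y = mag)) + geom_col()

# print(p_col) draws the plot
```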

In summary, you can use geom_bar() to create a histogram and let ggplot2 summarize your data, or you can pre-summarize your data and then use stat="identity" to plot a bar chart.


How to make a scatterplot in ggplot2

To create a scatterplot, you use the geom_point() function. A scatterplot creates points (or sometimes bubbles or other symbols) on your chart. Each point corresponds to an observation in your data.

You’ve probably seen or created this type of graphic a million times, so you already know that scatterplots use the Cartesian coordinate system, where one variable is mapped to the x-axis and a second variable is mapped to the y-axis.

In exactly the same way, in ggplot2 you create a mapping between x-axis and y-axis variables. So, to create a plot of the quakes data, you map quakes$long to the x-axis and quakes$lat to the y-axis:

> ggplot(quakes, aes(x=long, y=lat)) + geom_point()
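
To see that each point corresponds to one observation, you can compare the data ggplot2 builds for the layer with the original data frame. This is a small sketch, assuming ggplot2 2.0.0 or later for layer_data(); the names p and pts are illustrative.

```r
library(ggplot2)

p <- ggplot(quakes, aes(x = long, y = lat)) + geom_point()

# One row per point that will be drawn
pts <- layer_data(p)

nrow(pts)     # number of points in the layer
nrow(quakes)  # number of observations in the data

# print(p) draws the plot
```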

How to create ggplot2 line charts

To create a line chart, you use the geom_line() function. You use this function in a very similar way to geom_point(), with the difference that geom_line() draws a line connecting the points in order of their x-values.

This type of chart is useful for time series data in data frames, such as the yearly economic data in the built-in dataset longley. To create a line chart of unemployment figures, you use the following:

> ggplot(longley, aes(x=Year, y=Unemployed)) + geom_line()
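
Because longley is already sorted by year, geom_line() and geom_path() give the same picture here. The difference shows up when the rows are out of order. The following sketch, assuming ggplot2 2.0.0 or later, reorders the rows on purpose; the names idx, zigzag, p_line, and p_path are illustrative.

```r
library(ggplot2)

# Reorder the rows so that Year is no longer ascending (odd rows, then even rows)
idx <- c(seq(1, nrow(longley), by = 2), seq(2, nrow(longley), by = 2))
zigzag <- longley[idx, ]

# geom_line() connects observations in order of x; geom_path() in data order
p_line <- ggplot(zigzag, aes(x = Year, y = Unemployed)) + geom_line()
p_path <- ggplot(zigzag, aes(x = Year, y = Unemployed)) + geom_path()

# Order of the x-values as each layer will draw them
line_x <- layer_data(p_line)$x
path_x <- layer_data(p_path)$x

is.unsorted(line_x)
is.unsorted(path_x)

# print(p_line) and print(p_path) draw the two plots
```

With geom_line() the chart still reads as a time series, while geom_path() jumps back and forth between years.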