# How to Define the Data Display Mode in R

A ggplot2 geom in R tells the plot how you want to display your data. For example, you use geom_bar() to make a bar chart. In ggplot2, you can use a variety of predefined geoms to make standard types of plot.

A geom defines the layout of a ggplot2 layer. For example, there are geoms to create bar charts, scatterplots, and line diagrams (as well as a variety of other plots).

Each geom has a default stat, and each stat has a default geom. In practice, you have to specify only one of these.

Geom | Description | Default Stat |
---|---|---|
geom_bar() | Bar chart | stat_count() (stat_bin() before ggplot2 2.0.0) |
geom_point() | Scatterplot | stat_identity() |
geom_line() | Line diagram, connecting observations ordered by x-value | stat_identity() |
geom_boxplot() | Box-and-whisker plot | stat_boxplot() |
geom_path() | Line diagram, connecting observations in original order | stat_identity() |
geom_smooth() | Add a smoothed conditional mean | stat_smooth() |
geom_histogram() | Histogram (an alias for geom_bar() with stat_bin()) | stat_bin() |
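The pairing of geoms and default stats can be checked from R itself. The following is a minimal sketch, assuming ggplot2 version 2.0.0 or later is installed. It relies on a layer object exposing its stat (and a stat layer exposing its geom), which is an implementation detail rather than a documented interface, and the helper name default_stat is hypothetical.

```r
library(ggplot2)

# Hypothetical helper: report the class name of the stat attached to a layer.
default_stat <- function(layer) class(layer$stat)[1]

default_stat(geom_point())
default_stat(geom_boxplot())
default_stat(geom_histogram())

# The reverse lookup: a layer created by a stat function carries a default geom.
class(stat_smooth()$geom)[1]
```

Because each function already knows its partner, you normally name only the geom or only the stat when you add a layer.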

## How to create a bar chart using ggplot2 in R

To make a bar chart, you use the geom_bar() function. Note, however, that in ggplot2 versions before 2.0.0 the default stat of geom_bar() is stat_bin(), which cuts your data into bins, so the default behavior there is to create a histogram. From version 2.0.0 onward, geom_bar() defaults to stat_count(), which counts the observations at each distinct *x* value, and the binning behavior is available through geom_histogram().

For example, to create a histogram of the depth of earthquakes in the quakes dataset, you do the following:

> ggplot(quakes, aes(x=depth)) + geom_histogram()
> ggplot(quakes, aes(x=depth)) + geom_histogram(binwidth=50)

Notice that your mapping defines only the *x*-axis variable (in this case, quakes$depth). A useful argument to geom_histogram() is binwidth, which controls the width of the bins that your data is cut into.
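To see what binwidth does to the data, you can inspect the bins that ggplot2 computes. This is a small sketch, assuming ggplot2 2.0.0 or later, where binning is done by geom_histogram(); the variable names p and bins are illustrative only.

```r
library(ggplot2)

# Histogram of earthquake depth with bins 50 units wide.
p <- ggplot(quakes, aes(x = depth)) + geom_histogram(binwidth = 50)

# layer_data() returns the data frame ggplot2 computed for the first layer:
# one row per bin, with its limits (xmin, xmax) and the number of rows in it.
bins <- layer_data(p)
head(bins[, c("xmin", "xmax", "count")])

# Compare the total of the bin counts with the number of observations.
sum(bins$count)
nrow(quakes)
```

A smaller binwidth gives more, narrower bins; a larger one smooths the picture at the cost of detail.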

So, how do you make a bar chart from values you have already calculated? The answer is that you first have to aggregate your data, and then specify the argument stat="identity" in your call to geom_bar().

In the next example, you use aggregate() to calculate the number of quakes at different depth strata:

> quakes.agg <- aggregate(mag ~ round(depth, -1), data=quakes, FUN=length)
> names(quakes.agg) <- c("depth", "mag")

Now you can plot the object quakes.agg with geom_bar(stat="identity"):

> ggplot(quakes.agg, aes(x=depth, y=mag)) + geom_bar(stat="identity")

In summary, you can let ggplot2 summarize your data for you (for example, by binning it into a histogram), or you can pre-summarize your data and then use geom_bar() with stat="identity" to plot a bar chart.
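The two routes can be compared side by side. The sketch below assumes ggplot2 2.0.0 or later and uses geom_histogram() for the route where ggplot2 does the summarizing; the object names p_hist and p_bar are illustrative only.

```r
library(ggplot2)

# Route 1: let ggplot2 bin and count the raw observations.
p_hist <- ggplot(quakes, aes(x = depth)) + geom_histogram(binwidth = 50)

# Route 2: count the quakes per depth stratum yourself (depth rounded to the
# nearest 10), then plot the counts exactly as given.
quakes.agg <- aggregate(mag ~ round(depth, -1), data = quakes, FUN = length)
names(quakes.agg) <- c("depth", "mag")
p_bar <- ggplot(quakes.agg, aes(x = depth, y = mag)) +
  geom_bar(stat = "identity")

# Total count accounted for by each route.
sum(layer_data(p_hist)$count)
sum(quakes.agg$mag)
```

The strata differ (bins of width 50 versus depths rounded to the nearest 10), so the bars look different even though both plots summarize the same rows.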

## How to make a scatterplot in ggplot2

To create a scatterplot, you use the geom_point() function. A scatterplot creates points (or sometimes bubbles or other symbols) on your chart. Each point corresponds to an observation in your data.

You’ve probably seen or created this type of graphic a million times, so you already know that scatterplots use the Cartesian coordinate system, where one variable is mapped to the *x*-axis and a second variable is mapped to the *y*-axis.

In exactly the same way, in ggplot2 you create a mapping between *x*-axis and *y*-axis variables. So, to create a plot of the quakes data, you map quakes$long to the *x*-axis and quakes$lat to the *y*-axis:

> ggplot(quakes, aes(x=long, y=lat)) + geom_point()
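As an illustration of the bubbles mentioned earlier, you can map a third variable to the size of the points. This is only a sketch: the choice of mag for the size and the alpha value of 0.3 are assumptions made for the example, not settings taken from the text above.

```r
library(ggplot2)

# Scatterplot of quake locations; point size follows magnitude (a bubble chart).
# alpha makes overlapping points semi-transparent.
p <- ggplot(quakes, aes(x = long, y = lat)) +
  geom_point(aes(size = mag), alpha = 0.3)

# Number of points drawn, next to the number of observations.
nrow(layer_data(p))
nrow(quakes)
```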

## How to create ggplot2 line charts

To create a line chart, you use the geom_line() function. You use this function in much the same way as geom_point(), with the difference that geom_line() draws a line connecting the observations in order of their *x*-values.

This type of chart is useful for time series data in data frames, such as the population data in the built-in dataset longley. To create a line chart of unemployment figures, you use the following:

> ggplot(longley, aes(x=Year, y=Unemployed)) + geom_line()
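The difference between geom_line() and geom_path() only shows up when the rows are not already sorted by the *x* variable. The following sketch rearranges longley on purpose to make that visible; the names idx, shuffled, p_line, and p_path are illustrative, and reading the drawing order from layer_data() is an assumption about how ggplot2 prepares the data.

```r
library(ggplot2)

# Rearrange the rows so they are no longer in year order
# (odd-numbered rows first, then even-numbered rows).
idx <- c(seq(1, nrow(longley), by = 2), seq(2, nrow(longley), by = 2))
shuffled <- longley[idx, ]

# geom_line() connects observations in order of x;
# geom_path() connects them in the order the rows appear.
p_line <- ggplot(shuffled, aes(x = Year, y = Unemployed)) + geom_line()
p_path <- ggplot(shuffled, aes(x = Year, y = Unemployed)) + geom_path()

# Check whether the x values each layer will draw are out of order.
is.unsorted(layer_data(p_line)$x)
is.unsorted(layer_data(p_path)$x)
```

For time series such as longley, geom_line() is usually the safer choice, because the result does not depend on how the data frame happens to be sorted.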