How to Define the Data Display Mode in R
A ggplot2 geom in R tells the plot how you want to display your data. For example, you use geom_bar() to make a bar chart. In ggplot2, you can use a variety of predefined geoms to make standard types of plot.
A geom defines the layout of a ggplot2 layer. For example, there are geoms to create bar charts, scatterplots, and line diagrams (as well as a variety of other plots).
Each geom has a default stat, and each stat has a default geom, so in practice you have to specify only one of these and ggplot2 supplies the other.
|Geom||Description||Default stat|
|geom_line()||Line diagram, connecting observations in order of the x value||stat_identity()|
|geom_path()||Line diagram, connecting observations in original order||stat_identity()|
|geom_smooth()||Add a smoothed conditional mean||stat_smooth()|
|geom_histogram()||Histogram; a shortcut for geom_bar() with stat_bin()||stat_bin()|
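The geom/stat pairing can be checked directly. The following is a minimal sketch, assuming a reasonably recent ggplot2 in which layer_data() is available; the object names p_geom and p_stat are illustrative. One plot names only the geom, the other names only the stat, and both should end up with the same binned counts.

```r
library(ggplot2)

# Name the geom; its default stat (stat_bin) is filled in for you.
p_geom <- ggplot(quakes, aes(x = depth)) + geom_histogram(binwidth = 50)

# Name the stat; its default geom (bars) is filled in for you.
p_stat <- ggplot(quakes, aes(x = depth)) + stat_bin(binwidth = 50)

# Compare the counts each layer computed for its bins.
all.equal(layer_data(p_geom)$count, layer_data(p_stat)$count)
```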
How to create a bar chart using ggplot2 in R
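To make a bar chart, you use the geom_bar() function. However, note that in ggplot2 versions before 2.0.0 the default stat for geom_bar() is stat_bin(), which cuts your data into bins, so its default behavior there is to create a histogram. From version 2.0.0 onward, geom_bar() defaults to stat_count(), which counts the rows at each x value, and binning is done by geom_histogram().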
For example, to create a histogram of the depth of earthquakes in the quakes dataset, you do the following:
> ggplot(quakes, aes(x=depth)) + geom_bar()
> ggplot(quakes, aes(x=depth)) + geom_bar(binwidth=50)
Notice that your mapping defines only the x-axis variable (in this case, quakes$depth). A useful argument to geom_bar() is binwidth, which controls the size of the bins that your data is cut into.
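On ggplot2 2.0.0 and later, the same two histograms are usually written with geom_histogram(), which still accepts binwidth. The following is a sketch under that assumption; the object names p_default and p_wide are illustrative.

```r
library(ggplot2)

# Default binning: ggplot2 falls back to 30 bins and suggests choosing a binwidth.
p_default <- ggplot(quakes, aes(x = depth)) + geom_histogram()

# Explicit bins, each covering 50 km of depth.
p_wide <- ggplot(quakes, aes(x = depth)) + geom_histogram(binwidth = 50)

print(p_default)
print(p_wide)
```

A larger binwidth gives fewer, wider bars, while the total count across all bars stays equal to the number of rows in quakes.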
So, if geom_bar() makes a histogram by default, how do you make a bar chart? The answer is that you first have to aggregate your data, and then specify the argument stat="identity" in your call to geom_bar().
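In the next example, you use aggregate() to count the quakes in each depth stratum, where round(depth, -1) rounds each depth to the nearest 10. Because FUN=length counts observations, the resulting mag column holds the number of quakes, not a magnitude: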
> quakes.agg <- aggregate(mag ~ round(depth, -1), data=quakes,
+                         FUN=length)
> names(quakes.agg) <- c("depth", "mag")
Now you can plot the object quakes.agg with geom_bar(stat="identity"):
> ggplot(quakes.agg, aes(x=depth, y=mag)) +
+   geom_bar(stat="identity")
In summary, you can use geom_bar() to create a histogram and let ggplot2 summarize your data, or you can pre-summarize your data and then use stat="identity" to plot a bar chart.
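For pre-summarized data, ggplot2 2.2.0 and later also provide geom_col(), which behaves like geom_bar() with the identity stat. The sketch below repeats the aggregation but names the count column n, an illustrative name chosen here so that the column is not mistaken for a magnitude; quakes_counts and p_col are likewise hypothetical names.

```r
library(ggplot2)

# Count quakes per depth stratum (depth rounded to the nearest 10).
quakes_counts <- aggregate(mag ~ round(depth, -1), data = quakes, FUN = length)
names(quakes_counts) <- c("depth", "n")  # "n" is an illustrative column name

# geom_col() plots the y values as given, with no further summarizing.
p_col <- ggplot(quakes_counts, aes(x = depth, y = n)) + geom_col()

print(p_col)
```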
How to make a scatterplot in ggplot2
To create a scatterplot, you use the geom_point() function. A scatterplot creates points (or sometimes bubbles or other symbols) on your chart. Each point corresponds to an observation in your data.
You’ve probably seen or created this type of graphic a million times, so you already know that scatterplots use the Cartesian coordinate system, where one variable is mapped to the x-axis and a second variable is mapped to the y-axis.
In exactly the same way, in ggplot2 you create a mapping between x-axis and y-axis variables. So, to create a plot of the quakes data, you map quakes$long to the x-axis and quakes$lat to the y-axis:
> ggplot(quakes, aes(x=long, y=lat)) + geom_point()
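The "bubbles" mentioned earlier come from mapping a third variable to the size of each point. As a minimal sketch, the choice of mag for the size and the alpha value of 0.3 are illustrative assumptions, not part of the original example:

```r
library(ggplot2)

# Map magnitude to point size so each quake is drawn as a "bubble".
# alpha = 0.3 makes overlapping points semi-transparent.
p_bubble <- ggplot(quakes, aes(x = long, y = lat, size = mag)) +
  geom_point(alpha = 0.3)

print(p_bubble)
```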
How to create ggplot2 line charts
To create a line chart, you use the geom_line() function. You use this function in much the same way as geom_point(), with the difference that geom_line() connects the points with a line, in order of their x values.
This type of chart is useful for time series data in data frames, such as the yearly economic data in the built-in dataset longley. To create a line chart of unemployment figures, you use the following:
> ggplot(longley, aes(x=Year, y=Unemployed)) + geom_line()
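The difference between geom_line() and geom_path() only shows when the rows are not already sorted by x. As a minimal sketch, the hypothetical data frame longley_shuffled below reorders the rows of longley by unemployment instead of by year:

```r
library(ggplot2)

# Reorder the rows so that Year is no longer increasing (illustrative only).
longley_shuffled <- longley[order(longley$Unemployed), ]

# geom_line() sorts by x before drawing, so the chart still runs left to right.
p_line <- ggplot(longley_shuffled, aes(x = Year, y = Unemployed)) + geom_line()

# geom_path() follows the row order, so the line doubles back on itself.
p_path <- ggplot(longley_shuffled, aes(x = Year, y = Unemployed)) + geom_path()

print(p_line)
print(p_path)
```

For time series, geom_line() is usually the safer choice because it does not depend on how the data frame happens to be ordered.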