A graph starts with ggplot(), which here takes two arguments. The first argument is the source of the data. The second argument maps the data components of interest onto components of the graph. The function that does that job is aes().
To begin a histogram for Price in Cars93, the function is
ggplot(Cars93, aes(x=Price))
The aes() function associates Price with the x-axis. In ggplot-world, this is called an aesthetic mapping. In fact, each argument to aes() is called an aesthetic.
This line of code draws the following figure, which is just a grid with a gray background and Price on the x-axis.
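As a minimal sketch of this first step, assuming the ggplot2 package is installed and that Cars93 is loaded from the MASS package (the name base_plot is just an illustrative choice):

```r
library(ggplot2)
data(Cars93, package = "MASS")  # Cars93 ships with the MASS package

# Data source plus one aesthetic mapping; no geom has been added yet
base_plot <- ggplot(Cars93, aes(x = Price))

# Printing the object draws only the empty gray grid with Price on the x-axis
print(base_plot)
```

Storing the result in a variable is optional, but it makes it easy to add pieces to the same starting point later.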
Well, what about the y-axis? Does anything in the data map into it? No. That's because this is a histogram and nothing explicitly in the data provides a y-value for each x. So you can't say "y=" in aes(). Instead, you let R do the work to calculate the heights of the bars in the histogram.
And what about that histogram? How do you put it into this blank grid? You have to add something indicating that you want to plot a histogram and let R take care of the rest. What you add is a geom function ("geom" is short for "geometric object").
These geom functions come in a variety of types. ggplot2 supplies one for almost every graphing need, and provides the flexibility to work with special cases. To draw a histogram, the geom function to use is called geom_histogram().
How do you add geom_histogram() to ggplot()? With a plus sign:
ggplot(Cars93, aes(x=Price)) +
  geom_histogram()
This produces the following figure. The grammar rules tell ggplot2 that when the geometric object is a histogram, R does the necessary calculations on the data and produces the appropriate plot.
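To see the calculations that R does on your behalf, you can ask ggplot2 for the values behind the bars. The sketch below assumes ggplot2 is installed and Cars93 comes from the MASS package; it uses ggplot2's layer_data() function, and the names p and bars are illustrative.

```r
library(ggplot2)
data(Cars93, package = "MASS")

p <- ggplot(Cars93, aes(x = Price)) +
  geom_histogram()

# layer_data() returns the values ggplot2 computed for the first layer:
# one row per bar, with the bar's height in the count column
bars <- layer_data(p)
head(bars[, c("xmin", "xmax", "count")])

# Every car falls into exactly one bar, so the heights add up to the row count
sum(bars$count) == nrow(Cars93)
```

The count column is the y-value that nothing in the data supplied directly, which is why aes() needed only x.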
At the bare minimum, ggplot2 graphics code has to have data, aesthetic mappings, and a geometric object. It's like answering a logical sequence of questions: What's the source of the data? What parts of the data are you interested in? Which parts of the data correspond to which parts of the graph? How do you want the graph to look?
Beyond those minimum requirements, you can modify the graph. Each bar represents a bin, and by default, geom_histogram() uses 30 of them. After plotting the histogram, ggplot2 displays an onscreen message that advises experimenting with binwidth (which, unsurprisingly, specifies the width of each bin) to change the graph's appearance. Accordingly, you use binwidth = 5 as an argument in geom_histogram().
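A quick way to compare the default binning with binwidth = 5 is to look at the computed bars rather than the pictures. This is only a sketch: it assumes ggplot2 is installed and Cars93 comes from the MASS package, and the variable names are illustrative.

```r
library(ggplot2)
data(Cars93, package = "MASS")

base_plot <- ggplot(Cars93, aes(x = Price))

# Default binning, which also triggers the message about choosing a binwidth
default_bars <- layer_data(base_plot + geom_histogram())

# Explicit bin width of 5 (that is, $5,000, because Price is in thousands)
wide_bars <- layer_data(base_plot + geom_histogram(binwidth = 5))

# Fewer, wider bars: each one now spans 5 units on the x-axis
nrow(default_bars)
nrow(wide_bars)
unique(round(wide_bars$xmax - wide_bars$xmin, 6))
```

Trying a few different binwidth values this way shows how strongly the choice affects the shape of the histogram.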
Additional arguments modify the way the bars look:
geom_histogram(binwidth=5, color = "black", fill = "white")
With another function, labs(), you modify the labels for the axes and supply a title for the graph:
labs(x = "Price (x $1000)", y="Frequency", title="Prices of 93 Models of 1993 Cars")
Altogether now:
ggplot(Cars93, aes(x=Price)) +
  geom_histogram(binwidth=5, color="black", fill="white") +
  labs(x = "Price (x $1000)", y="Frequency", title="Prices of 93 Models of 1993 Cars")
The result is the following figure.
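Because each plus sign returns a new plot object, the same graph can also be assembled one step at a time. The following is a sketch of that approach, assuming ggplot2 is installed and Cars93 comes from the MASS package; the variable name p is illustrative.

```r
library(ggplot2)
data(Cars93, package = "MASS")

# Step 1: data source and aesthetic mapping
p <- ggplot(Cars93, aes(x = Price))

# Step 2: geometric object, with its bin width and bar appearance
p <- p + geom_histogram(binwidth = 5, color = "black", fill = "white")

# Step 3: axis labels and title
p <- p + labs(x = "Price (x $1000)", y = "Frequency",
              title = "Prices of 93 Models of 1993 Cars")

# Draw the finished histogram
print(p)
```

Building the plot in stages like this can make it easier to see what each component contributes, because you can print p after any step.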