An Introduction To ggplot2

Approaches to Graphing in R

Base Graphics

par(mfrow = c(3, 1))
with(subset(iris, Species == "setosa"), hist(Sepal.Length))
with(subset(iris, Species == "versicolor"), hist(Sepal.Length))
with(subset(iris, Species == "virginica"), hist(Sepal.Length))

Lattice

library(lattice)
histogram(~ Sepal.Length | Species, data = iris, layout = c(1, 3))

ggplot2

ggplot(iris, aes(Sepal.Length)) + geom_histogram() + facet_grid(Species ~ .)

Approaches to Graphing in R

Base Graphics and Lattice

Can only draw on top of the plot; cannot modify or delete existing content.

No user-accessible representation of graphics, apart from their appearance.

Functions are fast and powerful, but inflexible.

ggplot2

Conceptual rather than procedural approach to graphing.

More closely aligned with how viewers perceive graphs.

The Grammar of Graphics

From ggplot2: Elegant Graphics for Data Analysis, page 3:

[T]he grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system.

Components of the Grammar

Data and Mappings: The data to be plotted, plus aesthetic mappings that describe how its variables map to attributes like position, color, shape, and size.

Geometric Objects: Objects that comprise a plot, like points, lines, polygons, and text.

Statistical Transformations: Optional data summaries like binning and models.

Scales: Map data values to values in an aesthetic space such as color, size, or shape; legends and axes provide the inverse mapping, so viewers can read data values back off the plot.

Coordinate System: Describes how coordinates are mapped to the graphic plane.

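Position Adjustments and Facetting: Position adjustments handle overlapping objects (for example by jittering, dodging, or stacking); facetting splits the data into subsets and draws each subset in its own panel.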
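
As a rough illustration of how the components fit together, the sketch below names each one explicitly in a single plot of the mpg data that ships with ggplot2. The specific choices (a jitter adjustment, a linear smoother, panels by drv) are assumptions made for the example, not something prescribed by the grammar.

```r
library(ggplot2)

grammar_demo <- ggplot(data = mpg,                           # data
                       mapping = aes(x = displ, y = hwy)) +  # aesthetic mappings
  geom_point(aes(colour = factor(cyl)),                      # geometric object
             position = "jitter") +                          # position adjustment
  stat_smooth(method = "lm", formula = y ~ x) +              # statistical transformation
  scale_colour_discrete(name = "Cylinders") +                # scale (drives the legend)
  coord_cartesian() +                                        # coordinate system
  facet_grid(. ~ drv)                                        # facetting

# Typing grammar_demo at the console (or calling print() on it) draws the plot.
```

Most of these components have sensible defaults, which is why the shorter calls elsewhere in this presentation can leave them out.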

Learning By Example

Installation

install.packages("ggplot2")
library(ggplot2)

Documentation

help(package = "ggplot2")

Visit the ggplot2 homepage for over 500 examples of nearly 80 plotting functions.

An Aside About qplot()

qplot() is a high-level function for quick graphics and minimal configuration.

However, this presentation aims at mastery of the entire grammar, so it uses ggplot() rather than qplot().
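
For comparison, here is a minimal sketch of the same scatterplot written both ways, using the mpg data that ships with ggplot2. Recent ggplot2 releases mark qplot() as deprecated, so the first call may emit a warning; treat it as a historical illustration rather than a recommendation.

```r
library(ggplot2)

# Quick version: qplot() picks a point geom when given numeric x and y.
quick <- qplot(displ, hwy, data = mpg)

# Explicit version: same data and mappings, with the geom stated outright.
full <- ggplot(mpg, aes(x = displ, y = hwy)) + geom_point()
```

The explicit form is longer, but every part of it is a named component of the grammar that can be swapped out independently.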

mpg Data Set

Contains make, model, class, engine size, transmission and fuel economy for a selection of US cars in 1999 and 2008.

Includes popular cars like the Audi A4, Honda Civic, Hyundai Sonata, Nissan Maxima, Toyota Camry, and Volkswagen Jetta.

These data come from the EPA fuel economy website, http://fueleconomy.gov.

mpg Data Set

> head(mpg, 10)

    manufacturer       model  displ  year  cyl       trans  drv  cty  hwy  fl    class
1           audi          a4    1.8  1999    4    auto(l5)    f   18   29   p  compact
2           audi          a4    1.8  1999    4  manual(m5)    f   21   29   p  compact
3           audi          a4    2.0  2008    4  manual(m6)    f   20   31   p  compact
4           audi          a4    2.0  2008    4    auto(av)    f   21   30   p  compact
5           audi          a4    2.8  1999    6    auto(l5)    f   16   26   p  compact
6           audi          a4    2.8  1999    6  manual(m5)    f   18   26   p  compact
7           audi          a4    3.1  2008    6    auto(av)    f   18   27   p  compact
8           audi  a4 quattro    1.8  1999    4  manual(m5)    4   18   26   p  compact
9           audi  a4 quattro    1.8  1999    4    auto(l5)    4   16   25   p  compact
10          audi  a4 quattro    2.0  2008    4  manual(m6)    4   20   28   p  compact
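
Beyond head(), a few base R calls give a quick sense of the data before plotting. This is only a sketch of one possible first look; the choice of which columns to summarise is an assumption based on the engine size and highway economy variables shown above.

```r
library(ggplot2)

dim(mpg)                            # number of rows and columns
str(mpg)                            # type of each variable
summary(mpg[, c("displ", "hwy")])   # ranges of engine size and highway economy
table(mpg$year)                     # how many cars come from each model year
```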

Research Question

What is the relationship between engine size and fuel economy?

Building A Plot, Layer by Layer

The main function is ggplot(), which takes two main arguments: data and mapping.

  p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

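The object p has mappings but no layers, so there is nothing to draw yet. Older versions of ggplot2 refuse to display it:

  > p
  Error: No layers in plot

Current versions draw an empty panel with axes instead of raising an error.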

Add a layer by applying a geom:

  p + geom_point()

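You can also make grammatically correct but nonsensical plots. Here geom_line() connects the observations in order of displ, even though the cars do not form a sequence: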

  p + geom_line()
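
Because a plot is an ordinary R object, layers can be accumulated step by step and the intermediate objects kept. The sketch below assumes the list of layers can be read through `$layers`, which is a detail of how ggplot objects are stored rather than a documented interface; the names p_points and p_both are made up for the example.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# Adding a layer returns a new object; p itself is left unchanged.
p_points <- p + geom_point()
p_both   <- p_points + geom_smooth(method = "lm", formula = y ~ x)

# Count the layers held by each object.
length(p$layers)
length(p_points$layers)
length(p_both$layers)
```

This is the practical difference from base graphics: the plot is a description that can be extended or revised before anything is drawn.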

Geoms and Scales

# Plain scatterplot
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point()
# Map the number of cylinders to point color
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = factor(cyl)))
# Map the number of cylinders to point shape
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(shape = factor(cyl)))
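
The plots above rely on default scales. As a sketch of how scales can be controlled explicitly, the example below names the axes and the legend and fixes the y range; the label text and the limits are assumptions chosen for illustration. It also contrasts mapping an aesthetic inside aes() with setting it to a constant.

```r
library(ggplot2)

# Explicit scales: one per aesthetic that is mapped to data.
scaled <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = factor(cyl))) +
  scale_x_continuous(name = "Engine displacement (litres)") +
  scale_y_continuous(name = "Highway miles per gallon", limits = c(0, 50)) +
  scale_color_discrete(name = "Cylinders")

# Setting rather than mapping: a constant given outside aes() is applied
# to every point and gets no scale and no legend.
fixed <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(color = "steelblue")
```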

Statistical Transformations

# One linear fit across all cars
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = factor(cyl))) +
  stat_smooth(method = "lm")
# A separate linear fit for each number of cylinders
ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = factor(cyl))) +
  stat_smooth(method = "lm", aes(color = factor(cyl)))
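
A statistical transformation computes a new data set that the geom then draws. One way to see this, sketched below, is to pull the computed values out with layer_data() and compare them with an ordinary lm() fit; the object names are made up for the example.

```r
library(ggplot2)

fit_plot <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x)

# layer_data() returns the data a layer actually draws; layer 2 is the smoother.
smooth_df <- layer_data(fit_plot, 2)
head(smooth_df[, c("x", "y", "ymin", "ymax")])

# The fitted line should agree with a direct lm() fit on the raw data.
coef(lm(hwy ~ displ, data = mpg))
```

The smoother's x and y columns are points along the fitted line, and ymin and ymax bound the confidence band drawn around it.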

Facets and Position Adjustments

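# One panel per model year, side by side
ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_point() + facet_grid(. ~ year)
# Distribution of highway economy for each number of cylinders
ggplot(data = mpg, aes(x = factor(cyl), y = hwy)) + geom_boxplot()
# Overlay the individual cars, jittered to reduce overplotting
ggplot(data = mpg, aes(x = factor(cyl), y = hwy)) + geom_boxplot() + geom_jitter()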
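
A few variations on these ideas, as a sketch only: the jitter width, transparency, and choice of facetting variable are assumptions picked for illustration.

```r
library(ggplot2)

# facet_wrap() lays out one panel per level of a single variable.
by_class <- ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point() +
  facet_wrap(~ class)

# Jitter horizontally only, so the hwy values are not perturbed, and hide
# the boxplot's own outlier marks so that no car is drawn twice.
box_jitter <- ggplot(data = mpg, aes(x = factor(cyl), y = hwy)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.2, height = 0, alpha = 0.5)

# position = "dodge" places the bars for each drive type side by side
# instead of stacking them.
dodged <- ggplot(data = mpg, aes(x = factor(cyl), fill = drv)) +
  geom_bar(position = "dodge")
```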
