par(mfrow = c(3, 1))
with(subset(iris, Species == "setosa"), hist(Sepal.Length))
with(subset(iris, Species == "versicolor"), hist(Sepal.Length))
with(subset(iris, Species == "virginica"), hist(Sepal.Length))

library(lattice)
histogram(~ Sepal.Length | Species, data = iris, layout = c(1, 3))

ggplot(iris, aes(Sepal.Length)) + geom_histogram() + facet_grid(Species ~ .)

Can only draw on top of the plot; cannot modify or delete existing content.

No user-accessible representation of graphics, apart from their appearance.

Functions are fast and powerful, but inflexible.

Conceptual rather than procedural approach to graphing.

More closely aligned with how viewers perceive graphs.

From *ggplot2: Elegant Graphics for Data Analysis*, page 3:

[T]he grammar tells us that a statistical graphic is a mapping from data to aesthetic attributes (colour, shape, size) of geometric objects (points, lines, bars). The plot may also contain statistical transformations of the data and is drawn on a specific coordinate system.

Data and Mappings: The data to visualize, plus mappings from its variables to aesthetic attributes such as position, color, size, and shape.

Geometric Objects: Objects that comprise a plot, like points, lines, polygons, and text.

Statistical Transformations: Optional data summaries like binning and models.

Scales: Map data values to values in an aesthetic space such as color, size, or shape; legends and axes provide the inverse mapping, so viewers can read data values off the plot.

Coordinate System: Describes how coordinates are mapped to the graphic plane.

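Position Adjustments and Facetting: Position adjustments resolve overlapping geoms, for example by jittering or dodging; facetting splits the data into subsets and draws each subset in its own panel.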
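As a rough illustration of how these components line up with ggplot2 code, the sketch below spells out each one explicitly, using the mpg data introduced later in this presentation. The specific choices (a linear smoother, a renamed y axis, panels by drv) and the name p_grammar are assumptions made only for this example.

```r
library(ggplot2)

# One plot with every grammar component written out explicitly.
p_grammar <- ggplot(data = mpg,                          # data
                    mapping = aes(x = displ, y = hwy)) + # aesthetic mappings
  geom_point(position = "jitter") +                      # geometric object, with a position adjustment
  stat_smooth(method = "lm", formula = y ~ x) +          # statistical transformation
  scale_y_continuous(name = "Highway mpg") +             # scale
  coord_cartesian() +                                    # coordinate system
  facet_grid(. ~ drv)                                    # facetting

p_grammar
```

Most of these components have sensible defaults, which is why the later examples only name the data, the mappings, and a geom.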

install.packages("ggplot2")
library(ggplot2)

help(package = "ggplot2")

Visit the ggplot2 homepage for over 500 examples of nearly 80 plotting functions.

qplot() is a high-level function for quick graphics and minimal configuration.

However, this presentation aims at mastery of the entire grammar, so it uses ggplot() rather than qplot().

The mpg dataset contains make, model, class, engine size, transmission, and fuel economy for a selection of US cars in 1999 and 2008.

Includes popular cars like the Audi A4, Honda Civic, Hyundai Sonata, Nissan Maxima, Toyota Camry, and Volkswagen Jetta.

These data come from the EPA fuel economy website, http://fueleconomy.gov.

> head(mpg, 10)
   manufacturer      model displ year cyl      trans drv cty hwy fl   class
 1         audi         a4   1.8 1999   4   auto(l5)   f  18  29  p compact
 2         audi         a4   1.8 1999   4 manual(m5)   f  21  29  p compact
 3         audi         a4   2.0 2008   4 manual(m6)   f  20  31  p compact
 4         audi         a4   2.0 2008   4   auto(av)   f  21  30  p compact
 5         audi         a4   2.8 1999   6   auto(l5)   f  16  26  p compact
 6         audi         a4   2.8 1999   6 manual(m5)   f  18  26  p compact
 7         audi         a4   3.1 2008   6   auto(av)   f  18  27  p compact
 8         audi a4 quattro   1.8 1999   4 manual(m5)   4  18  26  p compact
 9         audi a4 quattro   1.8 1999   4   auto(l5)   4  16  25  p compact
10         audi a4 quattro   2.0 2008   4 manual(m6)   4  20  28  p compact

What is the relationship between engine size and fuel economy?

The main function is ggplot(), whose two principal arguments are data and mapping.

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

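The object p has data and mappings but no layers yet. In the ggplot2 version used for this presentation, printing it fails; newer ggplot2 releases draw an empty panel instead of raising this error.

> p
Error: No layers in plot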

Add a layer by applying a geom:

p + geom_point()
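To make the idea of "adding a layer" concrete, the sketch below inspects the layers element of the plot object before and after the geom is added. The names n_before, n_after, and p_points are illustrative only, and reading p$layers directly is shown here as an inspection aid rather than a recommended workflow.

```r
library(ggplot2)

p <- ggplot(data = mpg, mapping = aes(x = displ, y = hwy))

# A freshly created plot carries data and mappings but no layers.
n_before <- length(p$layers)

# `+` returns a new plot object; p itself is left unchanged.
p_points <- p + geom_point()
n_after <- length(p_points$layers)

c(before = n_before, after = n_after)
```

Because `+` does not modify p, the same base object can be reused with different geoms, as in the next example.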

You can also make grammatically correct but nonsensical plots:

p + geom_line()

ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_point()

ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_point(aes(color = factor(cyl)))

ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_point(aes(shape = factor(cyl)))

ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = factor(cyl))) +
  stat_smooth(method = "lm")

ggplot(data = mpg, aes(x = displ, y = hwy)) +
  geom_point(aes(color = factor(cyl))) +
  stat_smooth(method = "lm", aes(color = factor(cyl)))
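The two plots above differ only in where the color mapping is declared: a mapping given inside one layer applies to that layer alone. As a sketch of an alternative, the mapping can be declared once in ggplot(), where every layer inherits it. The name p_inherit and the explicit formula argument are assumptions added for this example.

```r
library(ggplot2)

# Declaring color in ggplot() makes both layers inherit it,
# so points and fitted lines are grouped by cylinder count.
p_inherit <- ggplot(data = mpg,
                    aes(x = displ, y = hwy, color = factor(cyl))) +
  geom_point() +
  stat_smooth(method = "lm", formula = y ~ x)

p_inherit
```

Keeping the mapping in a single layer, as in the first plot above, is the way to get colored points with one overall fitted line.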

ggplot(data = mpg, aes(x = displ, y = hwy)) + geom_point() + facet_grid(. ~ year)

ggplot(data = mpg, aes(x = factor(cyl), y = hwy)) + geom_boxplot()

ggplot(data = mpg, aes(x = factor(cyl), y = hwy)) + geom_boxplot() + geom_jitter()
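When raw points are layered over boxplots as above, outliers are drawn twice (once by each layer) and the default jitter also moves points vertically. The variation below is one possible way to handle both; the width, height, and alpha values are arbitrary choices for illustration, and p_box is a hypothetical name.

```r
library(ggplot2)

p_box <- ggplot(data = mpg, aes(x = factor(cyl), y = hwy)) +
  geom_boxplot(outlier.shape = NA) +                  # hide the boxplot's own outlier points
  geom_jitter(width = 0.2, height = 0, alpha = 0.5)   # jitter horizontally only, keep hwy exact

p_box
```

Setting height = 0 keeps every point at its true hwy value, so the jitter separates overlapping points without distorting the variable being compared.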
