In recent years a new graphics system for R, called ggplot2, has become quite popular. Like trellis (lattice) graphics it is built on the grid graphics system, but it "automates" many of the steps. Here are some examples of its use:
Let's begin with one of the most famous datasets in Statistics, Fisher's Iris data. It has 4 measurements (Sepal.Length, Sepal.Width, Petal.Length and Petal.Width) of three kinds of iris flowers (setosa, versicolor, virginica). Let's say we want to draw a scatterplot of Sepal.Length vs Petal.Length, with the different irises identified by different colors.
color=c(rep("red",50),rep("green",50),rep("blue",50))
plot(iris$Sepal.Length, iris$Petal.Length,col=color, pch=20,xlab="Sepal.Length",ylab="Petal.Length")
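The color vector above relies on the rows of iris being sorted by species, 50 of each. As a sketch of a variant that does not depend on the row order, we can look the colors up from the Species column itself. The name species_colors is our own choice for illustration, and the legend call is an extra that the original plot does not have:

```r
# Named lookup table: one color per species (the name species_colors is hypothetical)
species_colors <- c(setosa = "red", versicolor = "green", virginica = "blue")

# Pick the color for every row from its Species value, whatever the row order
color <- unname(species_colors[as.character(iris$Species)])

plot(iris$Sepal.Length, iris$Petal.Length, col = color, pch = 20,
     xlab = "Sepal.Length", ylab = "Petal.Length")

# Base graphics does not draw a legend on its own, so we add one by hand
legend("topleft", legend = names(species_colors), col = species_colors, pch = 20)
```

For the iris data as shipped with R this gives the same color vector as before, but it would keep working if the rows were shuffled.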
Now let's do this with ggplot2. Here all the graphs use the same command, qplot ("Quick Plot"). There is also the ggplot command for more detailed control, but it is more difficult and usually not necessary.
qplot(Sepal.Length, Petal.Length, data = iris, color = Species)
Not only is this much shorter, it is also prettier! Note how all the things like the colors and the labels are taken care of automatically.
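For comparison, here is a sketch of what the same scatterplot might look like with the more detailed ggplot command. It assumes the ggplot2 package is installed, and the name p is just our own label for the saved graph. As far as we know, recent ggplot2 releases mark qplot as deprecated, so this longer form is worth recognizing:

```r
library(ggplot2)

# aes() says which variables go on the axes and which one sets the color;
# geom_point() says that we want to draw points
p <- ggplot(iris, aes(x = Sepal.Length, y = Petal.Length, color = Species)) +
  geom_point()

print(p)
```

The legend and the axis labels are still taken care of automatically. The only extra work is stating the mapping and the type of graph explicitly.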
Now let's say we also want to include the variable Petal.Width in the graph. We will do this by varying the size of the plotting symbols, according to the values of Petal.Width:
qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width)
There is some overplotting here, especially in the virginica dots. Here is one way to fix that, making the plotting symbols partly transparent with the alpha argument:
qplot(Sepal.Length, Petal.Length, data = iris, color = Species, size = Petal.Width, alpha = I(0.7))
As we said before, in ggplot2 all graphs use the same command, qplot. So how does the program know what graph we want? This is done with the geom argument. Say we want to do a barchart:
x=c(rep("A",10),rep("B",20),rep("C",12))
qplot(x,geom="bar")
or a histogram:
qplot(rnorm(1000),geom="histogram",binwidth=0.1)
or a boxplot:
qplot(Species,Sepal.Length, data = iris, geom="boxplot")
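With the ggplot command the type of graph is not an argument but a function of its own, such as geom_bar, geom_histogram or geom_boxplot. The following is a sketch of the three graphs above in that style, assuming ggplot2 is installed. The object names bars, hist_plot and boxes, the column name z and the seed are our own choices. Note that ggplot wants a data frame, so we wrap the vectors in data.frame first:

```r
library(ggplot2)

# Barchart: geom_bar() counts how often each value of x occurs
x <- c(rep("A", 10), rep("B", 20), rep("C", 12))
bars <- ggplot(data.frame(x = x), aes(x = x)) + geom_bar()

# Histogram with bins of width 0.1 (the seed only makes the example repeatable)
set.seed(1)
hist_plot <- ggplot(data.frame(z = rnorm(1000)), aes(x = z)) +
  geom_histogram(binwidth = 0.1)

# Boxplot of Sepal.Length, one box per species
boxes <- ggplot(iris, aes(x = Species, y = Sepal.Length)) + geom_boxplot()

print(bars)
print(hist_plot)
print(boxes)
```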
We can also use several geoms. Orange has the age and circumference of different orange trees:
qplot(age, circumference, data = Orange, geom = c("point", "line"), colour = Tree)
We saw before the use of coplot, a graph that splits the data into parts and then does scatterplots. Recall the ethanol data:
coplot(NOx~E|C,data=ethanol)
In ggplot2 the same can be done using the facets argument:
qplot(E,NOx,data=ethanol,facets=~C)
Recall that we also added a loess curve to the panels:
coplot(NOx~E|C,data=ethanol,panel=panel.smooth)
and again we can do this in ggplot2 as well. Here though we see a big difference: ggplot2 thinks in terms of layers. So what we can do is first create the basic graph and save it:
A=qplot(E,NOx,data=ethanol,facets=~C)
Now we "add" the loess curves:
A+stat_smooth(method="loess")
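The idea of layers is perhaps easiest to see with the ggplot command, where every part of the graph is added with a +. Here is a sketch of the same graph built up that way. It assumes ggplot2 is installed and takes the ethanol data from the lattice package. The names base_plot and smoothed are our own, and the formula argument is only there to state the default explicitly:

```r
library(ggplot2)

# The ethanol data ships with the lattice package
data(ethanol, package = "lattice")

# Layer 1: the points, split into one panel per value of C
base_plot <- ggplot(ethanol, aes(x = E, y = NOx)) +
  geom_point() +
  facet_wrap(~ C)

# Layer 2: a loess curve, fitted separately in each panel
smoothed <- base_plot + geom_smooth(method = "loess", formula = y ~ x)

print(smoothed)
```

Adding a layer does not change the saved graph, so base_plot can be reused with a different second layer.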