Like C++ and just about every other modern programming language, R is object oriented. This is a huge topic, and we will only discuss the basic ideas. It is also only worthwhile (and in fact absolutely necessary) when writing large programs of at least several hundred lines.
We start with the following:
x <- rnorm(100)
# Empirical Distribution Function
a <- ecdf(x)
plot(a)
# Non-parametric density estimate
a <- density(x)
plot(a, type="l")
So although we call the same command (plot), we get different graphs: one of the empirical distribution function and the other of a non-parametric density estimate.
But how does R know what to do in either case? The reason is that each object has a class property:
a <- ecdf(x)
class(a)
## [1] "ecdf" "stepfun" "function"
a <- density(x)
class(a)
## [1] "density"
So when plot starts to run, it examines the class property of its argument and draws the corresponding graph.
There are many different plot functions (or methods):
methods(plot)
## [1] plot,ANY-method plot,color-method plot.acf*
## [4] plot.data.frame* plot.decomposed.ts* plot.default
## [7] plot.dendrogram* plot.density* plot.ecdf
## [10] plot.factor* plot.formula* plot.function
## [13] plot.ggplot* plot.gtable* plot.hcl_palettes*
## [16] plot.hclust* plot.histogram* plot.HoltWinters*
## [19] plot.isoreg* plot.lm* plot.medpolish*
## [22] plot.mlm* plot.ppr* plot.prcomp*
## [25] plot.princomp* plot.profile.nls* plot.R6*
## [28] plot.raster* plot.spec* plot.stepfun
## [31] plot.stl* plot.table* plot.ts
## [34] plot.tskernel* plot.TukeyHSD*
## see '?methods' for accessing help and source code
so when we call plot with an object of class density, it in turn calls the function plot.density.
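The dispatch mechanism can be imitated with a small generic of our own. The following is a minimal sketch; the generic describe and its methods are made up for illustration and are not part of R.

```r
# A hypothetical generic: UseMethod() looks at class(x) and calls
# the first matching method describe.<class>, or describe.default
describe <- function(x, ...) UseMethod("describe")
describe.default <- function(x, ...) "some other object"
describe.ecdf <- function(x, ...) "an empirical distribution function"
describe.density <- function(x, ...) "a density estimate"

x <- rnorm(100)
describe(ecdf(x))     # dispatches to describe.ecdf
describe(density(x))  # dispatches to describe.density
describe(x)           # no method for a numeric vector: describe.default
```

This is the same pattern as plot, plot.ecdf and plot.density: one generic name, one method per class.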
R actually has three different systems for object-oriented programming, called S3, S4 and RC (reference classes). We won’t go into the details, or into which of them is more useful under what circumstances. In the following examples we use S3, which is the easiest to use and is usually sufficient.
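Say we work for a store. At the end of each day we want to create a short report that shows, for each salesperson, the number of sales together with their mean and standard deviation, and that draws a boxplot of the sales by salesperson.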
Say the data is in a data frame where the first column is the amount of a sale and the second column identifies the salesperson. So it might look like this:
Sales | Salesperson |
---|---|
16.69 | Jim |
19.60 | Ann |
33.16 | Ann |
27.82 | Jack |
76.19 | Jack |
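The full data set is not listed here. To try out the code that follows, a stand-in of the same shape can be simulated. This is only a sketch: the object name toy.sales, the sample size and the exponential distribution are all invented for illustration and are not the data used in this section.

```r
set.seed(1)  # for reproducibility
n <- 40
toy.sales <- data.frame(
  Sales = round(rexp(n, rate = 1/40), 2),   # made-up sale amounts
  Salesperson = sample(c("Jim", "Ann", "Jack", "Mary"), n, replace = TRUE)
)
str(toy.sales)
head(toy.sales, 3)
```

Any function that expects the columns Sales and Salesperson should accept this data frame.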
Here is the non-object oriented solution:
report <- function(dta) {
  salespersons <- unique(dta$Salesperson)
  # one row per salesperson: number of sales, mean and sd
  tbl <- matrix(0, length(salespersons), 3)
  rownames(tbl) <- salespersons
  colnames(tbl) <- c("Sales", "Mean", "SD")
  for(i in seq_along(salespersons)) {
    df <- dta[dta$Salesperson==salespersons[i], ]
    tbl[i, 1] <- nrow(df)
    tbl[i, 2] <- round(mean(df$Sales), 2)
    tbl[i, 3] <- round(sd(df$Sales), 2)
  }
  # boxplot of sales by salesperson
  boxplot(dta$Sales~dta$Salesperson,
          horizontal=TRUE)
  return(kable.nice(tbl))
}
report(sales.data)
Salesperson | Sales | Mean | SD |
---|---|---|---|
Jim | 14 | 45.83 | 32.88 |
Ann | 12 | 34.68 | 18.16 |
Jack | 8 | 36.51 | 19.02 |
Mary | 6 | 51.68 | 28.12 |
Here is the object oriented one. First we have to define a new class:
as.sales <- function(x) {
  class(x) <- c("sales", "data.frame")
  return(x)
}
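A quick check of what as.sales does, on a small made-up data frame (toy is a hypothetical name used only in this sketch):

```r
as.sales <- function(x) {
  class(x) <- c("sales", "data.frame")
  return(x)
}

toy <- data.frame(Sales = c(16.69, 19.60, 33.16),
                  Salesperson = c("Jim", "Ann", "Ann"))
toy <- as.sales(toy)
class(toy)                    # "sales" "data.frame"
inherits(toy, "sales")        # TRUE
inherits(toy, "data.frame")   # TRUE, so data frame functions still apply
nrow(toy)                     # 3
```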
Next we have to define the methods:
stats <- function(x) UseMethod("stats")
stats.sales <- function(dta) {
  salespersons <- unique(dta$Salesperson)
  tbl <- matrix(0, length(salespersons), 3)
  rownames(tbl) <- salespersons
  colnames(tbl) <- c("Sales", "Mean", "SD")
  for(i in seq_along(salespersons)) {
    df <- dta[dta$Salesperson==salespersons[i], ]
    tbl[i, 1] <- nrow(df)
    tbl[i, 2] <- round(mean(df$Sales), 2)
    tbl[i, 3] <- round(sd(df$Sales), 2)
  }
  return(kable.nice(tbl))
}
A generic “plot” function already exists, so we don’t need the UseMethod part:
plot.sales <- function(dta)
  boxplot(dta$Sales~dta$Salesperson, horizontal=TRUE)
and now we can run
# assign class sales to data frame
sales.data <- as.sales(sales.data)
stats(sales.data)
Salesperson | Sales | Mean | SD |
---|---|---|---|
Jim | 14 | 45.83 | 32.88 |
Ann | 12 | 34.68 | 18.16 |
Jack | 8 | 36.51 | 19.02 |
Mary | 6 | 51.68 | 28.12 |
plot(sales.data)
So far not much has been gained. But let’s say that sometimes we also have information on whether the salesperson was on the morning or the afternoon shift, and we want to include this in our report. One great feature of object-oriented programming is inheritance: we can define a new class that already has all the features of the old one, plus whatever new ones we want.
So say the data now is
Sales | Salesperson | Time |
---|---|---|
16.69 | Jim | Afternoon |
19.60 | Ann | Morning |
33.16 | Ann | Morning |
27.82 | Jack | Morning |
76.19 | Jack | Afternoon |
class(sales.time.data) <- c("salestime",
"sales",
"data.frame")
plot(sales.time.data)
and so we see that because sales.time.data is also of class sales, plot still works. But we can also give the new class its own plot method:
plot.salestime <- function(dta) {
  par(mfrow=c(1,2))
  Sales <- dta$Sales[dta$Time=="Morning"]
  Salesperson <- dta$Salesperson[dta$Time=="Morning"]
  boxplot(Sales~Salesperson, main="Morning")
  Sales <- dta$Sales[dta$Time=="Afternoon"]
  Salesperson <- dta$Salesperson[dta$Time=="Afternoon"]
  boxplot(Sales~Salesperson, main="Afternoon")
}
plot(sales.time.data)
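Inheritance applies to every generic, not just plot. The sketch below uses a simplified stats method (it returns only the mean sale per salesperson, computed with tapply, not the full table) and made-up data. It illustrates two things: a salestime object falls back on the sales method when it has none of its own, and a method of the new class can hand over to the parent method with NextMethod.

```r
stats <- function(x, ...) UseMethod("stats")

# Simplified stand-in for the sales method: mean sale per salesperson
stats.sales <- function(x, ...) tapply(x$Sales, x$Salesperson, mean)

# Made-up data, for illustration only
toy <- data.frame(Sales = c(10, 20, 30, 40),
                  Salesperson = c("Ann", "Ann", "Jim", "Jim"),
                  Time = c("Morning", "Afternoon", "Morning", "Afternoon"))
class(toy) <- c("salestime", "sales", "data.frame")

# No stats.salestime exists yet, so stats.sales is used
res1 <- stats(toy)

# A method for the new class that adds something, then calls the parent method
stats.salestime <- function(x, ...) {
  cat("Shifts:", paste(sort(unique(x$Time)), collapse = ", "), "\n")
  NextMethod()
}
res2 <- stats(toy)
res2
```

With NextMethod the new class only has to implement what is different; everything else is reused from the sales method.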
Note that we already used inheritance in the definition of the sales class:
class(x) <- c("sales", "data.frame")
so that the data remains as a data frame. If we had used
class(x) <- "sales"
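sales.data would no longer be a data frame: R would treat it as a plain list, and data frame operations such as selecting rows with dta[rows, ] would fail.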
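The difference is easy to check on a small made-up data frame (the names toy, both and only are hypothetical and used only in this sketch):

```r
toy <- data.frame(Sales = c(16.69, 19.60),
                  Salesperson = c("Jim", "Ann"))

both <- toy
class(both) <- c("sales", "data.frame")
only <- toy
class(only) <- "sales"

is.data.frame(both)   # TRUE
is.data.frame(only)   # FALSE
is.list(only)         # TRUE: underneath, a data frame is a list of columns

both[1, ]             # row indexing works via the data frame method
# Without the data.frame class, row indexing is an error
res <- tryCatch(only[1, ], error = function(e) "error")
res
```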
generally every class has at least three methods:
Example
Let’s return to the empirical distribution function discussed earlier. It is defined as follows: Let \(x_1, \ldots, x_n\) be a sample from some probability density \(f\). Then the empirical distribution function \(\hat F (x)\) is defined by
\[ \hat F (x) = \frac{\text{number of }x_i \le x}n \]
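The definition translates directly into code. In this sketch Fhat is a hypothetical helper name; its values are compared with those of R’s ecdf at the sample points.

```r
set.seed(1)
x <- rnorm(10)

# Proportion of observations less than or equal to t, for each t
Fhat <- function(t) sapply(t, function(u) mean(x <= u))

Fhat(min(x) - 1)   # 0: no observation lies below the smallest one
Fhat(max(x))       # 1: all observations are <= the largest one
all.equal(Fhat(x), ecdf(x)(x))   # TRUE: agrees with ecdf()
```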
Let’s have a closer look at the ecdf object:
x <- sort(rnorm(10))
y <- ecdf(x)
class(y)
## [1] "ecdf" "stepfun" "function"
so the classes of an ecdf object are ecdf, stepfun and function. Let’s see what that means:
plot(y)  # uses plot.ecdf
class(y) <- c("stepfun", "function")
plot(y)  # now uses plot.stepfun
class(y) <- "function"
plot(x, y(x))  # y is still a function, so we can evaluate it at x
Example
Let’s say we have a data frame with an x and a y column, which hold the coordinates of points on the graph of some function. We want the plot command to draw that function as a curve:
x <- seq(0, 1, length.out=250)
y <- sin(4*pi*x)
df <- data.frame(x=x, y=y)
plot(df)
We can do this by defining a new class fn:
class(df) <- c("fn", "data.frame")
plot.fn <- function(df) {
  plot(df[, 1], df[, 2], type="l",
       xlab="x", ylab="y")
}
plot(df)
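A common refinement is to match the argument list of the plot generic (x, ...) and to pass any extra arguments on to the underlying plot call, so that options such as col or main keep working. The following is a sketch of that idea, not part of the original example; the object name fn.data is made up.

```r
x <- seq(0, 1, length.out = 250)
fn.data <- data.frame(x = x, y = sin(4 * pi * x))
class(fn.data) <- c("fn", "data.frame")

# Same idea as plot.fn above, but extra arguments are forwarded
plot.fn <- function(x, ...) {
  plot(x[, 1], x[, 2], type = "l", xlab = "x", ylab = "y", ...)
}

plot(fn.data, col = "blue", main = "sin(4*pi*x)")
```

Because xlab and ylab are fixed inside the method, passing them again in the call would cause an error; a fuller version would have to handle that case.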
Exercise
Say that for every function that crosses the horizontal line y=0, we want to add that line to the graph. Use OOP!
plot(df)