Object-Oriented Programming

Like C++ and just about every other modern programming language, R is object-oriented. This is a huge topic, and we will only discuss the basic ideas. Object-oriented programming mostly pays off (and in fact becomes absolutely necessary) when writing large programs, say at least several hundred lines.

We start with the following:

x <- rnorm(100)
# Empirical Distribution Function
a <- ecdf(x)
plot(a)

# Nonparametric density estimate
a <- density(x)
plot(a, type="l")

So although we call the same command (plot), we get different graphs: one of the empirical distribution function and the other of a nonparametric density estimate.

But how does R know what to do in each case? The reason is that each object has a class attribute:

a <- ecdf(x)
class(a)

## [1] "ecdf"     "stepfun"  "function"

a <- density(x)
class(a)

## [1] "density"

So when plot starts to run, it examines the class attribute of its argument and draws the corresponding graph.

There are many different plot functions (or methods):

methods(plot)

##  [1] plot,ANY-method     plot,color-method   plot.acf*          
##  [4] plot.data.frame*    plot.decomposed.ts* plot.default       
##  [7] plot.dendrogram*    plot.density*       plot.ecdf          
## [10] plot.factor*        plot.formula*       plot.function      
## [13] plot.ggplot*        plot.gtable*        plot.hcl_palettes* 
## [16] plot.hclust*        plot.histogram*     plot.HoltWinters*  
## [19] plot.isoreg*        plot.lm*            plot.medpolish*    
## [22] plot.mlm*           plot.ppr*           plot.prcomp*       
## [25] plot.princomp*      plot.profile.nls*   plot.R6*           
## [28] plot.raster*        plot.spec*          plot.stepfun       
## [31] plot.stl*           plot.table*         plot.ts            
## [34] plot.tskernel*      plot.TukeyHSD*     
## see '?methods' for accessing help and source code

So when we call plot with an object of class density, it in turn calls the function plot.density.
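To see this dispatch mechanism in isolation, here is a minimal sketch with a toy generic. The name describe and the strings it returns are made up for illustration; UseMethod is the base R function that does the lookup on the class of the argument.

```r
# A toy generic: UseMethod() inspects class(x) and calls the matching method
describe <- function(x, ...) UseMethod("describe")

# Hypothetical methods, one per class, plus a fallback
describe.density <- function(x, ...) "a kernel density estimate"
describe.ecdf    <- function(x, ...) "an empirical distribution function"
describe.default <- function(x, ...) "some other object"

x <- rnorm(100)
describe(density(x))  # dispatches to describe.density
describe(ecdf(x))     # dispatches to describe.ecdf
describe(1:10)        # no matching class, so describe.default is used
```

The same call describe(...) runs three different functions, which is exactly what plot does with plot.density and plot.ecdf.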

R actually has three different systems for object-oriented programming, called S3, S4 and RC (reference classes). We won’t go into the details of which is more useful under what circumstances. In the following examples we use S3, which is the easiest to use and is usually sufficient.

Say we work for a store. At the end of each day we want to create a short report that

gives the number of sales, and their mean and standard deviation, for each salesperson
draws a boxplot of the sales, grouped by salesperson

Say the data is in a data frame where the first column is the amount of a sale and the second column identifies the salesperson. So it might look like this:

Sales	Salesperson
16.69	Jim
19.60	Ann
33.16	Ann
27.82	Jack
76.19	Jack

Here is the non-object-oriented solution:

report <- function(dta) {
  salespersons <- unique(dta$Salesperson) 
  # one row per salesperson: number of sales, mean and standard deviation
  tbl <- matrix(0, length(salespersons), 3)
  rownames(tbl) <- salespersons
  colnames(tbl) <- c("Sales", "Mean", "SD")
  for(i in seq_along(salespersons)) {
      df <- dta[dta$Salesperson==salespersons[i], ]
      tbl[i, 1] <- nrow(df)
      tbl[i, 2] <- round(mean(df$Sales), 2)
      tbl[i, 3] <- round(sd(df$Sales), 2)
  }
  # boxplot of the sales, grouped by salesperson
  boxplot(dta$Sales~dta$Salesperson,
          horizontal=TRUE)
  # kable.nice is a table-formatting helper that is not defined in this section
  return(kable.nice(tbl))
}
report(sales.data)

	Sales	Mean	SD
Jim	14	45.83	32.88
Ann	12	34.68	18.16
Jack	8	36.51	19.02
Mary	6	51.68	28.12

Here is the object-oriented one. First we have to define a new class:

as.sales <- function(x) {
  class(x) <- c("sales", "data.frame")
  return(x)
}
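A constructor is also a natural place to check that the data has the expected layout. The following is a sketch only: new_sales is a hypothetical, stricter variant of as.sales, and the toy rows are made up in the same layout as the table above.

```r
# new_sales: hypothetical constructor that validates before setting the class
new_sales <- function(x) {
  stopifnot(is.data.frame(x),
            all(c("Sales", "Salesperson") %in% names(x)),
            is.numeric(x$Sales))
  class(x) <- c("sales", "data.frame")
  x
}

# Made-up rows in the same layout as the sales table
toy <- data.frame(Sales = c(16.69, 19.60, 33.16),
                  Salesperson = c("Jim", "Ann", "Ann"))
toy <- new_sales(toy)
class(toy)                    # "sales" "data.frame"
inherits(toy, "data.frame")   # TRUE
```

With such a check in place, an object that lacks a Sales or Salesperson column is rejected when the class is assigned, rather than failing later inside a method.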

Next we have to define the methods:

stats <- function(x) UseMethod("stats")
stats.sales <- function(dta) {
  salespersons <- unique(dta$Salesperson) 
  tbl <- matrix(0, length(salespersons), 3)
  rownames(tbl) <- salespersons
  colnames(tbl) <- c("Sales", "Mean", "SD")
  for(i in seq_along(salespersons)) {
      df <- dta[dta$Salesperson==salespersons[i], ]
      tbl[i, 1] <- nrow(df)
      tbl[i, 2] <- round(mean(df$Sales), 2)
      tbl[i, 3] <- round(sd(df$Sales), 2)
  }
  return(kable.nice(tbl))
}

“plot” is already a generic function, so we don’t need the UseMethod part:

# same argument list as the plot generic; ... is passed on to boxplot
plot.sales <- function(x, ...) 
  boxplot(Sales ~ Salesperson, data = x, horizontal = TRUE, ...)

and now we can run

sales.data <- as.sales(sales.data)
# assign class sales to data frame
stats(sales.data)

	Sales	Mean	SD
Jim	14	45.83	32.88
Ann	12	34.68	18.16
Jack	8	36.51	19.02
Mary	6	51.68	28.12

plot(sales.data)

So far not much has been gained. But let’s say that sometimes we also have information on whether the salesperson was on the morning or the afternoon shift, and we want to include this in our report. One great feature of object-oriented programming is inheritance: we can define a new class that already has all the features of the old one, plus whatever new ones we want.

So say the data is now

Sales	Salesperson	Time
16.69	Jim	Afternoon
19.60	Ann	Morning
33.16	Ann	Morning
27.82	Jack	Morning
76.19	Jack	Afternoon

class(sales.time.data) <- c("salestime", 
                            "sales",
                            "data.frame")
plot(sales.time.data)

And so we see that because sales.time.data is also of class sales, plot still works. But we can also define its own plot method:

plot.salestime <- function(dta) {
  # two panels side by side; restore the previous layout when done
  oldpar <- par(mfrow=c(1,2))
  on.exit(par(oldpar))
  Sales <- dta$Sales[dta$Time=="Morning"]
  Salesperson <- dta$Salesperson[dta$Time=="Morning"]
  boxplot(Sales~Salesperson, main="Morning")
  Sales <- dta$Sales[dta$Time=="Afternoon"]
  Salesperson <- dta$Salesperson[dta$Time=="Afternoon"]
  boxplot(Sales~Salesperson, main="Afternoon")
}
plot(sales.time.data)
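The lookup order behind this inheritance can be made visible without any graphics. In the sketch below, whoami is a hypothetical generic and the toy rows are made up; NextMethod is the base R function that hands control to the method of the next class in the class vector.

```r
# Made-up rows with the salestime class vector used in the text
toy <- data.frame(Sales = c(16.69, 19.60),
                  Salesperson = c("Jim", "Ann"),
                  Time = c("Afternoon", "Morning"))
class(toy) <- c("salestime", "sales", "data.frame")

# Hypothetical generic with a method for the parent class only
whoami <- function(x, ...) UseMethod("whoami")
whoami.sales <- function(x, ...) "sales method"

first <- whoami(toy)   # no whoami.salestime exists, so whoami.sales is used

# Now give the child class its own method, which also calls the parent's
whoami.salestime <- function(x, ...) {
  parent <- NextMethod()   # result of whoami.sales
  paste("salestime method, then", parent)
}
second <- whoami(toy)

first    # "sales method"
second   # "salestime method, then sales method"
```

R walks the class vector from left to right and uses the first method it finds, which is why the order in c("salestime", "sales", "data.frame") matters.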

Note that we already used inheritance in the definition of the sales class:

class(x) <- c("sales", "data.frame")

so that the data remains as a data frame. If we had used

class(x) <- "sales"

sales.data would no longer be treated as a data frame; R would see only the underlying list.
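The difference between the two class assignments can be checked directly. This is a small sketch on made-up rows; the object names both and only are chosen just for this illustration.

```r
# Made-up rows in the same layout as the sales table
toy <- data.frame(Sales = c(16.69, 19.60),
                  Salesperson = c("Jim", "Ann"))

both <- toy
class(both) <- c("sales", "data.frame")   # keeps data frame behavior

only <- toy
class(only) <- "sales"                    # drops the data.frame class

inherits(both, "data.frame")   # TRUE
inherits(only, "data.frame")   # FALSE
is.list(only)                  # TRUE: only the underlying list is left
nrow(both)                     # 2
nrow(only)                     # NULL: data frame methods no longer apply
```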

Generally, every class has at least three methods:

print
summary (stats)
plot
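As a sketch of what the first two might look like for the sales class: print and summary are existing generics in base R, so only the methods are needed. The method bodies and the toy rows below are assumptions made for illustration, not the report format used above.

```r
# Made-up rows in the same layout as the sales table
toy <- data.frame(Sales = c(16.69, 19.60, 33.16, 27.82, 76.19),
                  Salesperson = c("Jim", "Ann", "Ann", "Jack", "Jack"))
class(toy) <- c("sales", "data.frame")

# print method: a one-line header, then the usual data frame printout
print.sales <- function(x, ...) {
  cat("Sales data:", nrow(x), "sales by",
      length(unique(x$Salesperson)), "salespersons\n")
  NextMethod()     # falls through to the data frame print method
  invisible(x)
}

# summary method: mean sale per salesperson
summary.sales <- function(object, ...) {
  tapply(object$Sales, object$Salesperson, mean)
}

print(toy)
summary(toy)
```

Because print is called automatically whenever an object is displayed at the console, typing toy on its own would now show the header line as well.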

Example

Let’s return to the empirical distribution function discussed earlier. It is defined as follows: let \(x_1, \ldots, x_n\) be a sample from some probability density \(f\). Then the empirical distribution function \(\hat F (x)\) is defined by

\[ \hat F (x) = \frac{\text{number of }x_i \le x}{n} \]
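The formula can be checked by hand on a small sample. In this sketch the sample values are made up and Fhat is a hypothetical helper that counts the observations directly; ecdf is the built-in function used earlier.

```r
# Made-up sample of n = 5 observations
x <- c(0.3, -1.2, 0.8, 0.3, 2.1)

# Direct translation of the formula: proportion of x_i that are <= t
Fhat <- function(t, x) mean(x <= t)

Fhat(0.3, x)    # three of five values are <= 0.3, so 0.6
ecdf(x)(0.3)    # the built-in version gives the same value
```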

Let’s have a closer look at the ecdf object:

x <- sort(rnorm(10))
y <- ecdf(x)
class(y)

## [1] "ecdf"     "stepfun"  "function"

So the classes of an ecdf object are ecdf, stepfun and function. Let’s see what that means:

plot(y)

class(y) <- c("stepfun", "function")
plot(y)

class(y) <- "function"
plot(x, y(x))
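One way to check which plot method each class vector leads to is to walk it by hand. The helper first_plot_method below is hypothetical; it mimics the S3 lookup with getS3method from the utils package and ignores details such as implicit classes.

```r
# Return the name of the first plot method found for a class vector
first_plot_method <- function(cls) {
  for (cl in cls) {
    # optional = TRUE returns NULL instead of an error if no method exists
    if (!is.null(getS3method("plot", cl, optional = TRUE))) {
      return(paste0("plot.", cl))
    }
  }
  "plot.default"
}

first_plot_method(c("ecdf", "stepfun", "function"))
first_plot_method(c("stepfun", "function"))
first_plot_method("function")
```

Each time a class is removed from the front, the lookup falls through to the method of the next class. Note that the last call in the text, plot(x, y(x)), passes the numeric vector x as the first argument, so it is handled by plot.default rather than by plot.function.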

Example

Let’s say we have a data frame with an x and a y column, which hold the coordinates of points on the graph of some function. We want the plot command to draw that function as a curve:

x <- seq(0, 1, length.out=250)
y <- sin(4*pi*x)
df <- data.frame(x=x, y=y)
plot(df)

We can do this by defining a new class fn:

class(df) <- c("fn", "data.frame")
plot.fn <- function(df) {
  plot(df[, 1], df[, 2], type="l", 
       xlab="x", ylab="y")
}
plot(df)

Exercise

Say that for all functions that cross the horizontal line y=0, we want to add that line to the graph. Use OOP!

plot(df)