Classification Methods

The two solutions we discussed in the last section, linear and quadratic regression, are (slight variations of) what Fisher came up with when he introduced the Iris data set. They are now called

Linear and Quadratic Discriminants

and are implemented in R with

library(MASS)
df <- gen.ex(1)
fit <- lda(group~x+y, data=df)
df1 <- make.grid(df)
df1$group <- predict(fit, df1)$class
head(df1)
##           x         y group
## 1 -2.917448 -2.142699     A
## 2 -2.850408 -2.142699     A
## 3 -2.783368 -2.142699     A
## 4 -2.716327 -2.142699     A
## 5 -2.649287 -2.142699     A
## 6 -2.582247 -2.142699     A
do.graph(df, df1)

df <- gen.ex(3)
fit <- lda(group~x+y, data=df)
df1 <- make.grid(df)
df1$group <- predict(fit, df1)$class
do.graph(df, df1)
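
A quick check of the fit is the in-sample misclassification rate. Here is a minimal sketch; predict called without new data returns the fitted classes for the training set:

# proportion of training points the lda fit gets wrong; this
# in-sample rate is optimistic, cross-validation would be more honest
mean(predict(fit)$class != df$group)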

For example 2, where a linear boundary is not adequate, we should use the quadratic version:

df <- gen.ex(2)
fit <- qda(group~x+y, data=df)
df1 <- make.grid(df)
df1$group <- predict(fit, df1)$class
do.graph(df, df1)

Notice a couple of differences between the lm and lda/qda solutions:

  • in lda/qda we don’t have to code the response numerically; they accept a categorical variable as the response (the coding step they replace is shown in the sketch after this list).

  • there is a difference between the lm and the lda/qda solutions of examples 2 and 3. Do you see what it is, and why?
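
For comparison, here is a sketch of the lm solution from the last section applied to example 2, with the 0/1 coding of the response done by hand (this is exactly the step lda/qda lets us skip):

df <- gen.ex(2)
df$Code <- ifelse(df$group=="A", 0, 1)  # numeric coding needed for lm
fit <- lm(Code~x+y, data=df)
df1 <- make.grid(df)
df1$group <- ifelse(predict(fit, df1)<0.5, "A", "B")
do.graph(df, df1)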

Loess

Instead of using lm we could of course also have used loess:

df <- gen.ex(2)
df$Code <- ifelse(df$group=="A", 0, 1)  # 0/1 coding, as for lm
fit <- loess(Code~x+y, data=df, 
             control = loess.control(surface = "direct"))
df1 <- make.grid(df)
df1$group <- ifelse(predict(fit, df1)<0.5, "A", "B")
do.graph(df, df1)

and in fact that looks quite a bit like the qda solution.

k-nearest neighbor

Here is an entirely different idea: we define a distance function, that is, a function that calculates the distance between two points. In our examples we can simply use Euclidean distance, but in other cases other distance functions can be useful. For a point x where we want to make a prediction we find its k nearest neighbors among the training points and assign the label by majority rule. So if k=3 and at least 2 of the 3 nearest neighbors are of type “A”, then we assign type “A” to x.
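
To make the idea concrete, here is a minimal from-scratch sketch (the helper my.knn is made up for illustration; below we will use the much faster knn routine from the class package):

# bare-bones k-nearest-neighbor classifier: for each row of test find
# the Euclidean distances to all training points, take the k smallest
# and assign the label by majority vote among those neighbors
my.knn <- function(train, test, cl, k) {
  pred <- character(nrow(test))
  for(i in seq_len(nrow(test))) {
    d <- sqrt((train[, 1]-test[i, 1])^2 + (train[, 2]-test[i, 2])^2)
    nearest <- order(d)[1:k]   # indices of the k closest training points
    votes <- table(cl[nearest])
    pred[i] <- names(votes)[which.max(votes)]
  }
  factor(pred)
}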

library(class)
# k=1: each grid point gets the label of its single nearest training point
df1$group <-  factor(
     knn(df[, 1:2], df1[, 1:2], cl=df$group, k=1))
do.graph(df, df1)

df1$group <-  factor(
     knn(df[, 1:2], df1[, 1:2], cl=df$group, k=3))
do.graph(df, df1)

df1$group <-  factor(
     knn(df[, 1:2], df1[, 1:2], cl=df$group, k=11))
do.graph(df, df1)

Clearly the choice of k determines the bias-variance trade-off: a small k gives a very flexible fit (low bias, high variance), a large k a much smoother one (high bias, low variance).
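
In practice k is often chosen by cross-validation. Here is a sketch using knn.cv from the class package, which computes leave-one-out cross-validated predictions on the training set:

# estimated misclassification rate for several choices of k
for(k in c(1, 3, 5, 11, 25)) {
  pred <- knn.cv(df[, 1:2], cl=df$group, k=k)
  cat("k =", k, "error rate:", round(mean(pred != df$group), 3), "\n")
}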

Here is the knn solution for the other two cases:

df <- gen.ex(1)
df1 <- make.grid(df)
df1$group <-  factor(
     knn(df[, 1:2], df1, cl=factor(df$group), k=5))
do.graph(df, df1)

df <- gen.ex(3)
df1 <- make.grid(df)
df1$group <-  factor(
     knn(df[, 1:2], df1, cl=factor(df$group), k=5))
do.graph(df, df1)

Our ability to apply this method clearly depends on how fast we can find the nearest neighbors. This problem has been studied extensively in both Statistics and Computer Science, and highly sophisticated algorithms (for example those based on k-d trees) are available that can handle millions of cases and hundreds of variables.
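
For instance, the RANN package (assuming it is installed) wraps a k-d tree implementation that finds nearest neighbors without comparing every pair of points:

library(RANN)
# indices and distances of the 5 nearest training points for each
# grid point, found via a k-d tree rather than brute-force search
nbrs <- nn2(df[, 1:2], df1[, 1:2], k=5)
str(nbrs$nn.idx)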