A neural network is a computer algorithm for fitting models that is largely a black box, that is, understanding exactly what it does is not easy and beyond our level here. For an explanation see Artificial Neural Networks.
library(nnet)
df <- gen.ex(1)[ ,1:3]
fit <- nnet(factor(group)~., data=df, size=2)
## # weights: 9
## initial value 92.369994
## iter 10 value 17.586865
## iter 20 value 11.761042
## iter 30 value 10.162683
## iter 40 value 9.131793
## iter 50 value 9.129375
## final value 9.129343
## converged
Notice the use of factor(group): nnet requires the response variable to be a factor (with the formula interface used here) or, with the matrix interface, an indicator matrix created with class.ind.
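A minimal sketch of the matrix interface, assuming df has the columns x, y and group as above; class.ind turns the group labels into the indicator matrix that nnet expects:
head(class.ind(df$group))
fit.mat <- nnet(df[, c("x", "y")], class.ind(df$group), size=2, trace=0)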
To visualize the network we can use
library(NeuralNetTools)
par(mar=c(1, 1, 0, 1))
plotnet(fit)
So the network has two input nodes, one for each predictor. It has a single hidden layer with two nodes because we chose size=2. The B nodes are the bias terms, and finally there is one output node.
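The fitted weights themselves can be inspected with summary, which labels each weight by the connection it belongs to (i for input, h for hidden, o for output and b for bias):
summary(fit)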
df1 <- make.grid(df)
df1$group <- predict(fit, df1, type="class")
do.graph(df, df1)
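As a quick check of how well the network classifies we can tabulate the predicted groups against the true ones. This uses the training data itself and is therefore optimistic; an honest error estimate would need cross-validation or a separate test set:
table(Actual=df$group, Predicted=predict(fit, df, type="class"))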
df <- gen.ex(2)[, 1:3]
fit <- nnet(factor(group)~., data=df, size=2)
## # weights: 9
## initial value 71.031288
## iter 10 value 33.049808
## iter 20 value 12.118664
## iter 30 value 10.779535
## iter 40 value 10.633181
## iter 50 value 10.620438
## iter 60 value 10.567945
## iter 70 value 10.551531
## iter 80 value 10.546393
## iter 90 value 10.544883
## iter 100 value 10.540798
## final value 10.540798
## stopped after 100 iterations
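Note that this time the routine stopped after 100 iterations without converging; 100 is the default for nnet's maxit argument, so if we wanted we could allow more iterations, for example
fit2 <- nnet(factor(group)~., data=df, size=2, maxit=500, trace=0)
Here this is kept as a separate fit so the graph below still uses the original one.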
df1 <- make.grid(df)
df1$group <- predict(fit, df1, type="class")
do.graph(df, df1)
df <- gen.ex(3)[, 1:3]
fit <- nnet(factor(group)~., data=df, size=2, trace=0)
df1 <- make.grid(df)
df1$group <- predict(fit, df1, type="class")
do.graph(df, df1)
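nnet also has a decay argument for weight decay, a form of regularization that often yields smoother decision boundaries. A small illustration with an arbitrarily chosen value:
fit.decay <- nnet(factor(group)~., data=df, size=2, decay=0.01, trace=0)
df1$group <- predict(fit.decay, df1, type="class")
do.graph(df, df1)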
A neural network (also called a multilayer perceptron) is a general function fitter, and so can be used for regression as well. Here is an example:
fit <- nnet(data.matrix(houseprice[, -1]),
            houseprice$Price, size=2, linout = 1)
## # weights: 13
## initial value 648818.577146
## final value 37992.517316
## converged
par(mar=c(1, 1, 0, 1))
plotnet(fit)
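As a rough check of the fit we can compare the network's in-sample root mean squared error with that of an ordinary linear model, assuming (as in the call above) that Price is the first column of houseprice:
rmse <- function(observed, predicted) sqrt(mean((observed - predicted)^2))
rmse(houseprice$Price, predict(fit, data.matrix(houseprice[, -1])))
rmse(houseprice$Price, predict(lm(Price ~ ., data=houseprice)))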
Let’s study this method using a few artificial examples:
x <- 1:100/100
y <- 10 + 5*x + rnorm(100, 0, 0.5)
df <- data.frame(x=x, y=y)
df$lm <- predict(lm(y~x))
df$nnet1 <- predict(nnet(x, y, size=1, linout = 1, trace = 0))
df$nnet2 <- predict(nnet(x, y, size=2, linout = 1, trace = 0))
df$nnet3 <- predict(nnet(x, y, size=3, linout = 1, trace = 0))
ggplot(data=df, aes(x, y)) +
  geom_point() +
  geom_line(data=data.frame(x=x, y=df$lm), aes(x, y),
            inherit.aes = FALSE) +
  geom_line(data=data.frame(x=x, y=df$nnet1), aes(x, y),
            inherit.aes = FALSE, color="red") +
  geom_line(data=data.frame(x=x, y=df$nnet2), aes(x, y),
            inherit.aes = FALSE, color="green") +
  geom_line(data=data.frame(x=x, y=df$nnet3), aes(x, y),
            inherit.aes = FALSE, color="blue")
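One thing to keep in mind is that nnet starts from random initial weights, so two runs with the same size can give noticeably different fits; setting a seed makes a run reproducible. A small illustration, comparing the final values of the fitting criterion of two runs:
set.seed(111)
fit.a <- nnet(x, y, size=3, linout = 1, trace = 0)
set.seed(222)
fit.b <- nnet(x, y, size=3, linout = 1, trace = 0)
c(fit.a$value, fit.b$value)  # usually differ because of the different starting weights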
y <- x + 3*(x-0.5)^2 + rnorm(100, 0, 0.25)
df <- data.frame(x=x, y=y)
df$lm <- predict(lm(y~x))
df$nnet1 <- predict(nnet(x, y, size=1, linout = 1, trace = 0))
df$nnet2 <- predict(nnet(x, y, size=2, linout = 1, trace = 0))
df$nnet3 <- predict(nnet(x, y, size=3, linout = 1, trace = 0))
ggplot(data=df, aes(x, y)) +
  geom_point() +
  geom_line(data=data.frame(x=x, y=df$lm), aes(x, y),
            inherit.aes = FALSE) +
  geom_line(data=data.frame(x=x, y=df$nnet1), aes(x, y),
            inherit.aes = FALSE, color="red") +
  geom_line(data=data.frame(x=x, y=df$nnet2), aes(x, y),
            inherit.aes = FALSE, color="green") +
  geom_line(data=data.frame(x=x, y=df$nnet3), aes(x, y),
            inherit.aes = FALSE, color="blue")
So a larger number of nodes in the hidden layer leads to a more complicated fit.
It is often tricky to know how many hidden nodes to use, and it is easy to overfit. Generally something like cross-validation is needed.
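A minimal sketch of such a cross-validation for the quadratic example above, comparing a few values of size by 5-fold cross-validated mean squared prediction error (the grid of sizes and the number of folds are illustrative choices):
set.seed(111)
folds <- sample(rep(1:5, length=100))
cv.mse <- rep(0, 4)
for(s in 1:4) {
  errs <- rep(0, 5)
  for(k in 1:5) {
    fit.cv <- nnet(x[folds!=k], y[folds!=k], size=s, linout=1, trace=0)
    pred <- predict(fit.cv, as.matrix(x[folds==k]))
    errs[k] <- mean((y[folds==k] - pred)^2)
  }
  cv.mse[s] <- mean(errs)
}
round(cv.mse, 4)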
The support vector machine (SVM) is another modern method for classification. Its idea is at first very strange. Let's have another look at example 2:
df <- gen.ex(2, 200)
ggplot(data=df, aes(x, y, color=group)) +
geom_point(size=2) +
theme(legend.position="none")
Let’s say we defined a new variable z by
\[ z=x^2+y^2 \] then
df$z <- df$x^2+df$y^2
ggplot(data=df, aes(x, z, color=group)) +
geom_point() +
geom_hline(yintercept = 1, size=1.2)+
theme(legend.position="none")
and so suddenly we would have a very simple decision rule: declare red if \(z<1\)!
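We can check how well this rule does on the simulated data by cross-tabulating the groups against it (which label ends up with \(z<1\) depends on how gen.ex codes the groups):
table(df$group, df$z < 1)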
Adding a variable z is like adding an additional dimension. If we could display the data in (x, y, z) space there would be a separating hyperplane. It can be shown that by adding sufficiently many dimensions there will eventually always be a hyperplane that perfectly separates the groups. SVM tries to find such a hyperplane without us having to specify a function like the one above. It is implemented in R in the package e1071:
library(e1071)
df <- gen.ex(1)[, 1:3]
fit <- svm(factor(group)~., data=df)
df1 <- make.grid(df)
df1$group <- predict(fit, df1)
do.graph(df, df1)
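By default svm uses a radial basis kernel; other kernels and their parameters can be requested via the kernel argument, for example
fit.lin <- svm(factor(group)~., data=df, kernel="linear")
fit.pol <- svm(factor(group)~., data=df, kernel="polynomial", degree=2)
which could then be plotted with make.grid and do.graph as above.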
df <- gen.ex(2)[, 1:3]
fit <- svm(factor(group)~., data=df)
df1 <- make.grid(df)
df1$group <- predict(fit, df1)
do.graph(df, df1)
df <- gen.ex(3)[, 1:3]
fit <- svm(factor(group)~., data=df)
df1 <- make.grid(df)
df1$group <- predict(fit, df1)
do.graph(df, df1)
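The cost and (for the radial kernel) gamma parameters can have a large effect on the fit; e1071 provides tune.svm, which chooses them by a cross-validated grid search (the grids below are illustrative choices):
set.seed(111)
tuned <- tune.svm(factor(group)~., data=df,
                  gamma=c(0.1, 0.5, 1, 2), cost=c(0.5, 1, 5, 10))
tuned$best.parameters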