Deep Learning

Undoubtedly the hottest ticket in machine learning at the moment is deep learning. In some sense this is strange, because deep learning is really just a neural network with multiple hidden layers. However, it turns out that the use of multiple hidden layers can improve the performance of neural networks dramatically.

There are a number of packages in R that implement deep learning, such as neuralnet, h2o, RSNNS, tensorflow, deepnet, darch, rnn, FCNN4R, rcppDL, deepr, …. We will here discuss keras and tensorflow, which provide APIs to the Python implementations of various routines. To use them you need to install a number of programs:

  • install Python from https://www.anaconda.com/. Note that this is almost 500 MB.

  • set up Anaconda on Windows. Unlike most programs, Anaconda needs to be set up properly on your computer. First you need to set an environment variable. To do so, first use Explorer to find the folder where the program python.exe is located. It is likely something like C:\Users\username\Anaconda3. Next type the word path in the Windows search box and click on Edit system environment variables. In the box that opens, click on Environment Variables. Select Path and click on Edit. Next click on New and paste the folder name in. Finally, use the buttons on the right to move this entry to the top of the list.

  • type cmd in the Windows search box, and click on Run as administrator. In the window that opens, run the command \Scripts\activate base, prefixed with the Anaconda folder from the previous step.

For more detailed instructions check out https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html#activating-an-environment
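
If you have the reticulate package installed (it is installed automatically together with keras below), you can verify from within R that the Anaconda installation is found. A minimal sketch:

library(reticulate)
py_config()          # shows which python binary will be used
Sys.which("python")  # should point into the Anaconda3 folder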

Next I suggest you update your R version (if necessary) and all your packages. When done, open an R GUI (not RStudio) and run

devtools::install_github("rstudio/keras")
library(keras)
install_keras()

To check that everything is set up properly run

mnist <- dataset_mnist()

which should download a large data set. If you do this in RMarkdown, I suggest adding the chunk option cache=TRUE so you need to do this only once.
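
If the download succeeded, mnist is a list containing 60,000 training images and 10,000 test images, each 28 × 28 pixels:

dim(mnist$train$x)  # 60000 28 28
dim(mnist$test$x)   # 10000 28 28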

Example OCR

This example is taken from https://tensorflow.rstudio.com/keras/

As an example we will use the mnist data set. This is a set of pictures of handwritten digits, and we want to teach the computer to recognize them.
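
We can draw one of the digits with base R graphics (a minimal sketch, using the mnist object loaded above):

# draw the first digit in the training set; the matrix is transposed
# and flipped so that the digit appears upright
digit <- mnist$train$x[1, , ]
image(t(digit)[, 28:1], col = gray(seq(1, 0, length.out = 256)),
      axes = FALSE, main = paste("Label:", mnist$train$y[1]))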

The data is already split into training and testing parts:

x_train <- mnist$train$x
y_train <- mnist$train$y
x_test <- mnist$test$x
y_test <- mnist$test$y

The x data is a 3-d array (images, width, height) of grayscale values. To prepare the data for training we convert the 3-d arrays into matrices by reshaping width and height into a single dimension (the 28×28 images are flattened into vectors of length 784). Then we convert the grayscale values from integers ranging from 0 to 255 into floating-point values ranging between 0 and 1:

x_train <- array_reshape(x_train, c(nrow(x_train), 784))
x_test <- array_reshape(x_test, c(nrow(x_test), 784))
x_train <- x_train / 255
x_test <- x_test / 255
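
A quick sanity check: the data should now consist of 60,000 and 10,000 rows of 784 values each, all between 0 and 1.

dim(x_train)    # 60000 784
dim(x_test)     # 10000 784
range(x_train)  # 0 1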

The y data is an integer vector with values ranging from 0 to 9. To prepare this data for training we one-hot encode the vectors into binary class matrices using the Keras to_categorical() function:

y_train <- to_categorical(y_train, 10)
y_test <- to_categorical(y_test, 10)
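
To see what this does, consider a single label (a quick illustration):

# the label 3 becomes a length-10 binary vector with a single 1 in
# the slot for digit 3 (the slots correspond to the digits 0-9):
to_categorical(3, 10)
# 0 0 0 1 0 0 0 0 0 0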

The core data structure of Keras is a model, a way to organize layers. The simplest type of model is the Sequential model, a linear stack of layers.

We begin by creating a sequential model and then adding layers using the pipe (%>%) operator. Note that, unlike most R objects, a Keras model is modified in place, so there is no need to assign the result back to model:

model <- keras_model_sequential()
model %>%
  layer_dense(units = 256, activation = 'relu', input_shape = c(784)) %>%
  layer_dropout(rate = 0.4) %>%
  layer_dense(units = 128, activation = 'relu') %>%
  layer_dropout(rate = 0.3) %>%
  layer_dense(units = 10, activation = 'softmax')

Let’s see what we got:

summary(model)
## Model: "sequential"
## ________________________________________________________________________________
## Layer (type)                        Output Shape                    Param #     
## ================================================================================
## dense (Dense)                       (None, 256)                     200960      
## ________________________________________________________________________________
## dropout (Dropout)                   (None, 256)                     0           
## ________________________________________________________________________________
## dense_1 (Dense)                     (None, 128)                     32896       
## ________________________________________________________________________________
## dropout_1 (Dropout)                 (None, 128)                     0           
## ________________________________________________________________________________
## dense_2 (Dense)                     (None, 10)                      1290        
## ================================================================================
## Total params: 235,146
## Trainable params: 235,146
## Non-trainable params: 0
## ________________________________________________________________________________
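
The parameter counts are easy to verify: a dense layer with m inputs and n units has m × n weights plus n biases, so the three layers have 784 × 256 + 256 = 200,960, 256 × 128 + 128 = 32,896, and 128 × 10 + 10 = 1,290 parameters, while the dropout layers have no parameters at all.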

Next we compile the model:

model %>% compile(
  loss = 'categorical_crossentropy',
  optimizer = optimizer_rmsprop(),
  metrics = c('accuracy')
)
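
Categorical cross-entropy is the standard loss function for multi-class classification with a softmax output layer; it measures how far the predicted class probabilities are from the true (one-hot encoded) classes. RMSprop is a variant of stochastic gradient descent that adapts the step size for each parameter using a moving average of recent squared gradients.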

Now we can train it. This takes a while, so add cache=TRUE to your R chunk:

history <- model %>% fit(
  x_train, y_train,
  epochs = 30, batch_size = 128,
  validation_split = 0.2
)
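
Here epochs is the number of complete passes through the training data, batch_size is the number of images used for each update of the weights, and validation_split sets aside 20% of the training data to monitor the performance on data that is not used for fitting.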

Here is what happened:

plot(history)

The plot shows the loss and the accuracy on both the training and the validation data after each epoch, and we see that we end up with about 98.2% accuracy on the training set!

For evaluation and prediction we use the evaluate() and predict_classes() functions:

model %>% 
  evaluate(x_test, y_test)

predicted <- model %>% predict_classes(x_test)
# y_test was one-hot encoded above, so we compare the predictions
# with the original integer labels:
prop.table(table(predicted, mnist$test$y))
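
From this table the overall accuracy on the test set is the sum of the diagonal; we can also compute it directly:

# proportion of test images whose predicted digit matches the true label
mean(predicted == mnist$test$y)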