Convolutional Neural Networks (Convnets or CNNs)

A typical CNN sketch:

Source: Mathworks page about CNN

Feature Learning part of the CNN defined in Keras:

Classification part of the CNN defined in Keras:

Convnet layers

  • Dense layers learn global patterns in their input space
  • Convolutional layers learn local patterns
  • The patterns they learn are translation invariant
    • If they learn a pattern in the lower-right corner of an image, they can also recognise it in the upper-left corner
    • A dense layer would have to learn the pattern anew for each position
    • This makes convnets data-efficient: they need less data to learn
  • They can learn spatial hierarchies of patterns
    • The first layers learn small patterns such as edges
    • Later layers learn larger patterns based on the small patterns
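Translation invariance can be illustrated with a small NumPy sketch (Python shown for illustration; the notes use the R interface to Keras). Because the same kernel slides over the whole image, a pattern produces the same response wherever it appears; the arrays below are hypothetical.

```python
import numpy as np

def conv2d_valid(img, kernel):
    """Slide kernel over img (no padding, stride 1) and return the feature map."""
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

pattern = np.array([[0., 1., 0.],
                    [1., 1., 1.],
                    [0., 1., 0.]])  # a small "plus" motif

# Place the same motif in two corners of an otherwise blank image
img = np.zeros((8, 8))
img[0:3, 0:3] = pattern      # upper-left
img[5:8, 5:8] = pattern      # lower-right

# Use the motif itself as the kernel: the response peaks equally at both
# locations, i.e. the filter recognises the pattern regardless of position
response = conv2d_valid(img, pattern)
print(response[0, 0], response[5, 5])  # 5.0 5.0
```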

A CNN model alternates between convolutional and max-pooling layers.

2D convolutional layer

A 2D convolutional layer is defined by layer_conv_2d in Keras. The two main arguments for the layer are filters and kernel_size.

In the GIF below, we see one filter produced by a kernel of size (3, 3) applied to an image of dimension (5, 5, 1). This particular filter has dimension (3, 3).

The following layer creates 32 filters such as the one above by applying a convolution with kernel size (3, 3) to an image of dimension (28, 28, 1). Each of the 32 filters has dimension (26, 26).

The dimension of the resulting filter is (26, 26) rather than (28, 28) because of border effects: without padding, a (3, 3) window fits only 28 - 3 + 1 = 26 positions along each side.

Each filter will go through the following transformation:

filter = convolution(input) 
output = activation(filter + bias)
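The two steps above can be sketched in NumPy (Python shown for illustration; the notes use the R interface). With a (3, 3) kernel and no padding, a (28, 28) single-channel input yields a (26, 26) feature map, to which a bias and a ReLU activation are applied; the input and kernel values here are random placeholders.

```python
import numpy as np

def conv2d_valid(img, kernel):
    # "valid" convolution: no padding, stride 1
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    out = np.zeros((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(img[i:i+kh, j:j+kw] * kernel)
    return out

rng = np.random.default_rng(0)
img = rng.random((28, 28))            # one-channel input, e.g. an MNIST digit
kernel = rng.standard_normal((3, 3))  # one learned (3, 3) kernel
bias = 0.1

feature_map = conv2d_valid(img, kernel)     # filter = convolution(input)
output = np.maximum(feature_map + bias, 0)  # output = relu(filter + bias)
print(output.shape)  # (26, 26): border effect, 28 - 3 + 1 = 26
```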

The total number of parameters defined in layer_conv_2d is given by kernel_height * kernel_width * input_channels * filters (convolution weights) + filters (one bias per filter).
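Plugging in the numbers from the model used later (32 filters, (3, 3) kernel, one input channel) reproduces the parameter counts reported in the Keras summary; a quick Python check:

```python
def conv2d_params(kernel_h, kernel_w, input_channels, filters):
    # kernel weights of the convolution + one bias per filter
    return kernel_h * kernel_w * input_channels * filters + filters

print(conv2d_params(3, 3, 1, 32))   # first conv layer: 320
print(conv2d_params(3, 3, 32, 64))  # second conv layer: 18496
```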

Averaging adjacent pixels blurs the image:

Adjacent pixels differ sharply in the direction perpendicular to an edge:
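Both effects can be demonstrated with hand-coded kernels (a NumPy sketch, not from the notes): a (3, 3) averaging kernel blurs, while a horizontal-difference kernel responds strongly exactly where adjacent pixels differ across a vertical edge.

```python
import numpy as np

def conv2d_valid(img, kernel):
    # "valid" convolution: no padding, stride 1
    kh, kw = kernel.shape
    oh, ow = img.shape[0] - kh + 1, img.shape[1] - kw + 1
    return np.array([[np.sum(img[i:i+kh, j:j+kw] * kernel)
                      for j in range(ow)] for i in range(oh)])

# Image with a sharp vertical edge: left half black, right half white
img = np.zeros((6, 6))
img[:, 3:] = 1.0

blur_kernel = np.full((3, 3), 1 / 9)     # averaging -> blurring
edge_kernel = np.array([[-1., 0., 1.],
                        [-1., 0., 1.],
                        [-1., 0., 1.]])  # horizontal difference

blurred = conv2d_valid(img, blur_kernel)
edges = conv2d_valid(img, edge_kernel)
print(blurred[0])  # the hard 0/1 edge becomes a smooth ramp
print(edges[0])    # large response at the edge, zero elsewhere
```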

2D max pooling layer

The max pooling layer is conceptually similar to the convolutional layer, with two main differences.

  • Instead of transforming local patches via a learned linear transformation, they are transformed via a hard-coded max operation.
  • The window size is usually (2,2) and the stride is equal to 2 (instead of 1 for the convolutional layer), downsampling the filters.

The reasons to downsample are:

  • to reduce the number of coefficients to process
  • to induce spatial-filter hierarchies by making successive convolution layers look at increasingly large windows (in terms of the fraction of the original input they cover).
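A minimal NumPy sketch of max pooling with a (2, 2) window and stride 2 (Python shown for illustration): each output value is the max of a non-overlapping 2×2 patch, halving each spatial dimension.

```python
import numpy as np

def max_pool_2x2(fmap):
    # (2, 2) window, stride 2: keep the max of each non-overlapping patch
    h, w = fmap.shape[0] // 2, fmap.shape[1] // 2
    out = np.zeros((h, w))
    for i in range(h):
        for j in range(w):
            out[i, j] = fmap[2*i:2*i+2, 2*j:2*j+2].max()
    return out

fmap = np.arange(16, dtype=float).reshape(4, 4)
pooled = max_pool_2x2(fmap)
print(pooled)
# [[ 5.  7.]
#  [13. 15.]]
```

Applied to a (26, 26) feature map, the same operation yields the (13, 13) output seen in the model summary below.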

MNIST dataset

The objective here is to classify the digit contained in an image using a convnet model.

Convnet model

Convolutional Neural Network in Keras:

Adding a classifier on top of the convnet:
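The original code chunks did not survive extraction. Based on the model summary shown below, the architecture could be reconstructed as follows; this is a sketch in the Python Keras API, whereas the notes use the R interface (layer_conv_2d, layer_max_pooling_2d, layer_flatten, layer_dense).

```python
# Hypothetical reconstruction inferred from the layer summary below;
# not the notes' original (R) code.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    # Feature-learning part: alternating conv / max-pooling stack
    layers.Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    # Classification part: flatten, then dense layers
    layers.Flatten(),
    layers.Dense(64, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
```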

Inspect the model:

## Model
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## conv2d_4 (Conv2D)                (None, 26, 26, 32)            320         
## ___________________________________________________________________________
## max_pooling2d_3 (MaxPooling2D)   (None, 13, 13, 32)            0           
## ___________________________________________________________________________
## conv2d_5 (Conv2D)                (None, 11, 11, 64)            18496       
## ___________________________________________________________________________
## max_pooling2d_4 (MaxPooling2D)   (None, 5, 5, 64)              0           
## ___________________________________________________________________________
## conv2d_6 (Conv2D)                (None, 3, 3, 64)              36928       
## ___________________________________________________________________________
## flatten_2 (Flatten)              (None, 576)                   0           
## ___________________________________________________________________________
## dense_3 (Dense)                  (None, 64)                    36928       
## ___________________________________________________________________________
## dense_4 (Dense)                  (None, 10)                    650         
## ===========================================================================
## Total params: 93,322
## Trainable params: 93,322
## Non-trainable params: 0
## ___________________________________________________________________________
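The output shapes and parameter counts in the summary can be checked by hand (a small Python check): a (3, 3) convolution without padding shrinks each spatial side by 2, a (2, 2) max pooling halves it (rounding down), and the parameter formulas are kernel_h * kernel_w * input_channels * filters + filters for a conv layer and in_units * out_units + out_units for a dense layer.

```python
def conv_params(k, in_ch, filters):
    return k * k * in_ch * filters + filters

def dense_params(in_units, out_units):
    return in_units * out_units + out_units

side, ch = 28, 1
side, ch, p1 = side - 2, 32, conv_params(3, 1, 32)    # conv2d_4: 26, 320
side = side // 2                                      # max_pooling2d_3: 13
side, ch, p2 = side - 2, 64, conv_params(3, 32, 64)   # conv2d_5: 11, 18496
side = side // 2                                      # max_pooling2d_4: 5
side, ch, p3 = side - 2, 64, conv_params(3, 64, 64)   # conv2d_6: 3, 36928
flat = side * side * ch                               # flatten_2: 576
p4 = dense_params(flat, 64)                           # dense_3: 36928
p5 = dense_params(64, 10)                             # dense_4: 650
print(flat, p1 + p2 + p3 + p4 + p5)  # 576 93322
```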

Training the convnet on MNIST images:

Evaluate the model on the test data:

## $loss
## [1] 0.03167136
## 
## $acc
## [1] 0.9902

Dealing with JPEG images