A deep-learning model is a directed acyclic graph of layers.

Layers

A layer is a data-processing component: it takes one or more tensors as input and outputs one or more tensors.

Stateless or stateful layers: some layers are stateless, but most have a state, namely the layer's weights, which are learned during training.

Types of layers: different layer types are appropriate for different tensor formats and different types of data processing:

Tensor Type   Data Type       Data Shape                           Layer Type       Layer Description
-----------   -------------   ----------------------------------   --------------   -----------------------
2D tensor     Vector data     (samples, features)                  layer_dense      densely connected layer
3D tensor     Sequence data   (samples, timesteps, features)       layer_lstm       recurrent layer
4D tensor     Image data      (samples, height, width, channels)   layer_conv_2d    2D convolution layer
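
As a rough sketch of how these layer types are used in practice (the unit counts and input shapes below are arbitrary, illustrative choices, not values from the examples later in this note):

library(keras)

# Vector data, shape (samples, features): a densely connected layer
model_dense <- keras_model_sequential() %>%
  layer_dense(units = 32, input_shape = c(20))

# Sequence data, shape (samples, timesteps, features): a recurrent layer
model_lstm <- keras_model_sequential() %>%
  layer_lstm(units = 32, input_shape = c(100, 20))

# Image data, shape (samples, height, width, channels): a 2D convolution layer
model_conv <- keras_model_sequential() %>%
  layer_conv_2d(filters = 32, kernel_size = c(3, 3), input_shape = c(28, 28, 1))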

Building deep-learning models in Keras: done by clipping together compatible layers to form useful data-transformation pipelines.

Layer compatibility: every layer accepts input tensors of a certain shape and returns output tensors of a certain shape; in Keras, layers added to a model are dynamically built to match the shape of the incoming layer, as the sketch below shows.
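
A minimal sketch of layer compatibility (the layer sizes are arbitrary): only the first layer is given an input_shape; the second layer automatically accepts the 32-dimensional output of the first.

library(keras)

model <- keras_model_sequential() %>%
  layer_dense(units = 32, input_shape = c(784)) %>%  # accepts (samples, 784), returns (samples, 32)
  layer_dense(units = 10)                            # input shape inferred as (samples, 32)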

Examples

Fully connected model for MNIST

library(keras)   # assumes the keras package is installed and configured

model <- keras_model_sequential() %>%
  layer_dense(units = 512, activation = "relu", input_shape = c(28*28)) %>%
  layer_dense(units = 10, activation = "softmax")   # probabilities over the 10 digit classes

Fully connected model for IMDB

model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(10000)) %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")   # probability of a positive review

Fully connected model for the Boston Housing dataset

model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(13)) %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 1)   # no activation: linear output for scalar regression

Explaining the code

The pipe operator

  • %>%
  • The pipe operator comes from the magrittr package.
  • Shorthand for passing the value on its left as the first argument to the function on its right.
  • For example, the MNIST model above can be written without the pipe as:
model <- keras_model_sequential()
layer_dense(model, units = 512, activation = "relu", input_shape = c(28*28))
layer_dense(model, units = 10, activation = "softmax")
  • Besides compactness, the %>% serves as a reminder that Keras models are modified in place.
    • You don’t operate on model and then get back a new model object.
    • Rather, you do something to the existing model object.
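
A small sketch of this in-place behaviour (a hypothetical two-layer model, purely for illustration): the pipe chain below mutates model directly, so the layers are attached even though the result of the chain is never assigned.

library(keras)

model <- keras_model_sequential()
model %>%
  layer_dense(units = 32, activation = "relu", input_shape = c(784)) %>%
  layer_dense(units = 10, activation = "softmax")

# model now contains both layers; no re-assignment was needed
summary(model)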

Linear Stack of Layers

  • keras_model_sequential
  • Defines a Keras model composed of a linear stack of layers.
  • Not to be confused with a model for sequential data; "sequential" here refers to the linear stacking of layers.
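
For example, the illustrative model below stacks two dense layers on plain tabular (non-sequential) data; inspecting the fitted object shows nothing more than that linear stack.

library(keras)

model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(13)) %>%
  layer_dense(units = 1)

length(model$layers)   # 2: a simple linear stack of two dense layers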

Dense layer

layer_dense

  • Dense or fully connected layer

Implements the operation: output = activation(dot(input, weight) + bias)

  • input: 2D input tensor; inputs with rank > 2 are flattened before the dot product.
  • weight: 2D weight tensor created by the layer.
  • bias: 1D bias tensor created by the layer (only when use_bias = TRUE).
  • dot(input, weight): Dot product between the two tensors.
  • activation(.): Element-wise activation function.
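
The same operation can be sketched in base R (not Keras), assuming a relu activation; the sizes and variable names here are purely illustrative:

relu <- function(x) pmax(x, 0)                       # element-wise activation

input  <- matrix(rnorm(4 * 3), nrow = 4, ncol = 3)   # (samples = 4, features = 3)
weight <- matrix(rnorm(3 * 2), nrow = 3, ncol = 2)   # (input_dim = 3, units = 2)
bias   <- rep(0, 2)                                  # one bias per unit

# output = activation(dot(input, weight) + bias)
output <- relu(sweep(input %*% weight, 2, bias, "+"))
dim(output)                                          # 4 2: (samples, units)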

Most important arguments:

  • input_shape: Dimensionality of the input, not including the samples axis.
    • Required only for the first layer in a model.
  • units: Dimensionality of the output space.
  • activation: The name of the activation function; defaults to linear (no activation).
    • relu(x) = max(x, 0) is the most commonly used non-linear activation function.
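
A small sketch of how these arguments determine the layer (the numbers are illustrative): units fixes the output dimension, input_shape the input dimension without the samples axis, and activation the non-linearity.

library(keras)

model <- keras_model_sequential() %>%
  layer_dense(units = 32, activation = "relu", input_shape = c(784))

summary(model)   # output shape (None, 32); 784 * 32 + 32 = 25120 parameters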

Some observations

Activation functions such as relu add non-linearity to the model. Without them, each layer would only be able to learn linear transformations of the input data, and a deep stack of linear layers would still implement a linear operation.
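
A quick numerical check of that claim (plain R, with illustrative matrices): composing two linear layers without activations is equivalent to a single linear layer whose weight matrix is the product of the two.

W1 <- matrix(rnorm(3 * 4), 3, 4)      # first "layer" weights
W2 <- matrix(rnorm(4 * 2), 4, 2)      # second "layer" weights
x  <- matrix(rnorm(5 * 3), 5, 3)      # 5 samples, 3 features

two_layers <- (x %*% W1) %*% W2       # stack of two linear layers
one_layer  <- x %*% (W1 %*% W2)       # a single equivalent linear layer
all.equal(two_layers, one_layer)      # TRUE (up to floating-point error)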

Reference material

These lecture notes are based on (Chollet and Allaire 2018).

References

Chollet, F., and J. Allaire. 2018. Deep Learning with R. Manning Publications. https://books.google.no/books?id=xnIRtAEACAAJ.