Compiling Keras models

Now that we have a Keras model defined, we need to configure how this model will be trained: which optimizer to use, which loss function to minimize, and which metrics to monitor.

Fully connected model for the Boston Housing dataset
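As a running example, here is a minimal sketch of such a model and its compilation step, assuming the keras R package (the layer sizes follow the book's Boston Housing example; the object names are illustrative):

```r
library(keras)

# Boston Housing data: 404 training samples, 13 numeric features each
boston <- dataset_boston_housing()
train_x <- boston$train$x
train_y <- boston$train$y

# A small fully connected network for scalar regression
model <- keras_model_sequential() %>%
  layer_dense(units = 64, activation = "relu", input_shape = c(13)) %>%
  layer_dense(units = 64, activation = "relu") %>%
  layer_dense(units = 1)  # no activation: the output is an unbounded price

# Compilation configures training: optimizer, loss, and metrics
model %>% compile(
  optimizer = "rmsprop",  # a sensible default optimizer
  loss = "mse",           # mean squared error, standard for regression
  metrics = c("mae")      # mean absolute error, reported during training
)
```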

Gradient-based optimization

Gradient descent

Mini-batch stochastic gradient descent (SGD) applied to a Deep Learning model:

  1. Draw a batch of training samples x and corresponding targets y.
  2. Run the model on x to obtain predictions y' (forward pass).
  3. Compute the loss of the model on the batch, a measure of the mismatch between y' and y.
  4. Compute the gradient of the loss with respect to the model’s parameters (backward pass).
  5. Move the parameters a little in the direction opposite to the gradient: W = W - (step * gradient), where step (the learning rate) controls the size of the update.
  6. Repeat steps 1-5 until convergence.
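A minimal base-R sketch of this training loop for a linear model with MSE loss (no Keras; the names, the toy data, and the fixed learning rate are all illustrative):

```r
set.seed(1)
n <- 1000; p <- 3
X <- matrix(rnorm(n * p), n, p)         # toy inputs
true_W <- c(2, -1, 0.5)
y <- X %*% true_W + rnorm(n, sd = 0.1)  # toy targets

W <- rep(0, p)   # initialise the parameters
step <- 0.05     # learning rate
batch_size <- 32

for (epoch in 1:20) {
  idx <- sample(n)                                       # reshuffle every epoch
  for (start in seq(1, n, by = batch_size)) {
    batch <- idx[start:min(start + batch_size - 1, n)]   # 1. draw a batch
    Xb <- X[batch, , drop = FALSE]; yb <- y[batch]
    y_pred <- Xb %*% W                                   # 2. forward pass
    residual <- y_pred - yb                              # 3. loss = mean(residual^2)
    gradient <- 2 * t(Xb) %*% residual / length(batch)   # 4. backward pass
    W <- W - step * gradient                             # 5. update the parameters
  }
}
W  # close to true_W after a few epochs
```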

Backpropagation algorithm: computes the gradient of step 4 by applying the chain rule, starting from the loss and working backwards from the output layer towards the input layer, reusing intermediate results along the way.
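As a one-step illustration (the two-layer network and the notation here are assumed for exposition, not taken from the source): with \(h = \sigma(W_1 x)\) and \(y' = W_2 h\), the chain rule gives

\[\frac{\partial L}{\partial W_2} = \frac{\partial L}{\partial y'}\, h^\top, \qquad \frac{\partial L}{\partial W_1} = \left( W_2^\top \frac{\partial L}{\partial y'} \odot \sigma'(W_1 x) \right) x^\top\]

so the factor \(\partial L / \partial y'\) is computed once at the output and reused for every layer below it.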

Variations of SGD

RMSprop

Adapts the learning rate of each parameter individually, dividing the gradient by a running average of the magnitudes of recent gradients for that parameter.
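A sketch of a single RMSprop update in base R (the variable names are illustrative; rho, eps, and step are set to commonly used defaults):

```r
rho  <- 0.9    # decay rate of the running average
eps  <- 1e-7   # small constant to avoid division by zero
step <- 0.001  # base learning rate
s    <- 0      # running average of squared gradients (same shape as W)

rmsprop_step <- function(W, gradient) {
  s <<- rho * s + (1 - rho) * gradient^2   # accumulate squared gradients
  W - step * gradient / (sqrt(s) + eps)    # per-parameter adapted step
}
```

In the keras R package this optimizer is available as optimizer_rmsprop().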

Further reading:

Loss functions

Loss function or objective function: the quantity the model tries to minimize during training; it is the measure of success for the task at hand.

Common problem types and loss functions:

| Problem type              | Last-layer activation | Loss function            |
|---------------------------|-----------------------|--------------------------|
| Binary classification     | sigmoid               | binary_crossentropy      |
| Multiclass classification | softmax               | categorical_crossentropy |
| Regression                | None                  | mse                      |

Binary cross-entropy

For a sample \(i\) with true label \(y_i \in \{0, 1\}\), where \(p_{i1}\) is the predicted probability that the sample belongs to class 1:

\[- y_i\log(p_{i1}) - (1-y_i) \log(1 - p_{i1})\]

Categorical cross-entropy

For a sample \(i\) with a one-hot target over \(C\) classes (\(y_{ij} = 1\) if sample \(i\) belongs to class \(j\), 0 otherwise), where \(p_{ij}\) is the predicted probability for class \(j\):

\[- \sum_{j=1}^{C} y_{ij} \log(p_{ij})\]
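A quick numeric check of both formulas in base R (the labels and probabilities are arbitrary illustrative values):

```r
# Binary cross-entropy: one sample with true label y = 1 and
# predicted probability p = 0.8 of belonging to class 1
y <- 1; p <- 0.8
-(y * log(p) + (1 - y) * log(1 - p))  # 0.223

# Categorical cross-entropy: one sample, C = 3 classes,
# one-hot target and a softmax output
y_onehot  <- c(0, 1, 0)
p_softmax <- c(0.1, 0.7, 0.2)
-sum(y_onehot * log(p_softmax))       # 0.357
```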

Some observations:

Metrics

Metrics provide different ways to measure how well the model's predictions match the true values. Unlike the loss, metrics are only monitored during training and evaluation; they do not drive the gradient updates.
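For example, a sketch of requesting metrics at compile time for a binary classifier (the model object is assumed to exist):

```r
# Metrics appear in the training history and in evaluate(),
# but only the loss is used to compute gradient updates
model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)
```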

Reference material

These lecture notes are based on (Chollet and Allaire 2018) and the following material:

References

Chollet, F., and J. Allaire. 2018. Deep Learning with R. Manning Publications. https://books.google.no/books?id=xnIRtAEACAAJ.