Recurrent Neural Networks (RNNs)

An RNN processes a sequence by iterating through its elements while maintaining a state that contains information about what it has seen so far.

Transition equation for a simple RNN:

output_t <- tanh(as.numeric((W %*% input_t) + (U %*% state_t) + b))

The following is R pseudocode for a simple RNN layer:

state_t <- 0                    # the initial state: all zeros
for (input_t in input_sequence) {
  output_t <- tanh(as.numeric((W %*% input_t) + (U %*% state_t) + b))
  state_t <- output_t           # the output becomes the state for the next step
}
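
As a concrete toy illustration, the loop above can be run end to end in plain R on random data. The dimensions, the random matrices W, U, b and the random inputs below are all illustrative, not part of the original example:

set.seed(1)
input_dim  <- 8                            # size of each input vector (illustrative)
output_dim <- 4                            # size of the state/output vector (illustrative)
timesteps  <- 5

W <- matrix(rnorm(output_dim * input_dim),  nrow = output_dim)  # input weights
U <- matrix(rnorm(output_dim * output_dim), nrow = output_dim)  # recurrent weights
b <- rnorm(output_dim)                                          # bias

input_sequence <- lapply(seq_len(timesteps), function(t) rnorm(input_dim))

state_t <- rep(0, output_dim)              # initial state: all zeros
for (input_t in input_sequence) {
  output_t <- tanh(as.numeric((W %*% input_t) + (U %*% state_t) + b))
  state_t  <- output_t                     # carry the output forward as the next state
}
state_t                                    # final state after the whole sequence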

Embedding layer

Previously, we encoded text data as sequences of integer indices.

Another approach to feeding text to our models is to use an embedding layer, which maps each integer index to a dense vector whose values are learned during training.
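
Conceptually, an embedding layer is a lookup table: each integer index selects one row of a weight matrix that is learned together with the rest of the model. A minimal sketch of the idea in plain R (the matrix here is random; in a trained layer its rows would be learned word vectors):

vocab_size    <- 10                       # illustrative vocabulary size
embedding_dim <- 4                        # illustrative vector size

# One row per word index (random here, for illustration only)
embedding_matrix <- matrix(rnorm(vocab_size * embedding_dim), nrow = vocab_size)

sequence <- c(3, 7, 1, 7)                 # a review encoded as word indices
embedded <- embedding_matrix[sequence, ]  # one embedding_dim vector per word
dim(embedded)                             # 4 x 4: (timesteps, embedding_dim)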

The IMDB dataset

The objective here is to classify a movie review as either positive or negative.

Recurrent neural networks with an embedding layer

  • Data preparation parameters
max_features <- 10000 # Number of most frequent words to keep in the vocabulary
maxlen <- 500         # Pad/truncate every review to this many words
batch_size <- 32      # Batch size used for training
  • Downloading the data
library(keras)
imdb <- dataset_imdb(num_words = max_features)
c(c(input_train, y_train), c(input_test, y_test)) %<-% imdb
cat(length(input_train), "train sequences\n")
## 25000 train sequences
cat(length(input_test), "test sequences")
## 25000 test sequences
  • Padding the sequences
input_train <- pad_sequences(input_train, maxlen = maxlen)
input_test <- pad_sequences(input_test, maxlen = maxlen)
cat("input_train shape:", dim(input_train), "\n")
## input_train shape: 25000 500
cat("input_test shape:", dim(input_test), "\n")
## input_test shape: 25000 500
  • Defining the model with embedding and simple RNN layers:
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = max_features, output_dim = 32) %>%
  layer_simple_rnn(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")
model
## Model
## ___________________________________________________________________________
## Layer (type)                     Output Shape                  Param #     
## ===========================================================================
## embedding_1 (Embedding)          (None, None, 32)              320000      
## ___________________________________________________________________________
## simple_rnn_1 (SimpleRNN)         (None, 32)                    2080        
## ___________________________________________________________________________
## dense_1 (Dense)                  (None, 1)                     33          
## ===========================================================================
## Total params: 322,113
## Trainable params: 322,113
## Non-trainable params: 0
## ___________________________________________________________________________
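These parameter counts can be verified by hand: the embedding layer stores max_features × 32 = 10000 × 32 = 320,000 weights; the simple RNN layer has 32 × 32 input weights (W), 32 × 32 recurrent weights (U) and 32 biases, i.e. 2,080 parameters; and the dense layer has 32 weights plus 1 bias, i.e. 33.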
  • Compiling the model
model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("acc")
)
  • Training and validation
history <- model %>% fit(
  input_train, y_train,
  epochs = 10,
  batch_size = 128,
  validation_split = 0.2
)
plot(history)
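Once training looks reasonable, a natural follow-up (not shown in the original listing) is to score the model on the held-out test set:
model %>% evaluate(input_test, y_test)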

LSTM layer

In theory, the simple RNN layer should be able to retain, at time t, information about inputs seen many timesteps earlier.

  • But in practice, such long-term dependencies are very hard to learn, largely because of the vanishing-gradient problem.

The LSTM (long short-term memory) layer was designed to address this issue.

  • It allows past information to be reinjected at a later time.

Using the LSTM layer in Keras:

model <- keras_model_sequential() %>%
  layer_embedding(input_dim = max_features, output_dim = 32) %>%
  layer_lstm(units = 32) %>%
  layer_dense(units = 1, activation = "sigmoid")
model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("acc")
)
history <- model %>% fit(
  input_train, y_train,
  epochs = 10,
  batch_size = 128,
  validation_split = 0.2
)
plot(history)