Recurrent Neural Networks (RNNs)
An RNN processes sequences by iterating through the sequence elements and maintaining a state containing information about what it has seen so far.
- The state of the RNN is reset between processing two different, independent sequences (such as two different IMDB reviews).
Transition equation for a simple RNN:
output_t <- tanh(as.numeric((W %*% input_t) + (U %*% state_t) + b))
Following is R pseudo-code for a simple RNN layer:
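A minimal sketch of that layer as a loop over timesteps, applying the transition above at each step; the shapes below (timesteps = 100, input_features = 32, output_features = 64) are arbitrary illustrative values:
timesteps <- 100; input_features <- 32; output_features <- 64
inputs <- matrix(rnorm(timesteps * input_features), nrow = timesteps)   # dummy input sequence
W <- matrix(rnorm(output_features * input_features), nrow = output_features)
U <- matrix(rnorm(output_features * output_features), nrow = output_features)
b <- rnorm(output_features)
state_t <- rep(0, output_features)   # initial state: a vector of zeros
for (t in 1:timesteps) {
  input_t <- inputs[t, ]
  output_t <- tanh(as.numeric((W %*% input_t) + (U %*% state_t) + b))   # the transition above
  state_t <- output_t                # the output becomes the state for the next timestep
}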
Embedding layer
Previously, we encoded text data using integers.
- The order of the words was not preserved
- The representation did not capture any text semantics
Another approach to feeding text to our models is to use an embedding layer (see the sketch after this list):
- low-dimensional floating-point vectors learned from data.
- geometric relationships between word vectors should reflect the semantic relationships between these words.
- Takes a 2D input tensor of integers of shape
(samples, sequence_length)
- Returns a 3D floating-point tensor of shape
(samples, sequence_length, embedding_dimensionality)
- Such a 3D tensor can be processed by an RNN layer.
- The vectors are randomly initialized prior to training.
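A minimal sketch of creating an embedding layer in Keras for R; the vocabulary size (1000) and embedding dimensionality (64) are arbitrary illustrative values:
library(keras)
# Maps each integer token in [0, 999] to a learned 64-dimensional vector
embedding_layer <- layer_embedding(input_dim = 1000, output_dim = 64)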
The IMDB dataset
The objective here is to classify a movie review as either positive or negative.
Recurrent neural networks with an embedding layer
- Data preparation parameters
max_features <- 10000 # Number of most frequent words
maxlen <- 500 # Pad/truncate each review to this length (in words)
batch_size <- 32 # Batch size used for training
- Downloading the data
imdb <- dataset_imdb(num_words = max_features)
c(c(input_train, y_train), c(input_test, y_test)) %<-% imdb
cat(length(input_train), "train sequences\n")
## 25000 train sequences
cat(length(input_test), "test sequences\n")
## 25000 test sequences
- Padding the sequences
input_train <- pad_sequences(input_train, maxlen = maxlen)
input_test <- pad_sequences(input_test, maxlen = maxlen)
cat("input_train shape:", dim(input_train), "\n")
## input_train shape: 25000 500
## input_test shape: 25000 500
- Defining the model with embedding and simple RNN layers:
model <- keras_model_sequential() %>%
layer_embedding(input_dim = max_features, output_dim = 32) %>%
layer_simple_rnn(units = 32) %>%
layer_dense(units = 1, activation = "sigmoid")
## Model
## ___________________________________________________________________________
## Layer (type) Output Shape Param #
## ===========================================================================
## embedding_1 (Embedding) (None, None, 32) 320000
## ___________________________________________________________________________
## simple_rnn_1 (SimpleRNN) (None, 32) 2080
## ___________________________________________________________________________
## dense_1 (Dense) (None, 1) 33
## ===========================================================================
## Total params: 322,113
## Trainable params: 322,113
## Non-trainable params: 0
## ___________________________________________________________________________
- Compiling the model
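A reasonable compile step for this binary classification model, mirroring the one used for the LSTM model below:
model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("acc")
)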
- Training and validation
history <- model %>% fit(
input_train, y_train,
epochs = 10,
batch_size = 128,
validation_split = 0.2
)
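To compare training and validation accuracy across epochs, the history object returned by fit() can be plotted:
plot(history)   # loss and accuracy curves for training and validation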
LSTM layer
In theory, the simple RNN layer should be able to retain, at time t, information about inputs seen many timesteps earlier.
- But in practice, such long-term dependencies are very hard to learn.
The LSTM (long short-term memory) layer was designed to address this issue.
- It allows past information to be reinjected at a later time.
Following is R pseudo-code for an LSTM layer (using the same notation as the simple RNN above):
i_t <- activation(as.numeric((Wi %*% input_t) + (Ui %*% state_t) + bi))
f_t <- activation(as.numeric((Wf %*% input_t) + (Uf %*% state_t) + bf))
k_t <- activation(as.numeric((Wk %*% input_t) + (Uk %*% state_t) + bk))
c_next <- (i_t * k_t) + (c_t * f_t)   # carry state passed to the next timestep
output_t <- activation(as.numeric((Wo %*% input_t) + (Uo %*% state_t) + (Vo %*% c_t) + bo))
Using the LSTM layer in Keras:
model <- keras_model_sequential() %>%
layer_embedding(input_dim = max_features, output_dim = 32) %>%
layer_lstm(units = 32) %>%
layer_dense(units = 1, activation = "sigmoid")
model %>% compile(
optimizer = "rmsprop",
loss = "binary_crossentropy",
metrics = c("acc")
)
history <- model %>% fit(
input_train, y_train,
epochs = 10,
batch_size = 128,
validation_split = 0.2
)
Stacking recurrent layers
- We need all intermediate recurrent layers to return their full sequence of outputs rather than only the last output (in Keras, return_sequences = TRUE); see the sketch below.
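A minimal sketch of stacking two recurrent layers on top of the embedding layer, reusing max_features from above; the layer sizes (32 units) are illustrative:
model <- keras_model_sequential() %>%
  layer_embedding(input_dim = max_features, output_dim = 32) %>%
  layer_lstm(units = 32, return_sequences = TRUE) %>%   # intermediate layer returns the full output sequence
  layer_lstm(units = 32) %>%                            # last recurrent layer returns only the final output
  layer_dense(units = 1, activation = "sigmoid")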