The IMDB dataset
The objective here is to classify a movie review as either positive or negative.
Preparing the data
The data has already been preprocessed: the reviews (sequences of words) have been turned into sequences of integers, where each integer stands for a specific word in a dictionary.
- The argument num_words = 10000 keeps only the top 10,000 most frequently occurring words in the training data.
library(keras)

imdb <- dataset_imdb(num_words = 10000)
train_data <- imdb$train$x
train_labels <- imdb$train$y
test_data <- imdb$test$x
test_labels <- imdb$test$y
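Equivalently, the four objects can be unpacked in one step with the multi-assignment operator %<-% from the zeallot package (re-exported by keras):

c(c(train_data, train_labels), c(test_data, test_labels)) %<-% imdb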
- Each of train_data and test_data contains 25,000 reviews; each review is a list of word indices.
- The labels are vectors of 0s and 1s, where 0 stands for negative and 1 stands for positive.
- The first review in the list:
str(train_data[[1]])
## int [1:218] 1 14 22 16 43 530 973 1622 1385 65 ...
train_labels[[1]]
## [1] 1
Turning sequences of integers back to English
Below is the code to turn a review from a sequence of integers back into English words.
word_index <- dataset_imdb_word_index()  # named list mapping words to integer indices
reverse_word_index <- names(word_index)
names(reverse_word_index) <- word_index  # invert the mapping: indices to words

decoded_review <- sapply(train_data[[1]], function(index) {
  # Indices 0, 1, and 2 are reserved for "padding", "start of sequence",
  # and "unknown", so the dictionary lookup is offset by 3.
  word <- if (index >= 3) reverse_word_index[[as.character(index - 3)]]
  if (!is.null(word)) word else "?"
})
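To read the decoded review as text (a small usage sketch):

cat(decoded_review, sep = " ")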
Turning sequences of integers into tensor format
- The vectorize_sequences() function below produces a rank-2 tensor of shape (samples, features).
- Each sample is represented by a feature vector whose length is the size of the dictionary: a value of 1 marks a word that is present in the review, and 0 a word that is absent (see the toy check after the code below).
vectorize_sequences <- function(sequences, dimension = 10000) {
  # One row per review, one column per dictionary word, initialized to 0
  results <- matrix(0, nrow = length(sequences), ncol = dimension)
  for (i in seq_along(sequences))
    results[i, sequences[[i]]] <- 1  # set the entries for words present in review i to 1
  results
}
x_train <- vectorize_sequences(train_data)
x_test <- vectorize_sequences(test_data)
y_train <- as.numeric(train_labels)
y_test <- as.numeric(test_labels)
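As a quick sanity check (a toy example, not part of the pipeline above), vectorizing two short sequences makes the multi-hot pattern easy to see:

toy <- vectorize_sequences(list(c(2, 5), c(1, 2, 5)), dimension = 6)
toy  # row 1 has 1s in columns 2 and 5; row 2 in columns 1, 2, and 5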
Model definition
Two intermediate layers with 16 hidden units each and relu activations, followed by a final layer with a sigmoid activation that outputs the probability that the review is positive.
model <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(10000)) %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid")
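To inspect the resulting layer shapes and parameter counts (an optional check):

summary(model)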
Model compilation
Since the network outputs a probability for a binary classification problem, binary_crossentropy is the appropriate loss.
model %>% compile(
  optimizer = "rmsprop",
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)
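Passing the string "rmsprop" uses the optimizer's default configuration. To tune it, you can pass an optimizer instance instead (a sketch; the argument is named learning_rate in recent keras releases and lr in older ones):

model %>% compile(
  optimizer = optimizer_rmsprop(learning_rate = 0.001),  # lr = 0.001 on older keras versions
  loss = "binary_crossentropy",
  metrics = c("accuracy")
)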
Validating your approach
Create a validation set by setting apart 10,000 samples from the original training data.
val_indices <- 1:10000
x_val <- x_train[val_indices,]
partial_x_train <- x_train[-val_indices,]
y_val <- y_train[val_indices]
partial_y_train <- y_train[-val_indices]
history <- model %>% fit(
  partial_x_train,
  partial_y_train,
  epochs = 20,
  batch_size = 512,
  validation_data = list(x_val, y_val)
)
Note that the call to fit() returns a history object. Let’s take a look at it:
str(history)
## List of 2
## $ params :List of 8
## ..$ metrics : chr [1:4] "loss" "acc" "val_loss" "val_acc"
## ..$ epochs : int 20
## ..$ steps : NULL
## ..$ do_validation : logi TRUE
## ..$ samples : int 15000
## ..$ batch_size : int 512
## ..$ verbose : int 1
## ..$ validation_samples: int 10000
## $ metrics:List of 4
## ..$ acc : num [1:20] 0.793 0.903 0.927 0.944 0.953 ...
## ..$ loss : num [1:20] 0.51 0.305 0.227 0.178 0.146 ...
## ..$ val_acc : num [1:20] 0.875 0.855 0.885 0.885 0.889 ...
## ..$ val_loss: num [1:20] 0.381 0.357 0.296 0.291 0.283 ...
## - attr(*, "class")= chr "keras_training_history"
The history object includes the parameters used to fit the model (history$params) as well as data for each of the metrics being monitored (history$metrics).
plot(history)
- You can customize all of this behavior via various arguments to the plot() method.
- We can create a custom visualization by calling the as.data.frame() method on the history to obtain a data frame with a row per epoch for each metric and for training versus validation (a ggplot2 sketch follows the output below):
history_df <- as.data.frame(history)
head(history_df)
## epoch value metric data
## 1 1 0.7932000 acc training
## 2 2 0.9034667 acc training
## 3 3 0.9267333 acc training
## 4 4 0.9436667 acc training
## 5 5 0.9529333 acc training
## 6 6 0.9670000 acc training
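For instance, a minimal custom plot built on this data frame with ggplot2 (a sketch, assuming ggplot2 is installed):

library(ggplot2)

# One panel per metric, training vs. validation curves distinguished by colour
ggplot(history_df, aes(x = epoch, y = value, colour = data)) +
  geom_line() +
  facet_wrap(~ metric, scales = "free_y")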
This fairly naive approach achieves an accuracy of 88%. With state-of-the-art approaches, you should be able to get close to 95%.
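To measure test-set performance directly, evaluate the trained model (a sketch of the call):

results <- model %>% evaluate(x_test, y_test)
results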
Predicting on new data
model %>% predict(x_test[1:10,])
## [,1]
## [1,] 5.316609e-03
## [2,] 9.999976e-01
## [3,] 8.015566e-01
## [4,] 9.935709e-01
## [5,] 9.974100e-01
## [6,] 9.984455e-01
## [7,] 6.220128e-01
## [8,] 1.460681e-06
## [9,] 9.731751e-01
## [10,] 9.999981e-01
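Each value is the model’s predicted probability that the review is positive; the model is confident for some samples (near 1 or near 0) and much less so for others (e.g., 0.62). To turn probabilities into hard class labels, threshold at 0.5 (a quick sketch):

pred_classes <- as.integer(model %>% predict(x_test[1:10, ]) > 0.5)
pred_classes  # 1 = positive, 0 = negative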
Fighting overfitting
Reducing the network’s size
Let’s try a smaller network, with 4 hidden units per intermediate layer instead of 16:
history <- keras_model_sequential() %>%
  layer_dense(units = 4, activation = "relu", input_shape = c(10000)) %>%
  layer_dense(units = 4, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid") %>%
  compile(optimizer = "rmsprop",
          loss = "binary_crossentropy",
          metrics = c("accuracy")) %>%
  fit(
    partial_x_train,
    partial_y_train,
    epochs = 20,
    batch_size = 512,
    validation_data = list(x_val, y_val))
plot(history)
And a bigger network:
history <- keras_model_sequential() %>%
  layer_dense(units = 512, activation = "relu", input_shape = c(10000)) %>%
  layer_dense(units = 512, activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid") %>%
  compile(optimizer = "rmsprop",
          loss = "binary_crossentropy",
          metrics = c("accuracy")) %>%
  fit(
    partial_x_train,
    partial_y_train,
    epochs = 20,
    batch_size = 512,
    validation_data = list(x_val, y_val))
plot(history)
Adding weight regularization
Adding L2 weight regularization to the model: regularizer_l2(0.001) means every coefficient in the layer’s weight matrix adds 0.001 * weight_coefficient_value^2 to the network’s total loss, nudging the weights toward small values.
history <- keras_model_sequential() %>%
  layer_dense(units = 16, kernel_regularizer = regularizer_l2(0.001),
              activation = "relu", input_shape = c(10000)) %>%
  layer_dense(units = 16, kernel_regularizer = regularizer_l2(0.001),
              activation = "relu") %>%
  layer_dense(units = 1, activation = "sigmoid") %>%
  compile(optimizer = "rmsprop",
          loss = "binary_crossentropy",
          metrics = c("accuracy")) %>%
  fit(
    partial_x_train,
    partial_y_train,
    epochs = 20,
    batch_size = 512,
    validation_data = list(x_val, y_val))
plot(history)
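keras also provides an L1 regularizer and a combined L1/L2 variant, shown here for reference:

regularizer_l1(0.001)                      # L1 regularization
regularizer_l1_l2(l1 = 0.001, l2 = 0.001)  # simultaneous L1 and L2 regularization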
Adding dropout
Let’s add two dropout layers to the IMDB network to see how well they reduce overfitting. With rate = 0.5, half of each layer’s output features are randomly zeroed out during training (dropout is never applied at test time).
history <- keras_model_sequential() %>%
  layer_dense(units = 16, activation = "relu", input_shape = c(10000)) %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 16, activation = "relu") %>%
  layer_dropout(rate = 0.5) %>%
  layer_dense(units = 1, activation = "sigmoid") %>%
  compile(optimizer = "rmsprop",
          loss = "binary_crossentropy",
          metrics = c("accuracy")) %>%
  fit(
    partial_x_train,
    partial_y_train,
    epochs = 20,
    batch_size = 512,
    validation_data = list(x_val, y_val))
plot(history)