Check out basic info about logging in to Epic and running jobs (with the queue system slurm) at https://www.hpc.ntnu.no/display/hpc/Getting+Started
Log on with something like (on Mac/Linux) ssh epic.hpc.ntnu.no.
Load the modules by typing the following (once you are at the login node):
module load keras
Then start R by typing "R" and install keras in R with
install.packages("keras")
and quit R by typing q().
NB: do not try to do library(keras); install_keras(). This is not needed and will probably only produce an error.
Make a file called MNISTexSlurm.sh by copying the commands below, or copy my file at epic from /home/mettela/MA8701MNISTexSlurm.sh (if you are allowed). This file should be executable (chmod u+x MNISTexSlurm.sh).
#!/bin/sh
#SBATCH --partition=WORKQ
#SBATCH --time=00:15:00
#SBATCH --mem=8000
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=1
#SBATCH --job-name="MNISTex"
module purge
module load GCC/7.3.0-2.30
module load OpenMPI/3.1.1
module load R/3.5.1
module load TensorFlow/1.12.0-Python-3.6.6
/usr/bin/time -v Rscript --vanilla MNISTex.R
Here we have set a maximum running time of 15 minutes and asked for 8 GB RAM (remember to change this if you run a larger example). The MNISTex job should take around 2.5 minutes and use ca 2 GB RAM.
Observe that in the slurm script we only ask for 1 CPU (ntasks-per-node=1). However, Keras is parallelised and can run on many CPUs, but for this small example running with 20 CPUs (ntasks-per-node=20) will take longer than running on one CPU, because the example does not really require any computing power and the set-up time dominates. For your larger jobs, running with more CPUs is smart.
Make a file called MNISTex.R by copying the commands below, or copy my file at epic from /home/mettela/MA8701MNISTex.R (if you are allowed). As you see, this is a simple example with a network with one dense hidden layer and a 10-class problem. It is partially taken from the R keras page at RStudio: https://keras.rstudio.com/
library(keras)
# Load the MNIST data set (60000 training and 10000 test images of 28 x 28 pixels)
mnist <- dataset_mnist()
# Training data
train_images <- mnist$train$x
train_labels <- mnist$train$y
# Test data
test_images <- mnist$test$x
test_labels <- mnist$test$y
# Network: one dense hidden layer with 512 units, softmax output over the 10 classes
network <- keras_model_sequential() %>%
  layer_dense(units = 512, activation = "relu", input_shape = c(28*28)) %>%
  layer_dense(units = 10, activation = "softmax")
network %>% compile(
  optimizer = "rmsprop",
  loss = "categorical_crossentropy",
  metrics = c("accuracy")
)
# Flatten the images to vectors of length 28*28 = 784 and scale pixel values to [0, 1]
train_images <- array_reshape(train_images, c(60000, 28 * 28))
train_images <- train_images / 255
# One-hot encode the class labels
train_labels <- to_categorical(train_labels)
test_images <- array_reshape(test_images, c(10000, 28 * 28))
test_images <- test_images / 255
test_labels <- to_categorical(test_labels)
# Fit with 20% of the training data held out for validation
fitted <- network %>% fit(train_images, train_labels,
  epochs = 30, batch_size = 128,
  validation_split = 0.2
)
# Loss and accuracy on the test set
network %>% evaluate(test_images, test_labels)
# Predicted class for each test image
network %>% predict_classes(test_images)
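When running as a batch job you may also want to keep the fitted network and the predictions on disk, not only in the slurm output file. A minimal sketch of how this could be added at the end of MNISTex.R (the file names here are just examples; save_model_hdf5 is from the R keras package):

```r
# Save the trained network so it can be reloaded later with load_model_hdf5()
save_model_hdf5(network, "MNISTnetwork.h5")
# Save the predicted classes for the test images to a text file
preds <- network %>% predict_classes(test_images)
write.table(preds, file = "MNISTpredictions.txt",
            row.names = FALSE, col.names = FALSE)
```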
Now run the script with slurm, and look at the output:
sbatch MNISTexSlurm.sh
You are given a job id; I got Submitted batch job 248476, and by typing more slurm-248476.out I saw that I got the loss and accuracy:
$loss
[1] 0.1165949
$acc
[1] 0.982
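While the job is running you can check its status in the queue, and afterwards the /usr/bin/time -v line in the slurm script reports the actual memory use in the slurm-*.out file. A sketch of the relevant commands (the job id 248476 is just the example from above):

```shell
squeue -u $USER                            # list your jobs in the queue
scancel 248476                             # cancel the job if something went wrong
grep "Maximum resident" slurm-248476.out   # peak memory used by the R script
```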