# Lab 01: A simple MNIST model

Remember to work from the root directory of the DLTraining code sample throughout all practicals.

In this lab, we will put together a basic 3-layer model that can identify handwritten digits by learning from the MNIST database.

## Job scripts

A reminder: add the following lines to every job script that you submit with `qsub`:

```bash
#!/bin/bash
#$ -l gpu=1 -P rse-training -q rse-training.q -l rmem=10G -j y

module load apps/caffe/rc5/gcc-4.9.4-cuda-8.0-cudnn-5.1

# Your code below....
```

The `-j y` option is included so that the job prints everything to one output file, e.g. `your_scriptname.sh.o<jobid>`.

## Running the pre-made model

Submit the job file `code/lab01/mnist_simple_train.sh` using `qsub`:

```bash
qsub code/lab01/mnist_simple_train.sh
```

Once the job has finished, check the output file `mnist_simple_train.sh.e<jobid>`. At the end of the file you should see something like this:

```
I0127 16:04:25.357823 9366 solver.cpp:317] Iteration 10000, loss = 0.207118
I0127 16:04:25.357884 9366 solver.cpp:337] Iteration 10000, Testing net (#0)
I0127 16:04:25.408476 9366 solver.cpp:404] Test net output #0: accuracy = 0.9226
I0127 16:04:25.408501 9366 solver.cpp:404] Test net output #1: loss = 0.279165 (* 1 = 0.279165 loss)
I0127 16:04:25.408510 9366 solver.cpp:322] Optimization Done.
I0127 16:04:25.408516 9366 caffe.cpp:254] Optimization Done.
```

You've just trained a basic neural network model with Caffe! The accuracy should be around 92%.

Let's have a look at how Caffe is used from the command line. Type in:

```bash
cat code/lab01/mnist_simple_train.sh
```

In the script you will see the line with `caffe train`:

```bash
caffe train -solver=code/lab01/mnist_simple_solver.prototxt
```

Caffe offers a command line interface for training your models. The command above tells Caffe to train the model using the solver file located at `code/lab01/mnist_simple_solver.prototxt`. Two text files are needed to get a model running in Caffe: a model file, which defines the architecture of the network, and a solver file, which lets you choose the approach to training and optimisation.

## Implementing the model

A model file consists of a sequence of layers, each with a specific function, such as a Data layer that imports external data from raw images or from databases such as LMDB, or a Loss layer that calculates the error/loss function. Blobs are matrices for storing data; they are consumed as inputs (bottom) and generated as outputs (top) of layers. A layer can have multiple input or output blobs depending on its type. The model file is written in the format of Google's protocol buffer (protobuf).

To get a feel for the model file, we will implement a very simple model that trains on the MNIST data with only one dense hidden layer. We'll use Netscope to visualise the network as we build it. Open the following link in a new tab/window to get started: http://ethereon.github.io/netscope/#/editor.

First name the model:

```
name: "MNIST Simple"
```

Then add a data layer:

```
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "data/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
```

This creates a Data layer named "mnist" that reads from an LMDB database, `source: "data/mnist_train_lmdb"`. Note that the location of files referenced in the model is relative to the directory you call Caffe from, not to the location of the file itself. The layer has two outputs: the `top: "data"` blob is the image data, and the `top: "label"` blob holds the correct category for each image. For training we'll use a batch size of 64 (`batch_size: 64`). Each input pixel has a range of 0-255, so we scale it to the range 0-1 by multiplying by `scale: 0.00390625` (1/256) in the `transform_param`. The input data is arranged as a 2D grid, which we'll make use of in the next lab.
For now, we'll flatten it to a 1-dimensional array using a Flatten layer with the `data` blob as input and a `flatdata` blob as output:

```
layer {
  name: "flatdata"
  type: "Flatten"
  bottom: "data"
  top: "flatdata"
}
```

Now add an InnerProduct layer. This is essentially a 'dense' layer where every node is connected to every node in the layer below:

```
layer {
  name: "ip"
  type: "InnerProduct"
  bottom: "flatdata"
  top: "ip"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
```

The InnerProduct layer `ip` takes the `flatdata` blob as input and generates the `ip` blob as output. `num_output` is 10 in this case because we have 10 classes (digits 0-9). The fillers (`weight_filler`, `bias_filler`) allow us to initialise the values of the weights and biases. For the `weight_filler`, we use the xavier algorithm, which automatically determines the scale of initialisation based on the number of input and output neurons. For the `bias_filler`, we simply initialise it as constant, with the default filling value 0. The `lr_mult` values are the learning rate adjustments for the layer's learnable parameters. In this case, we set the weight learning rate to be the same as the learning rate given by the solver at runtime, and the bias learning rate to be twice as large; this usually leads to better convergence rates.

Now we just need to add a suitable activation function for classification (Softmax) and a loss calculation to our model. Caffe provides a layer that does both for us: the SoftmaxWithLoss layer.

```
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip"
  bottom: "label"
  top: "loss"
}
```

The loss layer takes the two input blobs `ip` and `label` and generates a `loss` blob. That's all we need to create a model for training. Press Shift+Enter in Netscope to view a diagram of the layers and blobs you have just defined.

Rules can be added to layers to specify when they are included in a network. The phase rule indicates whether the layer is included in the Training or the Testing phase:

```
layer {
  // ...layer definition...
  include: { phase: TRAIN }
}
```

After a certain number of training iterations we will have the model use the test data to verify its accuracy. Replace the existing data layer with:

```
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "data/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "data/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
```

We now have two data layers: one reads from `data/mnist_train_lmdb` in the training phase, and the other from `data/mnist_test_lmdb` in the testing phase. Adding an Accuracy utility layer in the testing phase makes Caffe calculate and output accuracy values.
```
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
```

Your final model file should look something like this:

```
name: "MNIST Simple"
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TRAIN
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "data/mnist_train_lmdb"
    batch_size: 64
    backend: LMDB
  }
}
layer {
  name: "mnist"
  type: "Data"
  top: "data"
  top: "label"
  include {
    phase: TEST
  }
  transform_param {
    scale: 0.00390625
  }
  data_param {
    source: "data/mnist_test_lmdb"
    batch_size: 100
    backend: LMDB
  }
}
layer {
  name: "flatdata"
  type: "Flatten"
  bottom: "data"
  top: "flatdata"
}
layer {
  name: "ip"
  type: "InnerProduct"
  bottom: "flatdata"
  top: "ip"
  param {
    lr_mult: 1
  }
  param {
    lr_mult: 2
  }
  inner_product_param {
    num_output: 10
    weight_filler {
      type: "xavier"
    }
    bias_filler {
      type: "constant"
    }
  }
}
layer {
  name: "accuracy"
  type: "Accuracy"
  bottom: "ip"
  bottom: "label"
  top: "accuracy"
  include {
    phase: TEST
  }
}
layer {
  name: "loss"
  type: "SoftmaxWithLoss"
  bottom: "ip"
  bottom: "label"
  top: "loss"
}
```

Create a file `mnist_simple.prototxt` and paste the content of the model into it. To see what other layers Caffe supports, see the layer catalogue at http://caffe.berkeleyvision.org/tutorial/layers.html. We can now start to implement the solver.

## Implementing the solver

The solver file lets us define how the training and testing are performed. Create a `mnist_simple_solver.prototxt` file and copy in the code below:

```
# The train/test net protocol buffer definition
net: "mnist_simple.prototxt"
# Declare solver type, SGD is Stochastic Gradient Descent
type: "SGD"
# test_iter specifies how many forward passes the test should carry out.
# In the case of MNIST, we have test batch size 100 and 100 test iterations,
# covering the full 10,000 testing images.
test_iter: 100
# Carry out testing every 500 training iterations.
test_interval: 500
# The base learning rate, momentum and the weight decay of the network.
base_lr: 0.01
momentum: 0.9
weight_decay: 0.0005
# The learning rate policy: "inv" decays the rate as
# base_lr * (1 + gamma * iter)^(-power)
lr_policy: "inv"
gamma: 0.0001
power: 0.75
# Display every 100 iterations
display: 100
# The maximum number of iterations
max_iter: 10000
# snapshot intermediate results
snapshot: 5000
snapshot_prefix: "intro_dl_snapshot_"
# solver mode: CPU or GPU
solver_mode: GPU
```

See the comments in the code for more details. For more information on Caffe's available solvers, see http://caffe.berkeleyvision.org/tutorial/solver.html.
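The snapshot settings above make the solver write its intermediate weights and solver state every 5000 iterations, relative to the directory you call Caffe from. If a training job is cut short, it can usually be resumed from a saved solver state with the `-snapshot` flag of `caffe train`. A minimal sketch, assuming Caffe's default snapshot naming of `<snapshot_prefix>_iter_<n>.solverstate`:

```bash
# Resume training from the 5000-iteration snapshot (exact filename depends on
# snapshot_prefix and the iteration at which the snapshot was written)
caffe train -solver=mnist_simple_solver.prototxt \
    -snapshot=intro_dl_snapshot__iter_5000.solverstate
```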

## Training

To train the model, create a new script `mnist_simple_train.sh` with the contents:

```bash
#!/bin/bash
#$ -l gpu=1 -P rse-training -q rse-training.q -l rmem=10G

module load apps/caffe/rc5/gcc-4.9.4-cuda-8.0-cudnn-5.1

caffe train -solver=mnist_simple_solver.prototxt
```

Submit the job using `qsub`. In the output file you should get results similar to the pre-made model, with a final accuracy of around 92%.
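To pull the final figures out of the log without scrolling, you can search the job output for the test results. A quick sketch, assuming the log went to the merged `.o<jobid>` file (it will be in `.e<jobid>` instead if you did not use `-j y`):

```bash
# Show the last few test accuracy/loss lines from the job output
grep "Test net output" mnist_simple_train.sh.o* | tail -n 4
```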

## Activation Layers

Our current model has the Softmax activation rolled into the loss layer to introduce non-linearity, but as you start to add additional layers, activation functions have to be added manually.
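For reference, SoftmaxWithLoss behaves like a standalone Softmax layer followed by a multinomial logistic loss. A sketch of what the standalone activation would look like if you ever need the class probabilities themselves (the layer name `prob` is an illustrative choice; our training model does not need this layer):

```
layer {
  name: "prob"
  type: "Softmax"
  bottom: "ip"
  top: "prob"
}
```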

The ReLU (rectified linear unit) is a popular activation function that zeroes out values below 0. It reduces the chance of vanishing gradients and has been found to converge faster than sigmoid-type functions.

The ReLU layer can be added to the model with:

```
layer {
  name: "relu"
  type: "ReLU"
  bottom: "a_blob"
  top: "a_blob"
}
```


Note that you can use the same blob name for both top and bottom to do the computation in-place and save memory by not creating a new blob.

See the layer catalogue for more activation layers.
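For instance, a sigmoid activation can be dropped in the same way. A sketch using Caffe's `Sigmoid` layer type, again wired in-place on a hypothetical `a_blob`:

```
layer {
  name: "sig"
  type: "Sigmoid"
  bottom: "a_blob"
  top: "a_blob"
}
```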

Try adding one or more additional InnerProduct layers to the current network. Does the accuracy improve? (Don't forget to add activation functions; one possible sketch is given below.)
Check your model against code/lab01/mnist_simple_extra_layer.prototxt.
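As a hedged sketch (not necessarily what code/lab01/mnist_simple_extra_layer.prototxt does), one way to add a hidden layer is to insert an InnerProduct layer followed by a ReLU between `flatdata` and the final 10-way `ip` layer. The layer names and the 128-unit width below are illustrative choices:

```
layer {
  name: "ip1"
  type: "InnerProduct"
  bottom: "flatdata"
  top: "ip1"
  param { lr_mult: 1 }
  param { lr_mult: 2 }
  inner_product_param {
    num_output: 128
    weight_filler { type: "xavier" }
    bias_filler { type: "constant" }
  }
}
layer {
  name: "relu1"
  type: "ReLU"
  bottom: "ip1"
  top: "ip1"
}
```

The existing `ip` layer would then take `bottom: "ip1"` instead of `bottom: "flatdata"`, so the data flows flatdata → ip1 → ReLU → ip → loss.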