
Deep-Driving

Train the Network

Start the Training

To start the training, simply run the script train.py in the python/scripts directory of the repository:

cd <repository-path>/python/scripts

python train.py

Every 10 epochs, a model checkpoint is stored. If you want to retrain a model from scratch, you can either delete the checkpoint directory or set the variable IsRetrain to True inside the train.py script.

Information about Training

  • The training data-set consists of around 500,000 labeled images with a size of 280x210 pixels.

  • For training, the input images are shuffled.

  • One epoch consists of 10,000 images and takes around 25 seconds on a GTX 1080 Ti graphics card.

  • This means it takes around 50 epochs to consider (almost) every image in the training data-set.

  • A full training run needs 2,000 epochs.

  • A Nesterov momentum optimizer is used, with a momentum value of 0.9 and an initial learning rate of 0.01.

  • Every 300 epochs, the learning rate is reduced by 50% (see the sketch after this list).

  • The weight decay strength is 0.005.
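
Putting the optimizer settings from this list together, a standalone sketch in plain TensorFlow 1.x could look as follows. The repository uses its own dl library for this, so everything here, including the STEPS_PER_EPOCH stand-in, is illustrative only:

import tensorflow as tf  # TensorFlow 1.x API

global_step = tf.train.get_or_create_global_step()

# Halve the learning rate every 300 epochs. STEPS_PER_EPOCH is a
# hypothetical stand-in: how many optimizer steps one epoch of
# 10,000 images takes depends on the batch size.
STEPS_PER_EPOCH = 100
learning_rate = tf.train.exponential_decay(
    learning_rate=0.01,             # initial learning rate
    global_step=global_step,
    decay_steps=300 * STEPS_PER_EPOCH,
    decay_rate=0.5,                 # reduce by 50%
    staircase=True)                 # step-wise instead of continuous decay

# Nesterov momentum optimizer with a momentum value of 0.9.
optimizer = tf.train.MomentumOptimizer(
    learning_rate, momentum=0.9, use_nesterov=True)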

Model Description

The neural network used by the deep-driving project is very similar to the original AlexNet implementation. It consists of 5 convolutional layers, some of which use max-pooling. In addition, there are 3 fully connected layers and one output layer with sigmoid activation functions.

The following listing shows the network implementation:

Seq = dl.layer.Sequence("Network")

# Setup standard initializer
Conv2D_BN_ReLU = dl.layer.Conv2D_BN_ReLU.setKernelInit(init.NormalInitializer(stddev=0.01))
Dense_BN_ReLU  = dl.layer.Dense_BN_ReLU.setWeightInit(init.NormalInitializer(stddev=0.005))


with Seq.addLayerName("Conv"):
  Seq.add(Conv2D_BN_ReLU(Kernel=11, Filters=96, Stride=4, Padding="VALID"))
  Seq.add(dl.layer.MaxPooling(Window=3, Stride=2))

with Seq.addLayerName("Conv"):
  Seq.add(Conv2D_BN_ReLU(Kernel=5, Filters=256, Groups=2))
  Seq.add(dl.layer.MaxPooling(Window=3, Stride=2))

with Seq.addLayerName("Conv"):
  Seq.add(Conv2D_BN_ReLU(Kernel=3, Filters=384, Groups=1))

with Seq.addLayerName("Conv"):
  Seq.add(Conv2D_BN_ReLU(Kernel=3, Filters=384, Groups=2))

with Seq.addLayerName("Conv"):
  Seq.add(Conv2D_BN_ReLU(Kernel=3, Filters=256, Groups=2))
  Seq.add(dl.layer.MaxPooling(Window=3, Stride=2, Padding="VALID"))

with Seq.addLayerName("Dense"):
  Seq.add(Dense_BN_ReLU(4096))
  Seq.add(dl.layer.Dropout(0.5))

with Seq.addLayerName("Dense"):
  Seq.add(Dense_BN_ReLU(4096))
  Seq.add(dl.layer.Dropout(0.5))

with Seq.addLayerName("Dense"):
  Seq.add(Dense_BN_ReLU(256)
     .setWeightInit(init.NormalInitializer(stddev=0.01)))
  Seq.add(dl.layer.Dropout(0.5))

with Seq.addLayerName("Output"):
  # OutputNodes is the number of output signals (14, see "Outputs of the Model")
  Seq.add(dl.layer.Dense(OutputNodes)
     .setWeightDecay(0.0)
     .setWeightInit(init.NormalInitializer(stddev=0.01)))
  Seq.add(dl.layer.activation.Sigmoid())

Output = Seq.apply(Input)

The outputs of some convolutional layers are split into two groups. This grouping of feature-maps is typical for AlexNet, and the reason for it is historical: when AlexNet was invented, no graphics cards with 4 GB of memory were available for training. Thus some layers were split into two groups so that they could be calculated in parallel on two graphics cards with 3 GB of memory each. Today this is no longer necessary, but for compatibility reasons this grouping is still used in AlexNet implementations.
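
To illustrate what such a grouping does, here is a minimal sketch of a grouped convolution in plain TensorFlow 1.x; the function grouped_conv2d and the NHWC layout assumption are mine, not the repository's:

import tensorflow as tf

def grouped_conv2d(x, filters, kernel, groups):
  # Split the input feature-maps along the channel axis (NHWC layout),
  # convolve each group independently, and concatenate the results.
  parts = tf.split(x, groups, axis=-1)
  convolved = [tf.layers.conv2d(p, filters // groups, kernel, padding="same")
               for p in parts]
  return tf.concat(convolved, axis=-1)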

In contrast to the original AlexNet implementation, every layer except the output layer uses batch-normalization. Furthermore, no local-response normalization layers are used anymore. This leads to better training compared to the original implementation.

Outputs of the Model

The network has 14 output signals in the range of 0 to 1. Due to the sigmoid activation function, values near 0 or near 1 are very rare. To recover the value ranges of the labels, the output values need to be denormalized: each value is shifted and then scaled by a constant factor. The following table describes the meaning of every output value and its denormalization:

Index   Meaning   Shift      Scale
0       Angle     -0.5       *1.1
1       L         -1.34445   /0.17778
2       M         -0.6714    /0.1149
3       R         +0.34445   /0.17778
4       DistL     -0.12      *95.0
5       DistR     -0.12      *95.0
6       LL        -1.40909   /0.14545
7       ML        -0.9       /0.16
8       MR        -0.1       /0.16
9       RR        +0.40909   /0.14545
10      DistLL    -0.12      *95.0
11      DistMM    -0.12      *95.0
12      DistRR    -0.12      *95.0
13      Fast      +0         *1
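
As a plain-Python sketch of this denormalization (the list layout and the function denormalize are illustrative, not the repository's API):

# (shift, multiply-factor) per output; a "/x" entry in the table is
# expressed here as multiplication by 1/x.
SHIFT_SCALE = [
  (-0.5,      1.1),           # 0:  Angle
  (-1.34445,  1 / 0.17778),   # 1:  L
  (-0.6714,   1 / 0.1149),    # 2:  M
  (+0.34445,  1 / 0.17778),   # 3:  R
  (-0.12,     95.0),          # 4:  DistL
  (-0.12,     95.0),          # 5:  DistR
  (-1.40909,  1 / 0.14545),   # 6:  LL
  (-0.9,      1 / 0.16),      # 7:  ML
  (-0.1,      1 / 0.16),      # 8:  MR
  (+0.40909,  1 / 0.14545),   # 9:  RR
  (-0.12,     95.0),          # 10: DistLL
  (-0.12,     95.0),          # 11: DistMM
  (-0.12,     95.0),          # 12: DistRR
  ( 0.0,      1.0),           # 13: Fast
]

def denormalize(outputs):
  # outputs: the 14 network outputs, each in the range 0 to 1.
  return [(o + shift) * scale
          for o, (shift, scale) in zip(outputs, SHIFT_SCALE)]

For example, an Angle output of 0.5 denormalizes to (0.5 - 0.5) * 1.1 = 0.0.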

Loss-Function

The loss function is the sum of the squared differences between the output values and the normalized label values.
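
A minimal sketch of this loss for a single sample, assuming numpy arrays (the function name is illustrative):

import numpy as np

def label_loss(outputs, normalized_labels):
  # Sum of squared differences between the network outputs and the
  # normalized label values.
  return np.sum((outputs - normalized_labels) ** 2)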

Error-Function

The error function is the absolute difference between the label values and the denormalized output values. It is calculated for every sample in the batch, which yields the mean absolute error (MAE) and the standard deviation (SD) of the error for the whole batch.
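
A minimal sketch, assuming numpy arrays of shape (batch-size, 14) (again, the function name is illustrative):

import numpy as np

def batch_error(denormalized_outputs, labels):
  # Per-sample absolute differences between the labels and the
  # denormalized outputs, then mean and standard deviation over the batch.
  abs_error = np.abs(labels - denormalized_outputs)
  return np.mean(abs_error), np.std(abs_error)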

Monitor the Training Progress with Tensorboard

The deep-driving implementation in this repository uses tensorboard summaries for logging. Thus it is easy to monitor the training progress with tensorboard. To do so, you need to start tensorboard with the summary directory as log-dir:

cd <repository-path>/python/scripts

tensorboard --logdir=Summary

Afterwards you can connect to tensorboard with a browser (Chrome seems to work very well with tensorboard) at the URL 127.0.0.1:6006.

There are three important scalar values for monitoring: the mean absolute error, the standard deviation of the error, and the label-loss.

The following image shows the mean absolute error vs. the number of epochs:

Error Diagram

The blue line is the error on the training data and the pink line the error on the validation data. At the beginning of training, the error on the validation data is higher than the error on the training data. After around 300 iterations this changes. In the end, the validation error should arrive at around 16 and the training error at around 20.

Normally it is unusual to have a validation error lower than the training error. However, keep in mind that this error value is the sum of the error values over all outputs. Due to the different scaling, a small error in one output can lead to a big value in the sum, while a big error in another output contributes only a small value. It seems the validation data has more errors on the outputs with small scales than on those with big scales. So even if the normalized validation error is higher than the training error (see the loss diagram), the sum of the mean absolute errors is smaller. In the category "DetailError" you can see a diagram like this for every single output value.

The following image shows the standard deviation of the error vs. the number of epochs:

Standard Deviation Diagram

The following image shows the loss of the labels vs. the number of epochs:

Loss Diagram

In this loss value, the weight decay term is not included. Thus you can compare values across different weight decay settings. If you want to monitor the full loss function, you need to look at the "Loss/Loss" diagram, which includes the weight decay term. Here the loss on the validation data is always higher than the loss on the training data. Furthermore, the loss on the validation data seems to stagnate at around 0.325, while the loss on the training data is still decreasing. This indicates that further training will not lead to a better model. Thus training is stopped at 2,000 epochs.

In the "Image" section of tensorboard, you can look at the input images for the network. Those images are pre-processed, which means that the mean-image is already subtracted. That's why the colors are a little bit different to the original images:

Images

Furthermore you can look at the feature maps of every convolutional layer.

First Convolutional Layer:

Conv1

Second Convolutional Layer:

Conv2

Third Convolutional Layer:

Conv3

Fourth Convolutional Layer:

Conv4

Fifth Convolutional Layer:

Conv5

Another interesting section of tensorboard is the "Graph" section, which shows a nice graph of the whole model, including the data-readers:

Graph

The "Text" section of tensorboard provides the user with example outputs and the corresponding labels. The outputs and labels belong to the first image in the "Image" section of the current epoch.

Output Table
