The Deep-Learning Framework

Introduction

The module deep_learning in the repository's modules directory contains several classes and functions that make development with tensorflow easier. The following guide briefly explains the concept of these classes and how to use them.

There are three different types of classes: Runners, Model-Parts and Layers:

  • Runner classes define a running behaviour, like "to train a model" or "to evaluate a model". They contain the full logic of training or evaluation, including the training loop and the parsing of settings from the settings file.

  • Model (Parts) classes each define a part of a model. You can imagine a model as a group of building blocks which are plugged together to form a full model. These classes define the network architecture, the data-reader or the behaviour of the optimization algorithm. Depending on the chosen runner class, you might need different model-parts. For example, an evaluation runner does not need the optimization model-part.

  • Layer classes define a certain layer of the network. They are optional classes which need not be used for a network; instead you can also use plain tensorflow. However, these classes help to build up a network and to write the architecture of the network to a log-file.

The following image shows the relationship of these class-types:

[Image: Class Types]

Runner-Classes

Runner-class objects are created by the main-model class CModel, which must be instantiated in every application beforehand. When creating the main-model, a network model-part must be assigned to it:

import deep_learning as dl
import deep_driving.model as model

Model = dl.CModel(model.CAlexNet)

Now the object Model can create one of the four different runner-class objects available:

Trainer-Runner

The trainer-runner (deep_learning.train.CTrainer) trains a model with the given training data. When creating the trainer, the data-reader and error-measurement model-parts are necessary. Furthermore, a printer model-part and a summary-merger model-part can optionally be assigned to it:

Trainer = Model.createTrainer(model.CTrainer, model.CReader, model.CError, Settings)

# optional...
Trainer.addPrinter(model.CPrinter())
Trainer.addSummaryMerger(model.CMerger())

After creating the trainer, a checkpoint can be restored:

# restores the last stored checkpoint
Trainer.restore()

# restores the checkpoint for epoch number 3
Trainer.restore(3)

The directory where the checkpoints are stored is read from the Settings dictionary. If you want to train a new model from scratch, no restore-command is necessary; restoring is an optional operation.

To train the model, simply call the train method:

Trainer.train()

All settings, like the size of an epoch or the number of epochs, are read from the Settings dictionary.

The class deep_learning.train.CTrainer is just a base class which contains the skeleton of a trainer. For a concrete application, this class must be inherited and some methods must be overwritten with application-specific behavior:

  • def _createOptimizer(self, ErrorMeasurement, Settings): This method must create and return the optimizer object. The cost function (or loss) is the output of ErrorMeasurement.getOutputs(). A sketch is shown after this list.

  • def _trainIteration(self, Session, RunTargets, Reader, Iteration, Batch, Epoch): This method performs a single training iteration. If nothing special has to be done here, the following code should do the job in many cases:

  def _trainIteration(self, Session, RunTargets, Reader, Iteration, Batch, Epoch):
    Data = Reader.readBatch(Session)
    AllTargets = RunTargets
    return Session.run(AllTargets, feed_dict = Data)
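
A possible _createOptimizer implementation could look like the following sketch. It assumes a TF1-style API, a momentum optimizer, and a fixed example batch-size; none of these details are mandated by the framework:

import tensorflow as tf
import deep_learning as dl

class CAppTrainer(dl.train.CTrainer):
  def _createOptimizer(self, ErrorMeasurement, Settings):
    # The framework applies the returned optimizer to the loss from
    # ErrorMeasurement.getOutputs().
    BatchSize  = 64  # assumption: normally this comes from the data-reader settings
    DecaySteps = (Settings['Optimizer']['EpochsPerDecay'] *
                  Settings['Trainer']['EpochSize']) // BatchSize

    # reduce the learning rate by LearnRateDecay every EpochsPerDecay epochs
    LearnRate = tf.train.exponential_decay(
      learning_rate = Settings['Optimizer']['StartingLearningRate'],
      global_step   = tf.train.get_or_create_global_step(),
      decay_steps   = DecaySteps,
      decay_rate    = Settings['Optimizer']['LearnRateDecay'],
      staircase     = True)

    return tf.train.MomentumOptimizer(LearnRate, Settings['Optimizer']['Momentum'])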

Optionally the following method can be overwritten:

  • def _getGradientNoise(self, Settings): It returns the standard deviation of the gradient noise which should be used during optimization. If this method is not overwritten, it simply returns 0, which means no noise is applied. Normally a method like this should do the job:
  def _getGradientNoise(self, Settings):
    if "Optimizer" in Settings:
      if "Noise" in Settings["Optimizer"]:
        return Settings["Optimizer"]["Noise"]

    return None

The Settings dictionary contains all settings of an application. The following settings are used for the trainer (an example dictionary is shown after this list):

  • Trainer: General Settings for the trainer behaviour.

    • CheckpointEpochs: Defines after how many epochs a checkpoint is created. Creating a checkpoint takes some time, thus you won't do it for every epoch. But waiting too long between checkpoints means that you might lose training results if the script or your computer crashes.

    • CheckpointPath: The path where the checkpoints are stored. It can be a relative or absolute path.

    • EpochSize: The number of samples for a single epoch. The number of iterations per epoch is this value divided by the batch-size.

    • NumberOfEpochs: The number of epochs to train.

    • SummaryPath: The path where to store the training summary (tensorboard-files).

  • Optimizer: Settings which define the behaviour of the optimizer:

    • LearnRateDecay: The decay factor of the learning rate. A value of 0.1 means that the learning rate after a reduction is only 10% of the learning rate before.

    • EpochsPerDecay: The number of epochs before the learning rate is reduced.

    • StartingLearningRate: The learning rate of the first epoch.

    • WeightDecay: The weight decay that should be applied to the loss-function.

    • Momentum: The momentum which should be applied to the optimizer.

    • Noise: The standard deviation of the gradient noise which should be added to the gradients. If this value is 0 or null, no noise is added. Adding noise can help to speed up training of difficult recurrent networks. See this paper for more details.

  • Runner: Settings which change the behavior of all kinds of runners:

    • Memory: Defines the maximum percentage of memory which should be allocated by tensorflow. If null is given, tensorflow will allocate all available memory. This option is normally not needed; it only makes sense when running several applications on the same graphics card.

  • Validation: Settings important for validation during training.

    • Samples: The number of samples from the validation data-set which are used for validation. The validation is performed after each iteration with the same batch-size as the training.
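
To make the nesting of these keys concrete, a Settings dictionary for training could look like the following sketch (all values are arbitrary examples, not recommendations):

Settings = {
  'Trainer': {
    'CheckpointEpochs': 5,
    'CheckpointPath':   'checkpoints',
    'EpochSize':        100000,
    'NumberOfEpochs':   50,
    'SummaryPath':      'summary',
  },
  'Optimizer': {
    'StartingLearningRate': 0.01,
    'LearnRateDecay':       0.1,
    'EpochsPerDecay':       20,
    'WeightDecay':          0.0001,
    'Momentum':             0.9,
    'Noise':                None,  # null: no gradient noise
  },
  'Runner': {
    'Memory': None,  # null: allocate all available memory
  },
  'Validation': {
    'Samples': 10000,
  },
}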

Evaluation-Runner

The evaluation runner (deep_learning.evaluator.CEvaluator) evaluates a model with the given test data-set. When creating an evaluator runner, the data-reader and error-measurement model-parts must be defined. Furthermore, printer and summary-merger model-parts can optionally be assigned to it:

Evaluator = Model.createEvaluator(model.CEvaluator, model.CReader, model.CError, Settings)

# optional assignments
Evaluator.addPrinter(model.CPrinter())
Evaluator.addSummaryMerger(model.CMerger())

After creating an evaluator, a checkpoint should be restored. While restoring is technically optional, it does not make much sense to evaluate a model without using a checkpoint:

# restore the latest checkpoint
Evaluator.restore()

# restore the checkpoint for epoch 3
Evaluator.restore(3)

Also for the evaluator runner, all necessary information (like the path where the checkpoints are stored) is taken from the Settings dictionary.

After restoring a checkpoint, the model can be evaluated with:

Error = Evaluator.eval()
print("Mean Absolute Error: {:.2f}".format(Error))

The error-value, which is returned by the eval() method, is defined in the error-measurement model-part. It is application specific.

The evaluator runner stores the summary of the full evaluation run (the summaries of the individual evaluation iterations are merged with the summary-merger model-part). Thus you can print the whole summary after the evaluation:

Summary = Evaluator.getSummary()
Evaluator.getPrinter().printFullSummary(Summary)

Furthermore, you can store the results into a log file. This file will also contain the architecture of the network and all settings:

Evaluator.storeResults('result.txt')

The class deep_learning.evaluator.CEvaluator is just a base class which needs to be inherited by the application to define the application-specific behavior. However, in comparison to the trainer, only a single and simple method needs to be overwritten:

  def _evalIteration(self, Session, RunTargets, Reader, Iteration, Batch, Epoch):
    Data = Reader.readBatch(Session)
    AllTargets = RunTargets
    return Session.run(AllTargets, feed_dict = Data)

This implementation should do the job for most applications.

The Settings dictionary contains all settings of an application. The following settings are used for the evaluator:

  • Evaluator: General settings for the evaluator behavior.

    • CheckpointPath: The path where the checkpoints are stored. It can be a relative or absolute path.

    • EpochSize: The number of samples for a single epoch.

    • NumberOfEpochs: The number of epochs to evaluate.

  • Runner: Settings which change the behavior of all runners:

    • Memory: Defines the maximum percentage of memory which should be allocated by tensorflow. If null is given, tensorflow will allocate all available memory. This option is normally not needed; it only makes sense when running several applications on the same graphics card.

Inference-Runner

The inference runner (deep_learning.inference.CInference) can be used to calculate the output of a neural network inside an application. It is very similar to the evaluation runner, except that it does not run on pre-defined data (the test data-set), but on new data provided by the application itself. When the inference-runner is created, only a data-reader model-part must be specified:

Inference = Model.createInference(model.CInference, model.CInferenceReader, Settings)

Afterwards a checkpoint must be restored:

# restores the last checkpoint
Inference.restore()

# restores the checkpoint for epoch 3
Inference.restore(3)

After loading a checkpoint, the model can be run by calling the run(...) method. This method needs a list of inputs for the current run. This list is passed to the data-reader; thus the meaning of the list and of each element in it is application specific:

Result = Inference.run([Image])

Also the meaning of the returned result is application specific and must be defined within the inherited inference class.

By default, the inference class logs the run-time of every run. It can be read with the getLastTime() method. With getMeanTime() the mean run-time over all runs can be obtained. It is important to know that the first runs are very slow, since tensorflow allocates memory and prepares the model. Thus, for run-time measurements you should not take the first 10 run-steps into account; the mean run-time calculation automatically ignores those first steps.
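
A short usage sketch of these timing helpers (Images stands for any iterable of application-specific inputs; the unit of the returned values is not specified here):

for Image in Images:
  Result = Inference.run([Image])

print("Last run-time:", Inference.getLastTime())
print("Mean run-time:", Inference.getMeanTime())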

The class deep_learning.inference.CInference is just the base class of the inference runner, which must be inherited to add the application-specific code. The following methods need to be overwritten:

  • _runIteration(self, Session, RunTargets, Inputs, Reader, Iteration): As for every other runner, this is the method where a single inference step is processed. In many applications it can be very simple:

  def _runIteration(self, Session, RunTargets, Inputs, Reader, Iteration):
    Data = Reader.readSingle(Session, Inputs)
    AllTargets = self._Network.getOutputs()['Output']
    return Session.run(AllTargets, feed_dict=Data)

  • _postProcess(self, Results): This method performs a post-processing of the calculated data. If no post-processing is necessary, it can simply return the Results list. Otherwise you can implement your post-processing here.

Like all other runners, the inference-runner needs some settings, which are defined in the Settings dictionary (an example is shown after this list):

  • Inference: Settings for the inference-runner.

    • CheckpointPath: The checkpoint path.

    • Epoch: The checkpoint epoch to restore. This value is only used if the restore(...) method is called without an epoch-number. Use the value null to always take the last epoch.
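
A possible Inference section of the Settings dictionary (values are examples only):

Settings = {
  'Inference': {
    'CheckpointPath': 'checkpoints',
    'Epoch': None,  # null: restore() without an argument loads the latest checkpoint
  },
}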

Mean-Calculation-Runner

The mean-calculator-runner (deep_learning.calculator.CMeanCalculator) is a very specific runner which is just a wrapper around the data-reader to read all data and perform a calculation with it. It does not use the network or the error-measurement model-parts. In deep-driving it is used to calculate the mean-image of the whole training data-set.

Since it only uses the data-reader and no other model-part, it can be created directly without using the model-class:

Calculator = model.CMeanCalculator(model.CReader, Settings)

Afterwards you can calculate and store the mean-image information:

Calculator.calculate()
Calculator.store()

The base-class deep_learning.calculator.CMeanCalculator must be inherited to add application specific code by overwriting the following methods:

  • _getImage(self, Reader): Returns an image-tensor from the reader (see the sketch after this list).

  • _calculationIteration(self, Session, RunTargets, Reader, Iteration, Batch, Epoch): Performs a single calculation step. Normally this method is very simple:

  def _calculationIteration(self, Session, RunTargets, Reader, Iteration, Batch, Epoch):
    Data = Reader.readBatch(Session)
    AllTargets = RunTargets
    return Session.run(AllTargets, feed_dict = Data)
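
A minimal sketch of _getImage, assuming the data-reader exposes its outputs as a dictionary with an 'Image' key (this key is an application-specific assumption, not part of the framework):

  def _getImage(self, Reader):
    # 'Image' is an assumed key of the reader output dictionary
    return Reader.getOutputs()['Image']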

The settings are given by the Settings dictionary:

  • MeanCalculator: Settings of the mean-calculator.

    • EpochSize: The size of a single calculation epoch.

    • NumberOfEpochs: The number of epochs, used for calculation.

    • MeanFile: The name of the mean-file.

Model-Classes

A deep-learning model is defined by the following classes:

Network

The class deep_learning.network.CNetwork represents the neural network. This base-class must be inherited and the following methods must be overwritten:

  • _build(self, Inputs, Settings): Builds up the network structure. This method gets the Inputs from the data-reader and uses tensorflow computational graph operations to build the network. This method must return a structure dictionary which contains all important nodes of the network (at least the output node).

  • _getOutputs(self, Structure): This method returns the output node from the network structure dictionary. For example:

  def _getOutputs(self, Structure):
    return {'Output': Structure['Output']}

The layer classes described below can also be used to build the network.
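
A truncated sketch of a network model-part built with the layer classes (the 'Image' input key and the architecture are purely illustrative):

import deep_learning as dl

class CExampleNetwork(dl.network.CNetwork):
  def _build(self, Inputs, Settings):
    # 'Image' is an assumed key of the data-reader outputs
    Seq = dl.layer.Sequence("Network")
    Seq.add(dl.layer.Conv2D_BN_ReLU(3, 64, Name="C1"))
    Seq.add(dl.layer.MaxPooling(3, 2))
    # ... further layers ...
    Output = Seq.apply(Inputs['Image'])
    return {'Output': Output}

  def _getOutputs(self, Structure):
    return {'Output': Structure['Output']}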

Data-Reader

The class deep_learning.data.CReader represents a model-part which reads the input-data and presents it to the network. Furthermore, this model-part performs the pre-processing of the input.

This base class must be inherited and the following methods must be overwritten:

  • _build(self, Settings): This method builds the data-reader graph and returns a list of inputs.

  • _getOutputs(self, Inputs): This method returns all output nodes of the data-reader graph (the input of the network).

  • _readBatch(self, Session, Inputs): This method returns a dictionary, which is fed to the run operation of the session. Here you can pass the input to the network (if not using a single graph for data-reader and network) or you can pass runtime constants for the following run (for example the "IsTraining" boolean).

  • _getBatchSize(self, Settings): Returns the number of samples per batch.

  • _addSummaries(self, Inputs): Adds input tensors to the tensorboard summary.

  • _buildPreprocessing(self, Settings, Inputs, UseDataAugmentation): Performs the pre-processing of all input tensors (before building a batch).

In general the data-reader is one of the most complex model-parts, since it is highly application specific.

Error-Measurement

The class deep_learning.error.CMeasurement defines the loss-function and error-measurement for the application.

The base class must be inherited and the following methods need to be overwritten (a sketch is shown after this list):

  • _build(self, Network, Reader, Settings): Builds the graph for the loss- and error-function. It must return a structure which contains both graph-nodes.

  • _getOutputs(self, Structure): Outputs the loss- and error-function as structure. Normally a trivial implementation is sufficient:

  def _getOutputs(self, Structure):
    return Structure

  • _getEvalError(self, Structure): Outputs the graph-node for the error-function.
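
A minimal sketch of an error-measurement model-part for a regression task; the dictionary keys ('Output', 'Label', 'Loss', 'Error') and the choice of loss are assumptions for illustration:

import tensorflow as tf
import deep_learning as dl

class CError(dl.error.CMeasurement):
  def _build(self, Network, Reader, Settings):
    # all dictionary keys here are application-specific assumptions
    Output = Network.getOutputs()['Output']
    Label  = Reader.getOutputs()['Label']
    Loss   = tf.reduce_mean(tf.squared_difference(Output, Label))
    Error  = tf.reduce_mean(tf.abs(Output - Label))
    return {'Loss': Loss, 'Error': Error}

  def _getOutputs(self, Structure):
    return Structure

  def _getEvalError(self, Structure):
    return Structure['Error']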

Printer

The class deep_learning.printer.CProgressPrinter is an optional model-part which prints the current state of training/evaluation after every epoch. You can either use the base-class to get a very basic and generic output, or inherit from it to customize the output to your application's needs.

Summary-Merger

The class deep_learning.summary.CMerger is also an optional model-part. It helps to merge different summary values into a single value, which is an application-specific task. For example, in deep-driving the error values are represented as mean absolute error and standard deviation. Thus the summary-merger in deep-driving calculates a new mean absolute error from the errors of different runs, and it is also able to calculate an overall standard deviation from the standard deviations of the individual runs.

To implement an application specific merger, the base class must be inherited and the following method must be overwritten:

  • _mergeSummaries(self, Summaries, SummaryTool): It merges different summaries, given as summary-strings, into a single summary-string. Summaries is a list of summary-strings which can be parsed with SummaryTool.ParseFromString(...). Afterwards SummaryTool acts as a dictionary of summary-keys (the names of the summary values) and the corresponding values. After changing this dictionary (merging the values from all strings into a single dictionary), it can be serialized to a string with Result = SummaryTool.SerializePartialToString(). The serialization result must be returned by this method. A sketch is shown below.
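
The following sketch simply averages each summary value over all runs. It only follows the textual description above; the exact dictionary interface of SummaryTool and the correct merging rule (for example for standard deviations) are application specific:

def _mergeSummaries(self, Summaries, SummaryTool):
  # collect all values per summary-key over all runs
  Values = {}
  for Summary in Summaries:
    SummaryTool.ParseFromString(Summary)
    for Key in SummaryTool:  # assumption: SummaryTool can be iterated like a dictionary
      Values.setdefault(Key, []).append(SummaryTool[Key])

  # naive merge: replace each value by its mean over all runs
  for Key, ValueList in Values.items():
    SummaryTool[Key] = sum(ValueList) / len(ValueList)

  return SummaryTool.SerializePartialToString()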

Layer-Classes

The Layer-Base Class

Every layer class inherits from the base class CLayer. This base class provides three important methods:

  • apply(self, Input): Applies the layer to the given input. It returns the tensorflow operations which represent this layer and all sub-layers (if available). Before the layer is applied, a copy of the layer settings is generated and used to create the tensorflow operations. In this way you can change the settings of the layer-object later without changing the behavior of the already applied operations.

  • copy(self): Creates an exact copy of the layer-object, sub-objects and all settings.

  • __call__(self, ...): When a layer-object is called with the () operator, a copy of this layer is returned. Furthermore, you can specify the same arguments as for the constructor to change the settings of the copy while creating it. Thus the () operator works exactly like the constructor (see the sketch below).
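
A short sketch of this copy semantics, using the dense layer described below (names and node counts are arbitrary):

Dense = dl.layer.Dense(1024, Name="FC1")  # original layer object
Copy  = Dense(512, Name="FC2")            # copy with changed settings

Out1 = Dense.apply(Input)  # applies a 1024-node dense layer
Out2 = Copy.apply(Out1)    # applies a 512-node dense layer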

The Sequence-Layer

One of the most important basic layers is the sequence layer deep_learning.struct.CSequence. It represents an ordered sequence of sub-layers (for example: "Dense -> BN -> ReLU").

When creating a CSequence-Layer the following arguments can be passed to the constructor:

  • Name: The name of the layer-sequence. Can be None to use no name.

  • Layers: A list of layers which should be part of the sequence, or a single layer which should be the first part. Can be None to start with an empty sequence.

  • DefaultLayer: The index of the layer in the sequence which should represent the whole sequence to the application. All method-calls which do not belong to the sequence itself are passed to this default layer for convenience. Thus the whole sequence behaves like a single layer of the type of the default layer. If you have, for example, the sequence "Dense -> BN -> ReLU", you can define the Dense-layer as default layer (index 0) and thus the whole sequence can be used like a dense-layer. If you specify None, no default layer is used.

After creating a sequence, you can add new layers to it:

  • add(self, Layer): Adds a single layer to the sequence.

  • addLayers(self, Layers): Adds a list of layers to the sequence.

Furthermore, you can add a group name for all upcoming layers with addLayerName(Name, UseCounter=True). All further layers will belong to a layer-group of this name in tensorboard. If you set UseCounter to True, a number is added to each layer-group which increases automatically with each new group. In this way you can achieve group names like "Conv_1", "Conv_2", "Dense_3". This method can also be used in a python with statement, as shown in the following code-example:

Seq = dl.layer.Sequence("Network")

with Seq.addLayerName("Conv"):
  Layer = Seq.add(dl.layer.Conv2D_BN_ReLU(3, 128, Name="C1"))
  Layer = Seq.add(dl.layer.Conv2D_BN_ReLU(3, 128, Name="C2"))
  Layer = Seq.add(dl.layer.MaxPooling(3, 2))

with Seq.addLayerName("Dense"):
  Layer = Seq.add(dl.layer.Dense_BN_ReLU(1024))
  Layer = Seq.add(dl.layer.Dropout(0.5))

with Seq.addLayerName("Dense"):
  Layer = Seq.add(dl.layer.Dense(OutputNodes))

Output = Seq.apply(Input)

The () operator of a sequence is a little bit special compared to the standard behavior: It creates a copy of the sequence, but the arguments are passed to the default-layer. Thus, from the argument point of view, it behaves like the constructor of the default-layer, while in reality it creates a copy of the whole sequence.
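
For example, with the Conv2D_BN_ReLU template described below (a sequence whose default layer is the convolution), this could look like the following sketch:

Template = dl.layer.Conv2D_BN_ReLU(3, 64)  # sequence template: Conv2D -> BN -> ReLU
Layer    = Template(5, 128, Name="C3")     # copy of the whole sequence, configured with
                                           # Conv2D arguments (5x5 kernel, 128 filters)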

Dense-Layer

A deep_learning.layer.dense.CDense layer is simply a fully-connected layer without activation function. The constructor has the following arguments:

  • Nodes: The number of output-nodes for the dense-layer.

  • Name: The name of this layer ("Dense" is the default name).

Furthermore, the following methods can be used to set up the dense layer (a configuration sketch is shown after this list):

  • setWeightLR(self, LR): Sets the learning rate factor of the weights (1.0 is default).

  • setBiasLR(self, LR): Sets the learning rate factor of the bias (1.0 is default).

  • setWeightDecay(self, Decay): Sets the weight decay factor of the weights (1.0 is default).

  • setBiasDecay(self, Decay): Sets the weight decay factor of the bias (0.0 is default).

  • setWeightInit(self, Init): Sets the initialization scheme of the weights (Xavier initialization is default).

  • setBiasInit(self, Init): Sets the initialization scheme of the bias (constant 0 initialization is default).

  • setUseBias(self, IsUsed): Enables or disables the usage of biases (enable is default).

  • setNodes(self, Nodes): Sets the number of output-nodes.
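
A short configuration sketch using some of these setters (the values are arbitrary examples):

Layer = dl.layer.Dense(1024, Name="FC1")
Layer.setBiasLR(2.0)       # train the bias twice as fast as the weights
Layer.setWeightDecay(0.0)  # exclude this layer's weights from weight decay

Output = Layer.apply(Input)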

Conv2D-Layer

The class deep_learning.layer.conv.CConv2D represents a convolutional layer without activation function. The constructor has the following arguments (a construction sketch is shown after this list):

  • Kernel: The size of the kernel. This is either a list of values representing the sizes in x and y direction, or a single value if the size in x and y direction is the same.

  • Filters: The number of filters.

  • Stride: The stride of the kernel window, which should be used. Default is 1.

  • Padding: The type of padding to use. Default is "SAME". If Padding is a number, it specifies the number of pixels around the input-map which should be added by "SAME" padding. Thus the values "SAME" and 1 result in the same padding.

  • Groups: The number of filter groups to use. Default is 1. AlexNet uses 2 groups on some layers.

  • Name: The name of the layer. Default is "Conv2D".
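
A hedged construction sketch (assuming the class is exported as dl.layer.Conv2D, analogous to dl.layer.Dense above):

# 5x5 kernel, 64 filters, stride 2; [5, 5] and 5 are equivalent kernel specifications
Conv = dl.layer.Conv2D([5, 5], 64, Stride=2, Padding="SAME", Name="C1")

FeatureMap = Conv.apply(Input)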

After creating a layer object, there are several methods to set up the layer further:

  • setKernel(self, Kernel): Sets the size of the kernel. This is either a list of values representing the sizes in x and y direction, or a single value if the size in x and y direction is the same.

  • setFilters(self, Filters): Sets the number of filters.

  • setStride(self, Stride): Sets the stride of the kernel window.

  • setPadding(self, Padding): Sets the padding to use. If Padding is a number, it specifies the number of pixels around the input-map which should be added by "SAME" padding. Thus the values "SAME" and 1 result in the same padding.

  • setGroups(self, Groups): Sets the number of groups for the filters.

  • setKernelLR(self, LR): Sets the learning rate factor of the kernel weights (1.0 is default).

  • setBiasLR(self, LR): Sets the learning rate factor of the bias (1.0 is default).

  • setKernelDecay(self, Decay): Sets the weight decay factor for the kernel (1.0 is default).

  • setBiasDecay(self, Decay): Sets the weight decay factor for the bias (0.0 is default).

  • setKernelInit(self, Init): Sets the initialization mode of the kernel (Xavier initialization is default).

  • setBiasInit(self, Init): Sets the initialization mode of the bias (constant 0 initialization is default).

  • setUseBias(self, UseBias): Enables or disables the usage of bias (enable is default).

Activation-Layer

The class deep_learning.layer.activation.CActivation represents an activation-function layer. The constructor accepts two arguments:

  • Func: The tensorflow-operation which should be performed as activation-function.

  • Name: The name of the layer.

For convenience there are activation layer templates available:

  • deep_learning.layer.activation.ReLU: Represents a ReLU activation function.

  • deep_learning.layer.activation.Sigmoid: Represents a sigmoid activation function.
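
Other activation functions can be wrapped in the same way; a sketch for a tanh activation (assuming CActivation applies Func to the input tensor):

import tensorflow as tf

# wrap an arbitrary tensorflow operation as an activation layer
Tanh   = dl.layer.activation.CActivation(tf.tanh, Name="Tanh")
Output = Tanh.apply(Input)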

Pooling-Layer

The class deep_learning.layer.conv.CPooling represents a Pooling-Layer. The constructor accepts the following arguments:

  • Window: The window-size for the pooling. This is either a list of values, where the first value represents the window size in x direction and the second value the size in y direction, or a single number if both directions of the window have the same size.

  • Stride: The stride value for the pooling window.

  • Type: Either "MAX" for max-pooling or "AVG" for average-pooling.

  • Padding: The padding type or padding size. The padding type is either "SAME" or "VALID". If a number is given, it represents the number of padding pixels with "SAME" padding. Thus the values "SAME" and 1 are equivalent. "SAME" is the default value.

  • Name: The name of the layer. Default is "Pooling".

After creating the pooling layer it can be changed by using the following methods:

  • setWindow(self, Window): Sets the window size. This is either a list of values, where the first value represents the window size in x direction and the second value the size in y direction, or a single number if both directions of the window have the same size.

  • setStride(self, Stride): Sets the stride value of the window.

  • setType(self, Type): Sets the type of pooling. This is either "MAX" or "AVG".

  • setPadding(self, Padding): Sets the padding of the pooling layer. The padding type is either "SAME" or "VALID". If a number is given, it represents the number of padding pixels with "SAME" padding. Thus the values "SAME" and 1 are equivalent. "SAME" is the default value.

Dropout-Layer

The class deep_learning.layer.dense.CDropout represents a dropout-layer. The constructor accepts the following arguments:

  • KeepRatio: The percentage of nodes to be kept.

  • Name: The name of the dropout-layer. Default is "Dropout".

In order to work correctly, the function deep_learning.layer.Setup.setupIsTraining(...) needs to be called before a dropout layer is applied. This function expects a boolean tensor as input, which specifies whether the layer is in training or inference mode.
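
A minimal sketch of this setup call (how the IsTraining tensor is created and fed at run-time is application specific; typically the data-reader provides this flag):

import tensorflow as tf

# boolean tensor switching between training and inference behavior
IsTraining = tf.placeholder(tf.bool, name="IsTraining")
dl.layer.Setup.setupIsTraining(IsTraining)

Drop   = dl.layer.Dropout(0.5)  # keep 50% of the nodes
Output = Drop.apply(Input)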

Batch-Normalization-Layer

The class deep_learning.layer.dense.CBatchNormalization represents a batch-normalization layer. The constructor only expects the Name of the layer.

Also for this layer, the function deep_learning.layer.Setup.setupIsTraining(...) needs to be called before a batch-normalization layer is applied. This function expects a boolean tensor as input, which specifies whether the layer is in training or inference mode.

Log Feature-Map Layer

The class deep_learning.layer.conv.CLogFeatureMap is a helper layer which simply adds the feature map to the image output of tensorboard. The constructor only expects a name.

Templates

For convenience, often-used layer combinations are available as templates. The following layer-sequence templates can be used:

  • deep_learning.layer.Dense_BN_ReLU: A sequence of a Dense, a Batch-Normalization and a ReLU layer. It behaves like a normal Dense layer.

  • deep_learning.layer.Conv2D_BN_ReLU: A sequence of a 2D-Convolution, a Batch-Normalization and a ReLU layer. It behaves like a normal Conv2D layer.
