
Results

MNIST - 6000 Training-Samples / 10000 Test-Samples (Multi-Task: Autoencoder+Classifier)

| Network | Config | Results | Error | Run | Comment |
| --- | --- | --- | --- | --- | --- |
| (Model1) Image=Per-Pixel Mean Subtraction, Code=Dense(Image, 1000)+Sigmoid, Output=Dense(Code, 28x28x1), Class=Dense(Code, 10)+Softmax | 0 Autoencoder-States + 1 Classifier-State (Loss: 0.0 AE, 1.0 Class), LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 120 Epochs | result_classification.txt | 6.66% | model1 | Classifier only |
| (Model2) Image=Per-Pixel Mean Subtraction, Code=Dense(Image, 1000)+GaussianNoise(Mean=0.0, StdDev=0.1, TrainingOnly)+Sigmoid, Output=Dense(Code, 28x28x1), Class=Dense(Code, 10)+Softmax | 0 Autoencoder-States + 1 Classifier-State (Loss: 0.0 AE, 1.0 Class), LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 120 Epochs | result_classification.txt | 6.63% | model2 | Classifier only + Hidden-Noise |
| (Model3) Image=Per-Pixel Mean Subtraction, Code=GaussianNoise(Mean=0.0, StdDev=0.1, TrainingOnly)+Dense(Image, 1000)+Sigmoid, Output=Dense(Code, 28x28x1), Class=Dense(Code, 10)+Softmax | 0 Autoencoder-States + 1 Classifier-State (Loss: 0.0 AE, 1.0 Class), LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 120 Epochs | result_classification.txt | 5.65% | model3 | Classifier only + Input-Noise |
| (Model4) Image=Per-Pixel Mean Subtraction, Code=GaussianNoise(Mean=0.0, StdDev=0.1, TrainingOnly)+Dense(Image, 1000)+GaussianNoise(Mean=0.0, StdDev=0.1, TrainingOnly)+Sigmoid, Output=Dense(Code, 28x28x1), Class=Dense(Code, 10)+Softmax | 0 Autoencoder-States + 1 Classifier-State (Loss: 0.0 AE, 1.0 Class), LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 120 Epochs | result_classification.txt | 5.56% | model4 | Classifier only + Input-Noise + Hidden-Noise |
| (Model5) Image=Per-Pixel Mean Subtraction, Code=Dense(Image, 1000)+Sigmoid, Output=Dense(Code, 28x28x1), Class=Dense(Code, 10)+Softmax | 0 Autoencoder-States + 1 Classifier-State (Loss: 0.4 AE, 0.6 Class), LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 120 Epochs | result_classification.txt | 6.90% | model5 | Classifier + AE |
| (Model6) Image=Per-Pixel Mean Subtraction, Code=Dense(Image, 1000)+GaussianNoise(Mean=0.0, StdDev=0.1, TrainingOnly)+Sigmoid, Output=Dense(Code, 28x28x1), Class=Dense(Code, 10)+Softmax | 0 Autoencoder-States + 1 Classifier-State (Loss: 0.4 AE, 0.6 Class), LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 120 Epochs | result_classification.txt | 6.83% | model5 | Classifier + AE + Hidden-Noise |
| (Model7) Image=Per-Pixel Mean Subtraction, Code=GaussianNoise(Mean=0.0, StdDev=0.1, TrainingOnly)+Dense(Image, 1000)+Sigmoid, Output=Dense(Code, 28x28x1), Class=Dense(Code, 10)+Softmax | 0 Autoencoder-States + 1 Classifier-State (Loss: 0.4 AE, 0.6 Class), LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 120 Epochs | result_classification.txt | 5.67% | model5 | Classifier + AE + Input-Noise |
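
All seven models share one structure: a dense code layer of 1000 sigmoid units fed by the mean-subtracted image, a dense 28x28x1 reconstruction head, a 10-way softmax classification head, and Gaussian noise that is applied only during training, either before the code layer (input noise), after it (hidden noise), or both. The two losses are mixed with the weights given in the Config column. A minimal Keras sketch of that structure follows; the layer sizes and loss weights are taken from the table, while the optimizer choice and the rest of the training loop are illustrative assumptions rather than the project's actual code.

```python
import tensorflow as tf

def build_multitask_mnist(ae_weight=0.4, cls_weight=0.6, noise_std=0.1):
    """Sketch of the Model7-style setup: shared code, AE head + classifier head."""
    image = tf.keras.Input(shape=(28, 28, 1), name="image")
    flat = tf.keras.layers.Flatten()(image)

    # Input noise (Model3/4/7); GaussianNoise layers are active only in training.
    noisy = tf.keras.layers.GaussianNoise(noise_std)(flat)

    # Code layer: Dense(1000) + Sigmoid, as listed in the Network column.
    code = tf.keras.layers.Dense(1000, activation="sigmoid")(noisy)

    # Reconstruction head (28x28x1) and classification head (10 classes).
    recon = tf.keras.layers.Dense(28 * 28 * 1, name="reconstruction")(code)
    probs = tf.keras.layers.Dense(10, activation="softmax", name="classifier")(code)

    model = tf.keras.Model(image, {"reconstruction": recon, "classifier": probs})
    model.compile(
        # LR 0.001 is from the table; the optimizer type itself is an assumption.
        optimizer=tf.keras.optimizers.Adam(1e-3),
        loss={"reconstruction": "mse",
              "classifier": "sparse_categorical_crossentropy"},
        loss_weights={"reconstruction": ae_weight, "classifier": cls_weight},
    )
    return model
```

With `ae_weight=0.0, cls_weight=1.0` the same sketch reduces to the classifier-only runs (Model1 to Model4).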

MNIST - reduced dataset-size (Autoencoder)

| Network | Config | Results | Error | Run | Comment |
| --- | --- | --- | --- | --- | --- |
| Per-Pixel Mean Subtraction + 4x2 Conv. Layer + 1 Dense Layer | 0 Autoencoder-States + 1 Classifier-State, LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 150 Epochs | result_classification.txt | 1.40% | reduced_dataset/0_ae_1_classifier | Time: 2:01h |
| Per-Pixel Mean Subtraction + 4x2 Conv. Layer + 1 Dense Layer | 1 Autoencoder-State (without noise) + 1 Classifier-State, LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 180 Epochs | result_reconstruction.txt, result_classification.txt | 1.42% | reduced_dataset/1_ae_1_classifier | Time: 3:30h |
| Per-Pixel Mean Subtraction + 4x2 Conv. Layer + 1 Dense Layer | 1 Autoencoder-State (with noise) + 1 Classifier-State, LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 180 Epochs | result_reconstruction.txt, result_classification.txt | 1.35% | 1_ae_noise_1_classifier | Time: 3:40h |

MNIST (Autoencoder)

| Network | Config | Results | Error | Run | Comment |
| --- | --- | --- | --- | --- | --- |
| Per-Pixel Mean Subtraction + 4x2 Conv. Layer + 1 Dense Layer | 0 Autoencoder-States + 1 Classifier-State, LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 180 Epochs | result_classification.txt | 0.76% | 0-ae-states_1-classifier-state | Time: 1:30h |
| Per-Pixel Mean Subtraction + 4x2 Conv. Layer + 1 Dense Layer | 4 Autoencoder-States + 2 Classifier-States using target-loss only, LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 180 Epochs | result_reconstruction.txt, result_classification.txt | 1.52% | 4-ae-states_2-classifier-states_target-loss | Time: 2:30h |
| Per-Pixel Mean Subtraction + 4x2 Conv. Layer + 1 Dense Layer | 1 Autoencoder-State + 2 Classifier-States using target-loss only, LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 180 Epochs | result_reconstruction.txt, result_classification.txt | 1.17% | 1-ae-state_2-classifier-states_target-loss | Time: 1:57h |
| Per-Pixel Mean Subtraction + 4x2 Conv. Layer + 1 Dense Layer | 1 Autoencoder-State (with noise) + 1 Classifier-State using reconstruction-loss only, LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 180 Epochs | result_reconstruction.txt, result_classification.txt | 0.69% | 1-ae-state_1-classifier-states_target-loss | Time: 2:05h |
| Per-Pixel Mean Subtraction + 4x2 Conv. Layer + 1 Dense Layer | 1 Autoencoder-State (without noise) + 1 Classifier-State using reconstruction-loss only, LR: 0.001, LR-Decay: 0.5/15 Epoch, Weight-Decay: 0, 180 Epochs | result_reconstruction.txt, result_classification.txt | 0.72% | 1-ae-state_without_noise_1-classifier-state_recon_loss | Time: 2:05h |
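
The Autoencoder-State / Classifier-State wording in these tables reads like a staged schedule: the convolutional encoder is first trained as a (denoising) autoencoder, then a classifier is trained on top of the learned code. That reading, and the exact shape of the "4x2 Conv. Layer + 1 Dense Layer" encoder, are assumptions on my part; the sketch below only illustrates the two-phase idea, not the project's state machinery.

```python
import tensorflow as tf

def make_encoder():
    """Stand-in for the '4x2 Conv. Layer + 1 Dense Layer' encoder (shapes assumed)."""
    return tf.keras.Sequential([
        tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(32, 3, strides=2, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(64, 3, padding="same", activation="relu"),
        tf.keras.layers.Conv2D(64, 3, strides=2, padding="same", activation="relu"),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(128, activation="relu"),
    ])

encoder = make_encoder()
image = tf.keras.Input(shape=(28, 28, 1))

# Phase 1 ("autoencoder state"): reconstruct the clean image from a noisy input.
decoder = tf.keras.Sequential([
    tf.keras.layers.Dense(28 * 28, activation="sigmoid"),
    tf.keras.layers.Reshape((28, 28, 1)),
])
noisy = tf.keras.layers.GaussianNoise(0.1)(image)      # the "with noise" rows
autoencoder = tf.keras.Model(image, decoder(encoder(noisy)))
autoencoder.compile(optimizer=tf.keras.optimizers.Adam(1e-3), loss="mse")
# autoencoder.fit(x_train, x_train, ...)

# Phase 2 ("classifier state"): reuse the pretrained encoder, add a softmax head.
classifier_head = tf.keras.layers.Dense(10, activation="softmax")
classifier = tf.keras.Model(image, classifier_head(encoder(image)))
classifier.compile(optimizer=tf.keras.optimizers.Adam(1e-3),
                   loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# classifier.fit(x_train, y_train, ...)
```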

DeepDriving

| Network | Config | Mean | SD | Results | Run | Comment |
| --- | --- | --- | --- | --- | --- | --- |
| Per-Pixel-Normalization + Original-Net + Batch-Normalization + LRN | LR: 0.01; Decay: 0.90/40; WD: 0.005; 500 Epochs; Momentum-Optimizer | ? | ? | results.txt | run_1 | Quite good results |
| Per-Pixel-Normalization + Original-Net + Batch-Normalization | LR: 0.01; Decay: 0.90/40; WD: 0.005; ? Epochs; Momentum-Optimizer | ? | ? | results.txt | run_2 | Same as run_1 |
| Per-Pixel-Normalization + Original-Net + Batch-Normalization + no sigmoid output-layer | LR: 0.01; Decay: 0.90/40; WD: 0.005; ? Epochs; Momentum-Optimizer | ? | ? | results.txt | run_3 | Extreme MAE in the beginning, slow convergence |
| Per-Pixel-Standardization + Original-Net + Batch-Normalization | LR: 0.01; Decay: 0.90/40; WD: 0.005; ? Epochs; Momentum-Optimizer | ? | ? | results.txt | run_4 | Slower convergence in the beginning; needs to run longer |
| Data-Augmentation + Per-Pixel-Normalization + Original-Net + Batch-Normalization | LR: 0.01; Decay: 0.90/40; WD: 0.005; ? Epochs; Momentum-Optimizer | ? | ? | results.txt | run_5 | Good results, but only with HUE delta 0.05. With HUE delta 0.07 or bigger, there is strong divergence. |
| Data-Augmentation + Per-Pixel-Normalization + Original-Net + Batch-Normalization | LR: 0.01; Decay: 0.90/40; WD: 0.005; 240 Epochs; Nesterov-Momentum-Optimizer | ? | ? | results.txt | run_7 | No changes compared to run_1 |
| Data-Augmentation + Per-Pixel-Normalization + Original-Net + Batch-Normalization | LR: 0.01; Decay: 0.90/40; WD: 0.0; 240 Epochs; Nesterov-Momentum-Optimizer | ? | ? | results.txt | run_8 | No changes compared to run_1 |
| Data-Augmentation + Per-Pixel-Normalization + Original-Net + Batch-Normalization | LR: 1.0; Decay: 0.90/40; WD: 0.0; ? Epochs; AdaDelta-Optimizer | ? | ? | results.txt | run_9 | No changes compared to run_1, but weights are exploding. |
| Data-Augmentation + Per-Pixel-Normalization + Original-Net + Batch-Normalization | LR: 0.01; Decay: 0.90/40; WD: 0.0; ? Epochs; Adam-Optimizer | ? | ? | - | run_10 | Exploding weights, strong divergence! |
| Data-Augmentation + Per-Pixel-Normalization + Original-Net + Batch-Normalization | LR: 0.1; Decay: 0.90/40; WD: 0.0001; ? Epochs; Adam-Optimizer | ? | ? | - | run_11 | No convergence! |
| Data-Augmentation + Per-Pixel-Normalization + Original-Net + Batch-Normalization | LR: 1.0; Decay: 0.90/30; WD: 0.0001; ? Epochs; AdaDelta-Optimizer | ? | ? | - | run_12 | Good convergence, comparable to run_1 |
| Data-Augmentation + Per-Pixel-Normalization + Original-Net + Batch-Normalization | LR: 1.0; Decay: 0.90/30; WD: 0.0001; ? Epochs; AdaDelta-Optimizer; Noise: 1.0 | ? | ? | - | run_13 | Exploding weights, divergence! |
| Data-Augmentation + Per-Pixel-Normalization + Original-Net + Batch-Normalization | LR: 1.0; Decay: 0.90/30; WD: 0.0001; ? Epochs; AdaDelta-Optimizer; Noise: 0.3 | ? | ? | - | run_14 | Exploding weights, bad error |
| Data-Augmentation + Per-Pixel-Normalization + Original-Net + Batch-Normalization | LR: 1.0; Decay: 0.90/30; WD: 0.0001; ? Epochs; AdaDelta-Optimizer; Noise: 0.01 | ? | ? | - | run_15 | Comparable to run_1, higher standard deviation of error |
| Data-Augmentation + Per-Pixel-Normalization + VGG | LR: 1.0; Decay: 0.90/30; WD: 0.0; ? Epochs; AdaDelta-Optimizer; Noise: 0.01 | ? | ? | - | run_16 | Almost no convergence |
| Per-Pixel-Normalization + VGG | LR: 0.01; Decay: 0.90/30; WD: 0.0; ? Epochs; Adam-Optimizer; Noise: 0.01 | ? | ? | - | run_17 | Almost no convergence |
| Per-Pixel-Normalization + VGG | LR: 0.01; Decay: 0.90/40; WD: 0.0005; ? Epochs; Adam-Optimizer; Noise: 0.01 | ? | ? | - | run_18 | Almost no convergence |
| Per-Pixel-Normalization + VGG + Batch-Normalization | LR: 0.01; Decay: 0.10/100; WD: 0.0005; ? Epochs; Momentum-Optimizer; Noise: 0.01 | ? | ? | - | run_19 | Almost no convergence |
| Per-Pixel-Normalization + Original-Net + Batch-Normalization | LR: 0.01; Decay: 0.50/300; WD: 0.0005; 1220 Epochs; Momentum-Optimizer; Noise: 0.0 | ? | ? | results.txt | run_20 | Good convergence, comparable to run_1 |
| Per-Pixel-Normalization + Original-Net without Dropout + Batch-Normalization | LR: 0.01; Decay: 0.50/300; WD: 0.0005; 520 Epochs; Momentum-Optimizer; Noise: 0.0 | ? | ? | results.txt | run_21 | Better convergence than with dropout, especially for training data |
| Per-Pixel-Normalization + Original-Net with corrected Dropout + Batch-Normalization | LR: 0.01; Decay: 0.50/300; WD: 0.0005; 1850 Epochs; Momentum-Optimizer; Noise: 0.0 | ? | ? | results.txt | run_22 | Best convergence on validation data so far! |
| Per-Pixel-Normalization + Original-Net + Dropout + Batch-Normalization (no Batch-Normalization or weight-decay for the output layer) | LR: 0.01; Decay: 0.50/300; WD: 0.0005; 1860 Epochs; Momentum-Optimizer; Noise: 0.0 | ? | ? | results.txt | run_23 | Best performance, better than the original net in many categories |
| Per-Pixel-Normalization + Original-Net + Dropout + Batch-Normalization (no Batch-Normalization or weight-decay for the output layer) + no sigmoid at the output | LR: 0.01; Decay: 0.50/300; WD: 0.0005; ? Epochs; Momentum-Optimizer; Noise: 0.0 | ? | ? | results.txt | run_24 | Very noisy validation, bad performance |
| Per-Pixel-Normalization + Custom-Net + Dropout + Batch-Normalization (no Batch-Normalization or weight-decay for the output layer) | LR: 0.01; Decay: 0.50/300; WD: 0.0005; 2000 Epochs; Momentum-Optimizer; Noise: 0.0 | 16.41 | 16.67 | results.txt | run_25 | Very good performance, final version |
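
In the Config column, "Decay: 0.50/300" means the learning rate is multiplied by 0.5 every 300 epochs (likewise 0.90/40 means a factor of 0.9 every 40 epochs), starting from the listed LR; the runs that converged best use a plain momentum optimizer. A minimal way to express such a schedule in TensorFlow/Keras is sketched below; the momentum value and steps_per_epoch are placeholder assumptions, since the wiki only names the optimizer type.

```python
import tensorflow as tf

steps_per_epoch = 500   # placeholder; depends on dataset and batch size

# "LR: 0.01; Decay: 0.50/300": start at 0.01, halve every 300 epochs (staircase).
lr_schedule = tf.keras.optimizers.schedules.ExponentialDecay(
    initial_learning_rate=0.01,
    decay_steps=300 * steps_per_epoch,
    decay_rate=0.5,
    staircase=True,
)

# "Momentum-Optimizer" = SGD with momentum; the value 0.9 is assumed.
optimizer = tf.keras.optimizers.SGD(learning_rate=lr_schedule, momentum=0.9)
```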

Cifar-10

Pre-Processing

| Network | Config | Error | Results | Notes | Run |
| --- | --- | --- | --- | --- | --- |
| Tutorial-Net | LR: 0.005; Decay: 0.96/1; WD: 0; 30 Epochs | 38.39% | results.txt | Very sparse activation in conv1 | run_1 |
| Tutorial-Net | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 34.30% | results.txt | Very sparse activation in conv1 | run_2 |
| Tutorial-Net + Preprocessing (-0.5) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 31.32% | results.txt | Rich feature maps in conv1, but sparse activation in the following layers | run_3 |
| Tutorial-Net + Preprocessing (Color-Standardization) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 32.07% | results.txt | Fewer feature maps in layer conv1 | run_4 |
| Tutorial-Net + Preprocessing (PerPixel-Color-Standardization) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 27.68% | results.txt | Rich feature map in layer conv1 and less sparsity in layer 4 | run_5 |
| Tutorial-Net + Preprocessing (PerPixel-Mean-Subtraction) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 29.48% | results.txt | Very rich feature map in conv1, but a sparser conv2 layer; layer 4 is also sparser | run_6 |
| Tutorial-Net + Preprocessing (Per-Image Standardization) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 29.54% | results.txt | Very rich feature maps in conv1 | run_7 |
| Tutorial-Net + Preprocessing (Per-Pixel Standardization + Per-Image Standardization) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 28.59% | results.txt | Almost very rich feature map in conv1 | run_8 |
| Tutorial-Net + Data-Augmentation (Random cropping) + Preprocessing (Per-Pixel Standardization) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 27.52% | results.txt | Less rich feature map in conv1, slightly less sparse feature map in conv2 | run_9 |
| Tutorial-Net + Data-Augmentation (Random cropping, Random flipping) + Preprocessing (Per-Pixel Standardization) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 28.82% | results.txt | Sparsity in conv1 decreases in the long run | run_10 |
| Tutorial-Net + Data-Augmentation (Random cropping, Random flipping, Random brightness) + Preprocessing (Per-Pixel Standardization) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 90.02% | results.txt | Images are extremely bright or dark | run_11 |
| Tutorial-Net + Preprocessing (Per-Pixel Standardization) + Data-Augmentation (Random cropping, Random flipping, Random brightness) + Preprocessing (Per-Image Standardization) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 38.17% | results.txt | Images are extremely bright or dark | run_12 |
| Tutorial-Net + Data-Augmentation (Random cropping, Random flipping, Random brightness) + Preprocessing (Per-Image Standardization) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 39.39% | results.txt | Images are extremely bright or dark | run_13 |
| Tutorial-Net + Data-Augmentation (Random brightness, contrast, saturation and HUE) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 30.52% | results.txt | | run_14 |
| Tutorial-Net + Data-Augmentation (Random brightness, contrast, saturation and HUE) + Pre-Processing (Per-Pixel Standardization) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 29.61% | results.txt | Rich feature maps in conv1 | run_15 |
| Tutorial-Net + Data-Augmentation (Random brightness, contrast, saturation and HUE) + Pre-Processing (Per-Image Standardization) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 47.50% | results.txt | Very rich feature map in conv1 but extremely poor feature map in conv2 | run_16 |
| Tutorial-Net + Data-Augmentation (Random-Cropping and Flipping, Random brightness, contrast, saturation and HUE) + Pre-Processing (Per-Pixel Standardization) | LR: 0.005; Decay: 0.96/1; WD: 0.004; 30 Epochs | 28.50% | results.txt | Less rich feature map in conv1 but quite rich feature map in conv2 | run_17 |
| Tutorial-Net + Data-Augmentation (Random-Cropping and Flipping, Random brightness, contrast, saturation and HUE) + Pre-Processing (Per-Pixel Standardization) | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 23.90% | results.txt | Almost sparse feature map in conv1 | run_18 |
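
The best runs above combine per-pixel standardization (subtract the per-pixel mean of the training set and divide by the per-pixel standard deviation) with random cropping, flipping and colour jitter, while avoiding the plain random-brightness augmentation that ruined run_11 to run_13. A sketch of such a pipeline with tf.image follows; the crop/pad sizes and jitter ranges are illustrative values, not the ones used in these runs.

```python
import numpy as np
import tensorflow as tf

def per_pixel_stats(x_train):
    """Per-pixel mean/std over the training images (x_train: [N, 32, 32, 3])."""
    mean = x_train.mean(axis=0)
    std = x_train.std(axis=0) + 1e-6
    return mean.astype(np.float32), std.astype(np.float32)

def augment_and_standardize(image, mean, std, training=True):
    """Random crop/flip/colour jitter followed by per-pixel standardization."""
    if training:
        image = tf.image.resize_with_crop_or_pad(image, 36, 36)   # pad ...
        image = tf.image.random_crop(image, size=[32, 32, 3])     # ... then crop
        image = tf.image.random_flip_left_right(image)
        image = tf.image.random_brightness(image, max_delta=0.2)
        image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
        image = tf.image.random_saturation(image, lower=0.8, upper=1.2)
        image = tf.image.random_hue(image, max_delta=0.05)
    return (image - mean) / std
```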

Network structure

  • All networks use data-augmentation (Random-Cropping and Flipping, Random brightness, contrast, saturation and HUE) and pre-processing (per-pixel standardization)

| Network | Config | Error | Results | Notes | Run |
| --- | --- | --- | --- | --- | --- |
| Tutorial-Net | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 23.29% | results.txt | | run_1 |
| Tutorial-Net + Variable-Module for Conv1 | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 22.31% | results.txt | | run_2 |
| Tutorial-Net + Use own Conv2D function with Bias constant initializer of 0.1 | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 90.06% | results.txt | | run_3 |
| Tutorial-Net + Use own Conv2D function with Bias constant initializer of 0.0 | LR: 0.005; Decay: 0.96/10; WD: 0.0; 30 Epochs | 22.25% | results.txt | | run_4 |
| Tutorial-Net + Use own Conv2D, Activation and Pool function | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 22.70% | results.txt | | run_5 |
| Tutorial-Net + Use own Conv2D, Activation, Pool and LRN function in conv1 | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 23.32% | results.txt | | run_6 |
| Tutorial-Net + Use own Conv2D, Activation, Pool and LRN function for conv1 and conv2, conv2 bias is also initialized with 0.0 | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 28.75% | results.txt | | run_7 |
| Tutorial-Net + Use own Conv2D, Activation, Pool and LRN function for conv1 and conv2, conv2 bias is initialized with 0.1 | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 23.40% | results.txt | | run_8 |
| Tutorial-Net + Use custom Conv2D, Activation, Pool and LRN function for conv1 and conv2 + custom Fully-Connected layer 3 (with stddev=0.02) | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 25.34% | results.txt | | run_9 |
| Tutorial-Net + Use custom Conv2D, Activation, Pool and LRN function for conv1 and conv2 + custom Fully-Connected layer 3 (with stddev=0.04) | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 22.38% | results.txt | | run_10 |
| Custom-Net (2 Conv-Layer, 2 FC-Layer, 1 Dense-Output) | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 22.80% | results.txt | | run_11 |
| Custom-Net (3 Conv-Layer, 1 FC-Layer, 1 Dense-Output) | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 22.17% | results.txt | | run_12 |
| Custom-Net (3 Conv-Layer, 2 FC-Layer, 1 Dense-Output) | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 21.88% | results.txt | | run_13 |
| Custom-Net (3 Conv-Layer, 2 FC-Layer, 1 Dense-Output) + no LRN | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 23.67% | results.txt | | run_14 |
| Custom-Net (3 Conv-Layer, 2 FC-Layer, 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0; 30 Epochs | 19.66% | results.txt | High loss for validation in the beginning of training | run_15 |
| Custom-Net (3 Conv-Layer, 2 FC-Layer, 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/25; WD: 0.0005; 30 Epochs | 27.84% | results.txt | | run_16 |
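
The gap between run_3 (90.06 %) and run_4 (22.25 %) points at the constant bias initializer of the custom Conv2D helper (0.1 diverged, 0.0 trained normally). A hypothetical helper with that knob could look like the sketch below; the function signature and the kernel-initializer stddev are my own choices, not the project's.

```python
import tensorflow as tf

def conv2d(x, filters, kernel_size=5, bias_init=0.0, name=None):
    """Custom Conv2D wrapper with a configurable constant bias initializer.

    run_3 used a bias initializer of 0.1 and ended at 90.06% error;
    run_4 used 0.0 and reached 22.25%.
    """
    return tf.keras.layers.Conv2D(
        filters, kernel_size, padding="same", activation=None,
        kernel_initializer=tf.keras.initializers.TruncatedNormal(stddev=0.05),
        bias_initializer=tf.keras.initializers.Constant(bias_init),
        name=name,
    )(x)
```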
  • The following runs use 60 Epochs for training.

| Network | Config | Error | Results | Notes | Run |
| --- | --- | --- | --- | --- | --- |
| Custom-Net (3 Conv-Layer, 2 FC-Layer, 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0002; 60 Epochs | 14.98% | results.txt | | run_17 |
| Custom-Net (3 Conv-Layer, 2 FC-Layer, 1 Dense-Output) + no Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0000; 60 Epochs | 23.16% | results.txt | | run_18 |
| Custom-Net (2 Conv-Layer, 1 Conv-Layer + Dropout, 2 FC-Layer + Dropout, 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0002; 60 Epochs | 16.10% | results.txt | | run_19 |

  • The following runs use 120 Epochs for training.

| Network | Config | Error | Results | Notes | Run |
| --- | --- | --- | --- | --- | --- |
| Custom-Net (2 Conv-Layer, 1 Conv-Layer + Dropout, 2 FC-Layer + Dropout, 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0002; 120 Epochs | 15.34% | results.txt | | run_19 |
| Custom-Net (2 Conv-Layer, 1 Conv-Layer + no Dropout, 2 FC-Layer + no Dropout, 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0002; 120 Epochs | 14.56% | results.txt | | run_20 |
| Custom-Net (2 Conv-Layer, 1 Conv-Layer + no Dropout, 2 FC-Layer + no Dropout, 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0000; 120 Epochs | 13.91% | results.txt | | run_21 |
| Custom-Net (2 Conv-Layer, 1 Conv-Layer + Dropout, 2 FC-Layer + Dropout, 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0000; 120 Epochs | 15.14% | results.txt | | run_22 |
| Custom-Net (2 Conv-Layer, 1 Conv-Layer + no Dropout, 2 FC-Layer + Dropout, 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0000; 120 Epochs | 13.90% | results.txt | | run_23 |
| Custom-Net (2 Conv-Layer, 1 Conv-Layer + no Dropout, 1 FC-Layer + Dropout, 1 FC-Layer + no Dropout, 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0000; 120 Epochs | 13.38% | results.txt | | run_24 |
| Custom-Net (3 Conv-Layer (128 Filter), 1 FC-Layer + Dropout, 1 FC-Layer, 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0000; 120 Epochs | 12.71% | results.txt | | run_25 |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(512) + Dropout, 1 FC-Layer, 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0000; 120 Epochs | 12.31% | results.txt | | run_26 |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(1024) + Dropout, 2 FC-Layer, 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0000; 120 Epochs | 12.13% | results.txt | | run_27 |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(1024) + Dropout, 1 FC-Layer (256) + Dropout, 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0000; 120 Epochs | 12.32% | results.txt | | run_28 |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(1024) + no Dropout, 1 FC-Layer (256) + no Dropout, 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0000; 120 Epochs | 13.01% | results.txt | | run_29 |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(1024) + Dropout, 1 FC-Layer (256), 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization | LR: 0.005; Decay: 0.96/1; WD: 0.0000; 120 Epochs | 12.31% | results.txt | | run_30 |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(1024) + Dropout, 1 FC-Layer (256), 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization + PReLU | LR: 0.005; Decay: 0.96/1; WD: 0.0000; 120 Epochs | 12.37% | results.txt | | run_31 |

  • The following runs were based on the tweaked meta-parameters of run_7 from the Optimizer-Arguments table below (LR: 0.003; Decay: 0.5/30).

| Network | Config | Error | Results | Notes | Run |
| --- | --- | --- | --- | --- | --- |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(1024) + Dropout, 1 FC-Layer (256), 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization + Xavier-Initialization + (classes for FC layers) | LR: 0.003; Decay: 0.5/30; WD: 0.0000; 120 Epochs | 11.50% | results.txt | | run_32 |
| Custom-Net (3x2 Conv-Layer (128 Filter) + reduce kernel size to 3x3, 1 FC-Layer(1024) + Dropout, 1 FC-Layer (256), 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization + Xavier-Initialization + (classes for FC layers) | LR: 0.003; Decay: 0.5/30; WD: 0.0000; 120 Epochs | 10.31% | results.txt | | run_33 |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(1024) + Dropout, 1 FC-Layer (256), 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization + Xavier-Initialization + (classes for all layers) | LR: 0.003; Decay: 0.5/30; WD: 0.0000; 120 Epochs | 10.34% | results.txt | | run_34 |
| Custom-Net (3x2 Conv-Layer (128 Filter) + BN + ReLU, 1 FC-Layer(1024) + Dropout, 1 FC-Layer (256), 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization + Xavier-Initialization | LR: 0.003; Decay: 0.5/30; WD: 0.0000; 120 Epochs | 8.79% | results.txt | | run_35 |
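
The best configuration (run_35, 8.79 %) stacks three blocks of two 128-filter convolutions, each followed by batch normalization and ReLU, and then fully connected layers of 1024 (with dropout), 256 and 64 units before the 10-way output, with Xavier (Glorot) initialization. The Keras sketch below reproduces that description; the 3x3 kernel size (introduced in run_33), pooling placement, dropout rate and the exact ordering of BN and ReLU are assumptions where the table does not say.

```python
import tensorflow as tf

def conv_block(x, filters=128):
    """Two convolutions, each with BN + ReLU; 2x2 max-pooling afterwards is assumed."""
    for _ in range(2):
        x = tf.keras.layers.Conv2D(filters, 3, padding="same",
                                   kernel_initializer="glorot_uniform")(x)
        x = tf.keras.layers.BatchNormalization()(x)
        x = tf.keras.layers.ReLU()(x)
    return tf.keras.layers.MaxPooling2D(2)(x)

inputs = tf.keras.Input(shape=(32, 32, 3))
x = inputs
for _ in range(3):                                  # "3x2 Conv-Layer (128 Filter)"
    x = conv_block(x)
x = tf.keras.layers.Flatten()(x)
x = tf.keras.layers.Dense(1024, activation="relu",
                          kernel_initializer="glorot_uniform")(x)
x = tf.keras.layers.Dropout(0.5)(x)                 # dropout rate assumed
x = tf.keras.layers.Dense(256, activation="relu")(x)
x = tf.keras.layers.Dense(64, activation="relu")(x)
outputs = tf.keras.layers.Dense(10, activation="softmax")(x)
model = tf.keras.Model(inputs, outputs)
```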

Optimizer-Arguments

| Network | Config | Error | Results | Notes | Run |
| --- | --- | --- | --- | --- | --- |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(1024) + Dropout, 1 FC-Layer (256), 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization + ReLU | LR: 0.001; Decay: 0.96/1; WD: 0.0000; 120 Epochs | 11.57% | results.txt | | run_1 |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(1024) + Dropout, 1 FC-Layer (256), 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization | LR: 0.001; Decay: 0.5/30; WD: 0.0000; 120 Epochs | 11.37% | results.txt | Slight overfitting in loss | run_2 |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(1024) + Dropout, 1 FC-Layer (256), 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization | LR: 0.001; Decay: 0.5/30; WD: 0.0001; 120 Epochs | 11.93% | results.txt | Still some overfitting | run_3 |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(1024) + Dropout, 1 FC-Layer (256), 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization + Use Xavier-Initialization for Conv-Layer | LR: 0.001; Decay: 0.5/30; WD: 0.0000; 120 Epochs | 11.14% | results.txt | | run_4 |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(1024) + Dropout, 1 FC-Layer (256), 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization + Use Xavier-Initialization | LR: 0.001; Decay: 0.5/30; WD: 0.0000; 120 Epochs | 11.42% | results.txt | | run_5 |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(1024) + Dropout, 1 FC-Layer (256), 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization + Xavier-Initialization | LR: 0.002; Decay: 0.5/30; WD: 0.0000; 120 Epochs | 11.09% | results.txt | | run_6 |
| Custom-Net (3x2 Conv-Layer (128 Filter), 1 FC-Layer(1024) + Dropout, 1 FC-Layer (256), 1 FC-Layer (64), 1 Dense-Output) + Batch-Normalization + Xavier-Initialization | LR: 0.003; Decay: 0.5/30; WD: 0.0000; 120 Epochs | 10.91% | results.txt | | run_7 |
