# Restricted Boltzmann Machine Matlab Toolbox

## Introduction

This is a MATLAB toolbox to train **Restricted Boltzmann Machines**. The toolbox is written using classes, with particular emphasis on modularity and scalability of the code.

Currently, you can learn the following RBM variations:

- Vanilla RBM [1]
- GBRBM [2]
- ERI-RBM [3]
- ERI-GBRBM [2,4]

The toolbox implements the following features:

- Momentum
- V2 sparsity regulariser [6]
- Adaptive learning rate [5]
- A class to represent data
- Several data preprocessing methods
- Several ways of initialising sigma for the GBRBM
- CUDA computing

## How to use

Suppose that you have your data (e.g., the MNIST dataset) in a 10000×784 matrix that we call *Data*. You can get a copy of the original dataset online. Let's assume that you want to learn a latent representation of size *H = 500*.
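As a purely illustrative sketch (the matrix below is random placeholder data, not MNIST), the expected layout is one sample per row and one feature per column:

```matlab
% Hypothetical placeholder standing in for MNIST: 10000 samples,
% each a flattened 28x28 = 784-pixel image, binarised.
Data = rand(10000, 784) > 0.5;
```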

### Restricted Boltzmann Machine

You will learn how to train a vanilla RBM and some important variations (e.g., adding a sparsity term). Most of the things you will learn here are also applicable to the other kinds of RBMs that this library supports.

### Quick Start

```matlab
rbm = RestrictedBoltzmannMachine(784, 500);
rbm.Train(Data);
```

### Verbose

If you want to be informed about the progress of the training process, you can set this parameter before training:

```matlab
rbm.TrainingEventListener.Verbose = 1; %Default 0
```

If you also want to see detailed information about the learning process (e.g., reconstruction error, sparsity), you can raise the debug level:

```matlab
rbm.Debug = 1; %Default 0
```

### Training Event Listener

Inspired by Java, this class reacts to events during training. Specifically, it is triggered when:
* a sub-epoch ends
* an epoch ends
* the stop criterion is evaluated

You can extend the class *DefaultTrainingEvent* (or *AbstractTrainingEvent*) to define your customised events. The class *DefaultTrainingEvent* shows some relevant information and stops training after a predefined number of epochs. To change this value, you can run the following command:

```matlab
rbm.TrainingEventListener.MaxIterations = 150; %Default 100
```
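As a hedged sketch only: the callback name below (*epochEnded*) is an assumption, since the actual methods exposed by *AbstractTrainingEvent* are not documented here; check the class before extending it.

```matlab
% Hypothetical custom listener; the overridden method name is an
% illustrative assumption, not the documented API.
classdef MyTrainingEvent < DefaultTrainingEvent
    methods
        function epochEnded(obj, rbm)
            % React at the end of each epoch, e.g. log progress.
            fprintf('Epoch finished.\n');
        end
    end
end
```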

### Learning Rate

The learning rate is managed by a group of classes inside the package LearningRate. To change the value of the learning rate, you can run:

```matlab
rbm.Eta.Value = 0.01; %Default 0.00001
```

The default learning rate is the ConstantLearningRate class. If you want the adaptive learning rate [5], you should do:

```matlab
rbm.Eta = LearningRate.AdaptiveLearningRate(rbm);
rbm.Eta.Value = 0.001; %or any value you like
```

If you want to change the candidate multipliers, you can do so as follows:

```matlab
rbm.Eta.AdaptiveLearningRateCandidates = [0.9, 1.1]; %Default [0.99,1.01]
```

Besides those candidates, the current value of the learning rate is also considered.

You can also set bounds so that Eta does not go below or above certain thresholds. Please see the class property *Bounds*.
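As a hedged example, assuming *Bounds* exposes an upper and a lower threshold in the same way *SigmaBounds* does (this shape is an assumption; check the class):

```matlab
% Hypothetical: keep eta within [1e-6, 0.1] during adaptation.
rbm.Eta.Bounds.Lower = 1e-6;
rbm.Eta.Bounds.Upper = 0.1;
```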

### Data Normalisation

In many applications, it is important to normalise the data before providing them to an RBM. To do so, you need to wrap your data in a Data class:

```matlab
D = Data.DefaultData(Data);
D.Preprocessor = DataPreprocessing.ZscorePreprocessor;
```

In this case, we are creating a new data structure whose data will be normalised using the z-score (subtracting the mean and dividing by the standard deviation).

If you have test data, you should reuse the preprocessor of the training data rather than creating a new one. This is because many data preprocessors learn parameters from the training data, and these parameters need to be used at test time as well.

```matlab
D_testing = Data.DefaultData(TestingData);
D_testing.Preprocessor = D.Preprocessor;
```

The actual training can be done using

```matlab
rbm.Train(D);
```

You can explicitly call the method *preprocess* to start data preprocessing. However, this is done implicitly by the training method of the RBM.

Currently available data preprocessors:

- ZscorePreprocessor
- Normalisation01: input data are normalised to the range [0, 1]
- WhiteningPreprocessor
- ICAPreprocessor
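For example, to use min-max scaling instead of the z-score, you can select the Normalisation01 preprocessor following the same pattern shown above:

```matlab
D = Data.DefaultData(Data);
D.Preprocessor = DataPreprocessing.Normalisation01; % scale inputs to [0, 1]
```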

### V2 Sparsity Regulariser

You can also add a sparsity term to an RBM [6]. The V2 sparsity regulariser takes two parameters:

- *p*: the sparsity target (how sparse the hidden representation should be)
- *lambda*: the sparsity learning rate (how strongly you want to enforce sparsity)

Suppose that you want to impose a sparsity target of 10% (0.1) with a learning rate of 0.001; you simply need to specify it as follows:

```matlab
rbm.RegularisationTerm = Regularisation.V2SparseRegularisationTerm(0.001, 0.1);
```

### Momentum

To speed up Stochastic Gradient Descent, the use of momentum is advisable. Momentum is represented by a group of classes within the package Momentum. Currently, only two kinds of momentum are implemented:

- ConstantMomentum: a constant momentum value (0.5 by default) is used throughout the training process
- PiecewiseMomentum: after certain epochs, the momentum changes value

In general, to specify a constant momentum value, you simply need to do:

```matlab
rbm.Momentum.Value = 0.9;
```

If you want to change the momentum at different epochs, the PiecewiseMomentum class may fit your needs better. A typical setup is to keep the momentum at 0.5 for the first 5 epochs and then raise it to 0.9. You can set these values as follows:

```matlab
rbm.Momentum = Momentum.PiecewiseMomentum([0.5 0.9],[1,6]);
```

This means that from epoch 1 to 5 the momentum is 0.5, and from epoch 6 onwards it is 0.9.

### Inference

To extract features from your data, you simply need to do

```matlab
H = rbm.extractFeatures(D);
```

Then, you can provide H to your classifier/regressor.
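For instance, the features could feed a standard MATLAB classifier. This sketch assumes a label vector *labels* (one entry per row of *Data*, an assumption of this example) and requires the Statistics and Machine Learning Toolbox:

```matlab
% Train a k-nearest-neighbour classifier on the extracted features.
mdl = fitcknn(H, labels);
predictions = predict(mdl, H);
```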

### Save

You can save the learned model in either of two ways:

```matlab
save('yourfile.mat', 'rbm');  % standard MATLAB save
rbm.save('yourfile.mat');     % toolbox save method
```

If you use the *save* method, the variable holding your RBM will be stored under the name *model*.

If you want to store a compact version of your learnt model, with the essential information, you can use the method *dump*.

```matlab
rbm.dump('yourfile.mat');
```

In this case, to reload the dump, you need to do the following:

```matlab
rbm = RestrictedBoltzmannMachine(784, 500); %or another model (e.g., GB); make sure that the numbers of visible and hidden units match the dumped model
load('yourfile.mat');
rbm.loadDump(model);
```

### CUDA

If you have a CUDA-compatible video card and the NVIDIA driver correctly configured on your operating system, you can enable CUDA computing by simply setting the property *Cuda* to true:

```matlab
rbm.Cuda = true; %default false
```

### Gaussian Bernoulli Restricted Boltzmann Machine

This library implements the GB-RBM according to [2]. Everything that has been said before also applies to this formulation. To create a new GB-RBM, you just need to do the following:

```matlab
rbm = GaussianBernoulliRestrictedBoltzmannMachine(784, 500);
```

However, this formulation has a further parameter, sigma, that needs to be initialised and learnt. In our implementation, we do not store this parameter directly; instead, we keep *z = log(sigma)* (cf. [2] for further details). Furthermore, sigma is a vector as long as the number of visible units. The ways to initialise sigma are managed within the package SigmaInitialiser. So far, two methods to initialise sigma are implemented:

- DefaultSigmaInitialiser: sigma is initialised to all ones (therefore z is 0, since log(1) = 0) [default]
- KMeansInitialiser: abstract class for initialisation via k-means clusters

We provide two implementations of k-means:

- InternalKMeansInitialiser: the built-in MATLAB k-means is used
- SigmaInitialiser.LiteKMeansInitialiser: Lite KMeans (more efficient than the MATLAB implementation; we advise using this one)

We empirically found that the k-means initialisation gives better results. In order to use it, run:

```matlab
rbm.SigmaInitialiser = SigmaInitialiser.LiteKMeansInitialiser;
```

We advise setting the number of clusters equal to the number of hidden units. For example:

```matlab
rbm.SigmaInitialiser.Clusters = rbm.numberOfHiddenUnits;
```

In this case, 500 clusters are created (therefore 500 different values of sigma), which are then assigned to the visible units. By default, the average of all these values is used as a single sigma. If you want per-unit sigma values instead, set the following parameter to *false*:

```matlab
rbm.SigmaInitialiser.SingleSigma = false;
```

### Learning Sigma

The sigma parameter can be learnt, and by default the GB-RBM learns it. However, sometimes a different learning rate for this parameter provides better results (and better convergence of the machine). Our implementation allows you to specify a multiplier M that is applied to the current eta (since eta can also change over time [5]). Specifically, sigma (through z = log(sigma)) is updated using the learning rate M · eta instead of eta.

You can set the parameter M as follows:

```matlab
rbm.EtaSigmaMultiplier = 0.1; %default 1 (no effect)
```

Similarly to the adaptive learning rate, you can set bounds for sigma. Bounds are expressed in terms of sigma (not log(sigma)) to make them simpler to set:

```matlab
rbm.SigmaBounds.Upper = 2; %default Inf
rbm.SigmaBounds.Lower = 0.0001; %Default -Inf
```

It is advisable to set these bounds to prevent the machine from diverging.

## Citations

[1] G. E. Hinton, “Training products of experts by minimizing contrastive divergence,” Neural Computation, vol. 14, no. 8, pp. 1771–1800, 2002.

[2] K. Cho, A. Ilin, and T. Raiko, “Improved learning of Gaussian-Bernoulli restricted Boltzmann machines,” in Artificial Neural Networks and Machine Learning -- ICANN 2011, LNCS vol. 6791, part 1, pp. 10–17, 2011.

[3] M. V. Giuffrida and S. A. Tsaftaris, “Rotation-invariant restricted Boltzmann machine using shared gradient filters,” in Artificial Neural Networks and Machine Learning -- ICANN 2016, Part II, Springer, 2016, pp. 480–488.

[4] M. V. Giuffrida and S. A. Tsaftaris, “Theta-RBM: Unfactored gated restricted Boltzmann machine for rotation-invariant representations,” arXiv preprint, Jun. 2016.

[5] K. Cho, “Improved Learning Algorithms for Restricted Boltzmann Machines,” Master’s thesis, Aalto University School of Science, 2011.

[6] H. Lee, C. Ekanadham, and A. Y. Ng, “Sparse deep belief net model for visual area V2,” in Advances in Neural Information Processing Systems 20 (NIPS 2007), pp. 873–880, 2008.