Parallel and Distributed Training of Neural Networks (MATLAB/Python)

This code is a general library implementing parallel and distributed
algorithms for training neural networks, based on the framework of
successive convex approximations (SCA, see [1-3]). 

The library can be used to train a neural network whenever the training data
is distributed over a network of interconnected agents, following an iterative
two-step process:

    1) Optimization: each agent solves a strongly convex approximation of
        its own (non-convex) training problem. This step can optionally be
        parallelized, up to one weight per processor.
    2) Consensus: information is exchanged over the network via two local
        consensus steps.
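As a rough illustration of the two-step scheme, the toy sketch below (plain
Python with assumed names, not the library's API) alternates the optimization
and consensus steps on scalar parameters, where each agent's local loss is a
simple quadratic and the surrogate is the loss itself:

```python
# Toy two-step SCA illustration (hypothetical code, not the library's API).
# Agent i holds the local loss f_i(x) = (x - d_i)^2 and repeats:
#   1) Optimization: minimize a strongly convex surrogate of f_i
#      (here f_i itself, since it is already strongly convex);
#   2) Consensus: average the iterate with the neighbors' iterates.

def run_toy_sca(data, adjacency, n_rounds=50):
    n = len(data)
    x = [0.0] * n                       # each agent's local estimate
    for _ in range(n_rounds):
        # Step 1: exact minimizer of the local surrogate f_i
        x_hat = list(data)
        # Mix the local solution with the current iterate (fixed step size 0.5)
        x = [0.5 * xi + 0.5 * xh for xi, xh in zip(x, x_hat)]
        # Step 2: consensus averaging over the (symmetric) neighborhoods
        x = [sum(x[j] for j in adjacency[i]) / len(adjacency[i])
             for i in range(n)]
    return x

# Fully connected 3-agent network (self-loops included): all estimates
# converge to the minimizer of the sum of the local losses, i.e. the mean.
adjacency = {0: [0, 1, 2], 1: [0, 1, 2], 2: [0, 1, 2]}
estimates = run_toy_sca([1.0, 2.0, 6.0], adjacency)
```

In this toy case every agent agrees on the network-wide mean (3.0) after a few
dozen rounds, mimicking how the real algorithms reach agreement on the network
weights.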

The framework is described in the following paper:

    Scardapane S. & Di Lorenzo, P. (2017). "A framework for parallel and 
    distributed training of neural networks". Neural Networks, in press.

A preprint can be found at:

Organization (MATLAB)
Most of the code is contained in the "classes" folder. The basic classes are:

    * MultilayerPerceptron.m: a standard NN with a single hidden layer.
    * LearningAlgorithm.m: an abstract class for defining training procedures.
    * DistributedAlgorithms/NextMLP.m: abstract class for defining distributed
        (possibly parallel) algorithms based on the SCA framework.

Four implementations of the NextMLP framework are provided:

    * L2_NextMLP.m: squared loss and l2 regularization on the weights. The
        surrogate function is defined by linearizing only the neural network
        model, keeping the rest of the cost function fixed (see Sec. 4.2 in
        the paper).
    * Lin_L2_NextMLP.m: same cost function as before, but the surrogate is 
        obtained by linearizing the overall error function. This is slower,
        but the optimum is obtained without needing to compute a matrix
        inverse (again, see Sec. 4.2 in the paper).
    * L1_NextMLP.m: squared loss and l1 regularization to impose sparsity,
        while the surrogate is obtained by partial linearization. The resulting
        l1-minimization problem is solved with an ad-hoc library contained
        in the "functions/L1General" folder (see Sec. 4.3a in the paper).
    * Lin_L1_NextMLP.m: squared loss and l1 regularization, with complete
        linearization of the error function. The optimum can be expressed
        in closed form with the use of soft-thresholding (again, see Sec. 4.3a 
        in the paper).
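The closed-form solution mentioned for the fully linearized l1 surrogate relies
on the standard soft-thresholding (shrinkage) operator. A minimal pure-Python
sketch of its general definition is given below; this is the textbook operator,
not the library's internal implementation:

```python
# Soft-thresholding operator S_lam(v): the proximal operator of lam*|.|,
# applied elementwise. Values within lam of zero are set exactly to zero,
# which is what induces sparsity in the l1-regularized solution.
def soft_threshold(v, lam):
    out = []
    for x in v:
        mag = abs(x) - lam                          # shrink the magnitude by lam
        out.append(0.0 if mag <= 0 else (mag if x > 0 else -mag))
    return out

soft_threshold([3.0, -0.5, 1.5], 1.0)  # -> [2.0, 0.0, 0.5]
```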

Two additional algorithms are provided for comparison in the centralized case:

    * CentralizedAlgorithms/VanillaMLP.m: basic stochastic gradient descent 
        with backpropagation.
    * CentralizedAlgorithms/MatlabMLP.m: a wrapper to the training functions
        in the Neural Networks toolbox of MATLAB.
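For reference, stochastic gradient descent with backpropagation fits in a few
lines. The fragment below is illustrative only (it is not the interface of
VanillaMLP.m): it trains a single tanh neuron under the squared loss, which is
the same loss/update structure applied layer-wise in a full MLP:

```python
# Hypothetical minimal SGD-with-backprop sketch (not VanillaMLP.m's API):
# fit a single tanh neuron a = tanh(w*x + b) to data under the squared loss.
import math
import random

def train_sgd(samples, lr=0.1, epochs=200, seed=0):
    rng = random.Random(seed)
    data = list(samples)
    w, b = 0.0, 0.0
    for _ in range(epochs):
        rng.shuffle(data)                 # stochastic: random sample order
        for x, y in data:
            a = math.tanh(w * x + b)      # forward pass
            err = a - y                   # d(0.5*(a - y)^2)/da
            grad = err * (1.0 - a * a)    # backpropagate through tanh
            w -= lr * grad * x            # gradient step on the weight
            b -= lr * grad                # gradient step on the bias
    return w, b

w, b = train_sgd([(-1.0, -0.8), (0.0, 0.0), (1.0, 0.8)])
```

After training, the neuron's outputs at x = ±1 closely match the ±0.8 targets.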

Additionally, the library provides some baseline training algorithms, and some
utility functions to split the dataset and initialize the network of agents.
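To give a rough idea of what such a splitting utility does, the sketch below
(assumed names, not the library's PartitionStrategies interface) distributes
shuffled sample indices evenly across the agents:

```python
# Hypothetical sketch of dataset partitioning across agents (not the
# library's API): shuffle the sample indices, then deal them out
# round-robin so agents receive near-equal shares.
import random

def partition_indices(n_samples, n_agents, seed=0):
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Agent i takes every n_agents-th index starting from position i
    return [idx[i::n_agents] for i in range(n_agents)]

parts = partition_indices(10, 3)  # three disjoint shares covering all samples
```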

Usage (MATLAB)
To launch a simulation, simply use the script 'test_script.m'. All the
configuration parameters are specified in the 'params_selection.m' file. Two 
classes of algorithms are compared:

   * Centralized algorithms, defined in the 'centralized_algorithms' struct.
   * Distributed algorithms, defined in the 'distributed_algorithms' struct.

The usage of all the other parameters is described in the comments. Additionally,
we provide a fairly extensive unit testing suite, which can be executed
with the 'run_test_suite.m' script. Tests are found in the "tests" folder.

A Python port, built on top of the popular Theano and Lasagne libraries,
is available in the 'python' folder. Unlike the MATLAB version, it supports
networks with multiple hidden layers and cross-entropy losses, although the
l1 regularizers are not yet implemented.

Centralized and distributed algorithms are available in two separate modules,
and the script 'run_simulation' can be used to run all the different experiments.
Its configuration is similar to that of the MATLAB version.

The code is still under active development, so it may change over the coming
months. Also, the test suite and the documentation are currently incomplete.

The code is distributed under the BSD-2 license; please see the file called LICENSE.

The MATLAB code includes the L1General library by M. Schmidt.
Copyright information is given in the respective folder.

It also uses several utility functions from MATLAB Central. Copyright
information and licenses can be found in the 'functions' folder.

The MATLAB classes for handling network topologies (folder 'classes/NetworkUtilities')
and for partitioning the dataset (folder 'classes/PartitionStrategies') are
adapted from the Lynx MATLAB toolbox:

[1] Di Lorenzo, P. & Scutari, G. (2016). "NEXT: In-network nonconvex 
    optimization". IEEE Transactions on Signal and Information Processing 
    over Networks, 2(2), pp. 120-136.
[2] Facchinei, F., Scutari, G., & Sagratella, S. (2015). "Parallel selective 
    algorithms for nonconvex big data optimization". IEEE Transactions on 
    Signal Processing, 63(7), pp. 1874-1889.
[3] Scardapane S. & Di Lorenzo, P. (2017). "A framework for parallel and 
    distributed training of neural networks". Neural Networks, in press.

   o For any requests, bug reports, or inquiries, you can contact
     the author at simone [dot] scardapane [at] uniroma1 [dot] it.
   o Additional contact information can also be found on the website of
     the author: