
# Goal

bnpy makes it easy to compare learning algorithms for fitting the same model to the same data. This demo shows how to fit a Gaussian mixture model to toy data using several online and offline versions of variational Bayesian inference.

# Learning Algorithms

Throughout bnpy, we use these abbreviations for learning algorithms:

* VB : variational Bayesian inference
* soVB : stochastic online variational Bayesian inference
* moVB : memoized online variational Bayesian inference

VB is an offline algorithm that processes the entire dataset in between parameter updates.

soVB is an online algorithm that processes data in batches and performs noisy parameter updates via stochastic gradient steps on the variational objective.

moVB is an online algorithm that processes data in batches, but achieves noise-free parameter updates through memoization (smart caching of previously computed results).
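To make the contrast concrete, here is a minimal sketch of the two update styles applied to a global sufficient statistic. This is hypothetical illustration code, not bnpy's actual implementation:

```python
import numpy as np

def stochastic_update(global_stats, batch_stats, rho, nBatch):
    # soVB-style step: blend the old estimate with an amplified batch
    # estimate, weighted by a decaying learning rate rho in (0, 1].
    # Rescaling by nBatch makes the batch "look like" the full dataset,
    # but the step is noisy because each batch is only a sample of it.
    return (1.0 - rho) * global_stats + rho * (nBatch * batch_stats)

def memoized_update(global_stats, cached, batch_stats, batch_id):
    # moVB-style step: swap out this batch's cached contribution for a
    # fresh one. The whole-dataset statistic stays exact (noise-free),
    # at the cost of storing one cached summary per batch.
    global_stats = global_stats - cached[batch_id] + batch_stats
    cached[batch_id] = batch_stats
    return global_stats
```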

For thorough conceptual background, see TODO.

# How to compare algorithms fairly

To have a fair comparison, we'll need to make sure we fit the same model to the same data from the same initialization. bnpy was designed to make this very easy.

## Same data

Throughout this demo, we'll use the Asterisk toy data. bnpy is careful here to use the same exact data for both offline and online algorithms, even though this toy data is generated randomly on-demand when Run.py is executed. For online algorithms, the data is divided into the same distinct batches and traversed in the same order.
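The mechanism can be sketched as follows (hypothetical code, not bnpy's internals): a fixed random seed makes the on-demand toy data reproducible, and batches are fixed index slices visited in the same order on every lap through the data.

```python
import numpy as np

rng = np.random.RandomState(8675309)  # fixed seed: same "random" data every run
X = rng.randn(5000, 2)                # stand-in for the generated toy data

nBatch = 10
batch_ids = np.array_split(np.arange(X.shape[0]), nBatch)
for lap in range(2):
    for b, idx in enumerate(batch_ids):
        Xbatch = X[idx]               # batch b holds the same rows on every lap
```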

## Same model

The appropriate model for the Asterisk data is a Gaussian observation model. All the VB learning algorithms require specifying the hyperparameters of this model's Gaussian-Wishart prior. This is done through command-line arguments passed to bnpy (if unspecified, default values are reasonable). So long as the same specification is done for all learning algorithms, the comparison is fair.

For this demo, we'll place a vague prior whose expected covariance matrix is a scale multiple of the identity matrix ("eye"). This is specified by the following keyword options to Run.py.

```
--ECovMat eye --sF 1e-2
```

Here, --ECovMat specifies the name of the routine used to construct the expected covariance matrix, while --sF is a scalar multiplier applied to the resulting matrix to make it larger or smaller. Setting --sF very small (close to zero) makes the prior very weak and approaches maximum-likelihood estimation.
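Concretely, the prior's expected covariance matrix is built roughly like this (a sketch of the construction, assuming the 2-dimensional Asterisk data):

```python
import numpy as np

D = 2                        # dimension of the Asterisk toy data
sF = 1e-2                    # the --sF scale factor
ECovMat = sF * np.eye(D)     # --ECovMat eye: a scaled identity matrix
```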

See TODO for more background on setting prior parameters for GMMs.

## Same initialization

bnpy initialization procedures are briefly covered in the GMM-EM compare initializations demo, with more thorough documentation in TODO.

For this demonstration, what you need to know is that the random seed that controls the initialization procedure is entirely determined by the jobname and the taskid of the particular run. The jobname is a "human-readable" name of the experiment you're running. The taskid is usually automatically determined by bnpy: it starts at 1 and counts up for the number of runs you wish to try.

So, when comparing two or more learning algorithms, as long as we give them the same jobname, each distinct run (aka task) will be initialized to exactly the same parameters.
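Here's a sketch of how such a seed could be derived. The exact scheme bnpy uses may differ, but any deterministic function of (jobname, taskid) gives the guarantee we need:

```python
import hashlib

def make_seed(jobname, taskid):
    # Hypothetical scheme, not bnpy's exact one: hash the jobname, offset
    # by the taskid, so any two runs sharing both values draw the same
    # initial parameters regardless of which algorithm they use.
    h = int(hashlib.md5(jobname.encode('utf-8')).hexdigest(), 16)
    return (h + taskid) % (2 ** 32)

make_seed('algComparison', 1)  # identical across the VB, soVB, and moVB runs below
```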

# Run the Experiment

We can compare all three algorithms just by changing the algName positional argument to Run.py.

## Run VB: offline variational Bayes

```
python -m bnpy.Run AsteriskK8 MixModel Gauss VB --nLap 50 --K 8 --ECovMat eye --sF 1e-2 --jobname algComparison --nTask 3
```

## Run soVB: stochastic online VB

```
python -m bnpy.Run AsteriskK8 MixModel Gauss soVB --nLap 50 --K 8 --ECovMat eye --sF 1e-2 --jobname algComparison --nTask 3
```

Note: this uses the default settings for the learning-rate schedule. See the soVB guide for how to set these parameters.

## Run moVB: memoized online VB

```
python -m bnpy.Run AsteriskK8 MixModel Gauss moVB --nLap 50 --K 8 --ECovMat eye --sF 1e-2 --jobname algComparison --nTask 3
```

The online algorithms above use the default setting for the number of batches (nBatch=10). To use a custom setting, pass the --nBatch argument.
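If you'd rather launch all three runs from one script, a simple loop over the algName argument works, since everything else in the command stays fixed. This is just a convenience sketch; the commands above are the canonical way:

```python
import subprocess

# Everything except the algName positional argument stays fixed,
# which is exactly what makes the comparison fair.
opts = ['--nLap', '50', '--K', '8', '--ECovMat', 'eye', '--sF', '1e-2',
        '--jobname', 'algComparison', '--nTask', '3']
for algName in ['VB', 'soVB', 'moVB']:
    subprocess.check_call(
        ['python', '-m', 'bnpy.Run', 'AsteriskK8', 'MixModel', 'Gauss', algName]
        + opts)
```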

# ELBO comparison (side-by-side)

We can compare the ELBO traces of all three learning algorithms side-by-side. This is a sanity check that all three are working properly: we fit the same model to the same data from the same initializations, so we should have similar ELBOs regardless of the learning algorithm used.

```
python -m bnpy.viz.PlotELBO AsteriskK8 MixModel Gauss VB,soVB,moVB --jobnames algComparison --legendnames VB,soVB,moVB
```

It looks like for this simple dataset, runs using the same initializations converge to very similar solutions regardless of the algorithm used. This is a good sanity check that our intended fair comparison works out in practice. As we might expect, the online algorithms appear to reach their plateaus slightly faster than full-dataset VB.
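If you want to build a custom figure instead of using PlotELBO, the plot is just each algorithm's ELBO trace overlaid against the number of laps through the data. Below is a minimal matplotlib sketch with synthetic stand-in traces; substitute the real values saved by your runs:

```python
import numpy as np
import matplotlib.pyplot as plt

laps = np.arange(1, 51)
# Synthetic stand-in traces for illustration only (not real run output).
traces = {
    'VB':   -2.0 + 1.5 * (1.0 - np.exp(-laps / 12.0)),
    'soVB': -2.0 + 1.5 * (1.0 - np.exp(-laps / 7.0)),
    'moVB': -2.0 + 1.5 * (1.0 - np.exp(-laps / 7.0)),
}
for name, elbo in traces.items():
    plt.plot(laps, elbo, label=name)
plt.xlabel('num laps through data')
plt.ylabel('ELBO')
plt.legend(loc='lower right')
plt.show()
```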

# Parameter visualization

Again, we can plot the cluster parameters learned by each algorithm using PlotComps.py.

```
python -m bnpy.viz.PlotComps AsteriskK8 MixModel Gauss <algName> --taskids 1-3 --doPlotData --jobnames algComparison
```

where `<algName>` is one of VB, soVB, moVB.

As we saw above in the ELBO trace plot, each run from the same initialization converges to similar solution quality, so it's not surprising that the learned parameters look nearly identical here. For this simple dataset, VB, soVB, and moVB behave very similarly when given the same initialization. On more complicated problems, however, we'd expect their performance to differ.

## VB learned parameters (3 runs)

## soVB learned parameters (3 runs)

## moVB learned parameters (3 runs)

