
What's the difference between EM and VB?

EM and VB are closely related. Both are iterative algorithms that perform coordinate ascent optimization. In fact, within bnpy the high-level learning algorithm code in VBLearnAlg.py is used unchanged for both EM and VB. The biggest difference is how each algorithm represents the global parameters of interest:

  • EM finds point estimates
  • VB finds approximate posterior distributions
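This division of labor is why one learning loop can serve both algorithms: the loop only invokes generic update steps, and the model object decides whether its global parameters are point estimates or posteriors. A minimal sketch of the idea (class and method names here are hypothetical, not bnpy's actual API):

```python
import numpy as np

class EMGauss:
    """EM-style model: the global parameter is a point estimate of the mean."""
    def global_step(self, x):
        self.mu = x.mean()

class VBGauss:
    """VB-style model: the global parameter is a Gaussian posterior over the mean
    (conjugate Normal prior, known unit variance)."""
    def __init__(self, prior_mean=0.0, prior_prec=1.0):
        self.prior_mean, self.prior_prec = prior_mean, prior_prec
    def global_step(self, x):
        self.post_prec = self.prior_prec + len(x)
        self.post_mean = (self.prior_prec * self.prior_mean + x.sum()) / self.post_prec

def run_coordinate_ascent(model, x, n_iters=10):
    # One generic loop for either model; a real mixture model would also
    # update local (per-data-point) cluster assignments before each global step.
    for _ in range(n_iters):
        model.global_step(x)
    return model
```

The loop never inspects how the model stores its globals, so swapping EM for VB requires no change to the driver code.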

In our multivariate Gaussian mixture model, EM finds a single vector for each cluster's mean. In contrast, VB finds a Gaussian distribution over the location of each cluster's mean, whose parameters encode both an expected location and an uncertainty (variance). You can read more about the conceptual differences here [TODO].
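The contrast is easy to see numerically in one dimension. Below is an illustrative sketch (not bnpy code) for a single cluster's mean with known unit variance: EM produces one number, while VB's conjugate Normal-Normal update produces a full distribution, whose variance shrinks as more data arrives:

```python
import numpy as np

# Toy data: 50 draws from a Gaussian with true mean 3.0.
rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=50)

# EM: a single point estimate of the cluster mean (here, maximum likelihood).
em_mean = x.mean()

# VB: a Gaussian posterior over the mean, given a Normal(prior_mean, 1/prior_prec)
# prior (assumed hyperparameters, conjugate update with known unit variance).
prior_mean, prior_prec = 0.0, 1.0
post_prec = prior_prec + len(x)          # precision grows with the data count
post_mean = (prior_prec * prior_mean + x.sum()) / post_prec
post_var = 1.0 / post_prec               # uncertainty that EM does not track
```

Here `em_mean` is just a number, while `(post_mean, post_var)` describes an entire distribution over plausible locations for the cluster mean.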

As a practical matter, one big difference is that in VB the observation model (in this case, Gaussian) requires a prior distribution, while in EM a prior is optional. Without a prior, EM performs maximum likelihood inference; with a prior, it performs maximum a posteriori (MAP) inference. The settings of the prior can have a big impact on performance. See [TODO] for an illustration.
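A small sketch of why the prior's settings matter (assumed toy setup, not bnpy code): under a Normal prior on a 1-D Gaussian mean with known unit variance, the MAP estimate is shrunk toward the prior mean, and the strength of that pull is controlled by the prior precision:

```python
import numpy as np

x = np.array([2.8, 3.1, 3.4])

# Maximum likelihood: ignores any prior entirely.
ml_mean = x.mean()

def map_mean(prior_mean, prior_prec):
    # MAP under a Normal(prior_mean, 1/prior_prec) prior: a precision-weighted
    # average of the prior mean and the data.
    return (prior_prec * prior_mean + x.sum()) / (prior_prec + len(x))

weak = map_mean(0.0, 0.01)    # weak prior: nearly the ML estimate
strong = map_mean(0.0, 10.0)  # strong prior: shrunk heavily toward 0
```

With only three data points, the strong prior dominates the estimate; with thousands of points, even a strong prior would barely move it.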
