What's the difference between EM and VB?
EM and VB are closely related. Both are iterative algorithms that perform coordinate ascent optimization. In fact, within bnpy the high-level learning algorithm code in VBLearnAlg.py is used unchanged for both EM and VB. The biggest difference is how each algorithm represents the global parameters of interest:
- EM finds point estimates
- VB finds approximate posterior distributions
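The shared coordinate-ascent structure can be sketched with a tiny EM example: alternate a local step (per-point responsibilities) with a global step (parameter updates). This is an illustrative NumPy sketch of EM for a two-component 1-D Gaussian mixture with unit variances and equal weights, not bnpy's actual VBLearnAlg.py code:

```python
import numpy as np

def em_gmm_1d(x, mu, n_iters=50):
    """Toy EM for a 2-component 1-D Gaussian mixture (unit variances,
    equal weights). Illustrates the local/global alternation only."""
    mu = np.asarray(mu, dtype=float)
    for _ in range(n_iters):
        # Local step (E-step): responsibility of each cluster for each point
        logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
        resp = np.exp(logp - logp.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # Global step (M-step): EM's *point estimate* of each cluster mean
        mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    return mu

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
print(em_gmm_1d(x, mu=[-1.0, 1.0]))  # converges near the true means -2 and 2
```

Swapping the global step to update posterior parameters instead of point estimates is, at this level of abstraction, the only change VB requires, which is why one high-level loop can serve both algorithms.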
In our multivariate Gaussian mixture model, EM finds a single vector for each cluster mean. In contrast, VB finds a Gaussian distribution over the location of each cluster's mean, whose parameters define the expected location and the uncertainty (variance) around it. You can read more about the conceptual differences here [TODO].
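The contrast between a point estimate and an approximate posterior can be made concrete in one dimension. This sketch assumes a single cluster with known unit variance and a conjugate Normal prior on the mean (the prior values m0 and kappa0 are illustrative assumptions, not bnpy defaults):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=50)  # 1-D data, known unit variance

# EM-style answer: a single point estimate of the cluster mean
mu_hat = x.mean()

# VB-style answer: a full Normal distribution over the cluster mean.
# With a conjugate Normal(m0, 1/kappa0) prior and unit data variance,
# the posterior is Normal(m_N, 1/kappa_N):
m0, kappa0 = 0.0, 1.0                     # assumed prior mean and precision
kappa_N = kappa0 + len(x)                 # posterior precision
m_N = (kappa0 * m0 + x.sum()) / kappa_N   # posterior mean

print("EM point estimate:", mu_hat)
print("VB posterior: Normal(mean=%.3f, var=%.4f)" % (m_N, 1.0 / kappa_N))
```

The posterior variance 1/kappa_N shrinks as more data arrive, so VB's answer carries an explicit measure of uncertainty that the EM point estimate lacks.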
As a practical matter, one big difference is that under VB the observation model (in this case Gaussian) requires a prior distribution, while under EM a prior is optional. Without a prior, EM performs maximum likelihood (ML) inference; with a prior, it performs maximum a posteriori (MAP) inference. The choice of prior can have a big impact on performance. See [TODO] for an illustration.
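The effect of the prior on a MAP estimate can be seen directly in a small-data setting. This sketch assumes a Normal(m0, 1/kappa0) prior on a 1-D Gaussian mean with unit data variance; the prior strengths swept below are illustrative, not bnpy settings:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=5)  # only 5 points: the prior matters

mu_ML = x.mean()  # maximum likelihood: ignores any prior

# MAP under a Normal(m0, 1/kappa0) prior on the mean (unit data variance).
# Larger kappa0 means a stronger prior, pulling the estimate toward m0.
m0 = 0.0
for kappa0 in (0.1, 1.0, 10.0):  # assumed prior strengths for illustration
    mu_MAP = (kappa0 * m0 + x.sum()) / (kappa0 + len(x))
    print("kappa0=%5.1f  mu_MAP=%.3f  (mu_ML=%.3f)" % (kappa0, mu_MAP, mu_ML))
```

As kappa0 grows, mu_MAP is shrunk harder toward the prior mean m0 and away from the data; with only a handful of points this shrinkage is substantial, which is why prior settings can noticeably change results.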