What's the difference between EM and VB?
EM and VB are closely related. Both are iterative algorithms that perform coordinate ascent optimization. In fact, within bnpy the high-level learning algorithm code in VBLearnAlg.py is used unchanged for both EM and VB. The biggest difference is how each algorithm represents the global parameters of interest:
- EM finds point estimates
- VB finds approximate posterior distributions
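The shared coordinate-ascent structure can be sketched with a tiny EM example: alternate a local step (per-point responsibilities) with a global step (parameter updates). This is an illustrative NumPy sketch of EM for a two-component 1-D Gaussian mixture with unit variances and equal weights, not bnpy's actual VBLearnAlg.py code:

```python
import numpy as np

def em_gmm_1d(x, mu, n_iters=50):
    """Toy EM for a 2-component 1-D Gaussian mixture (unit variances,
    equal weights). Illustrates the local/global alternation only."""
    mu = np.asarray(mu, dtype=float)
    for _ in range(n_iters):
        # Local step (E-step): responsibility of each cluster for each point
        logp = -0.5 * (x[:, None] - mu[None, :]) ** 2
        resp = np.exp(logp - logp.max(axis=1, keepdims=True))
        resp /= resp.sum(axis=1, keepdims=True)
        # Global step (M-step): EM's *point estimate* of each cluster mean
        mu = (resp * x[:, None]).sum(axis=0) / resp.sum(axis=0)
    return mu

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 100), rng.normal(2, 1, 100)])
print(em_gmm_1d(x, mu=[-1.0, 1.0]))  # converges near the true means -2 and 2
```

Swapping the global step to update posterior parameters instead of point estimates is, at this level of abstraction, the only change VB requires, which is why one high-level loop can serve both algorithms.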
In our multivariate Gaussian mixture model, EM finds a single vector for each cluster mean. In contrast, VB finds a Gaussian distribution over the location of each cluster's mean, whose parameters define the expected location and the uncertainty (variance) around it. You can read more about the conceptual differences here [TODO].
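The contrast between a point estimate and an approximate posterior can be made concrete in one dimension. This sketch assumes a single cluster with known unit variance and a conjugate Normal prior on the mean (the prior values m0 and kappa0 are illustrative assumptions, not bnpy defaults):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=3.0, scale=1.0, size=50)  # 1-D data, known unit variance

# EM-style answer: a single point estimate of the cluster mean
mu_hat = x.mean()

# VB-style answer: a full Normal distribution over the cluster mean.
# With a conjugate Normal(m0, 1/kappa0) prior and unit data variance,
# the posterior is Normal(m_N, 1/kappa_N):
m0, kappa0 = 0.0, 1.0                     # assumed prior mean and precision
kappa_N = kappa0 + len(x)                 # posterior precision
m_N = (kappa0 * m0 + x.sum()) / kappa_N   # posterior mean

print("EM point estimate:", mu_hat)
print("VB posterior: Normal(mean=%.3f, var=%.4f)" % (m_N, 1.0 / kappa_N))
```

The posterior variance 1/kappa_N shrinks as more data arrive, so VB's answer carries an explicit measure of uncertainty that the EM point estimate lacks.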
As a practical matter, one big difference is that under VB the observation model (in this case Gaussian) requires a prior distribution, while under EM a prior is optional. Without a prior, EM performs maximum likelihood (ML) inference; with a prior, it performs maximum a posteriori (MAP) inference. The choice of prior can have a big impact on performance. See [TODO] for an illustration.
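The effect of the prior on a MAP estimate can be seen directly in a small-data setting. This sketch assumes a Normal(m0, 1/kappa0) prior on a 1-D Gaussian mean with unit data variance; the prior strengths swept below are illustrative, not bnpy settings:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.normal(loc=2.0, scale=1.0, size=5)  # only 5 points: the prior matters

mu_ML = x.mean()  # maximum likelihood: ignores any prior

# MAP under a Normal(m0, 1/kappa0) prior on the mean (unit data variance).
# Larger kappa0 means a stronger prior, pulling the estimate toward m0.
m0 = 0.0
for kappa0 in (0.1, 1.0, 10.0):  # assumed prior strengths for illustration
    mu_MAP = (kappa0 * m0 + x.sum()) / (kappa0 + len(x))
    print("kappa0=%5.1f  mu_MAP=%.3f  (mu_ML=%.3f)" % (kappa0, mu_MAP, mu_ML))
```

As kappa0 grows, mu_MAP is shrunk harder toward the prior mean m0 and away from the data; with only a handful of points this shrinkage is substantial, which is why prior settings can noticeably change results.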