Add AllocModel extending HDPHMM to incorporate multimodal emissions

Issue #40 new
Jake Soloff created an issue

We want to extend the allocation model to contain more local parameters, specifically a fixed, finite number of mixture components per state. First, I will submit a toy dataset for which this model may be useful: a good choice could be a toy HMM with a diagonally dominant transition matrix, where each cluster has a more interesting shape than a Gaussian blob. Then, I will need to add a new class extending HDPHMM.py in several ways. The new attributes will be an int C giving the number of mixture components, a vector that parametrizes the Dirichlet posterior for the mixture weights, and local parameters for the joint probability of a mixture component and state. Methods within the class would need to be expanded to handle these variables. Alternatively, we could modify the existing HDPHMM file and give C a default value of 1, which reduces to the original model.

Comments (2)

  1. Mike Hughes repo owner

    Thanks Jake!

    1) Yes, toy dataset is top priority. Looks like you've made good progress already.

    When you're ready, please make a pull request for a completed toy dataset that has

    • code for generating data, with optional arguments to set the number of sequences (--nDocTotal) and the length of each sequence (--T)
    • plots that show the "true" segmentation, as well as true cluster shape parameters.
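
    A minimal sketch of such a generator, assuming 2-D data, K states, and C substates per state. The function name, the arc-shaped cluster layout, and every parameter other than --nDocTotal and --T are illustrative assumptions, not existing bnpy code. Plotting is left out.

```python
import argparse
import numpy as np


def generate_toy_data(nDocTotal=4, T=100, K=3, C=2, stay_prob=0.9, seed=0):
    """Sample nDocTotal sequences of length T from a sticky HMM (K >= 2)
    whose states each emit from a C-component Gaussian mixture."""
    rng = np.random.default_rng(seed)
    # Diagonally dominant transitions: stay_prob on the diagonal,
    # remaining mass spread evenly over the other K-1 states.
    trans = np.full((K, K), (1.0 - stay_prob) / (K - 1))
    np.fill_diagonal(trans, stay_prob)
    # Hypothetical cluster shape: the C substate means of state k
    # lie on a half-circle around that state's center.
    centers = 10.0 * np.arange(K)[:, None] * np.array([1.0, 0.0])  # (K, 2)
    angles = np.linspace(0.0, np.pi, C)
    offsets = 2.0 * np.stack([np.cos(angles), np.sin(angles)], axis=1)  # (C, 2)
    means = centers[:, None, :] + offsets[None, :, :]  # (K, C, 2)
    X_list, Z_list = [], []
    for _ in range(nDocTotal):
        z = np.empty(T, dtype=int)
        z[0] = rng.integers(K)
        for t in range(1, T):
            z[t] = rng.choice(K, p=trans[z[t - 1]])
        c = rng.integers(C, size=T)  # uniform substate weights (assumption)
        x = means[z, c] + 0.5 * rng.standard_normal((T, 2))
        X_list.append(x)
        Z_list.append(z)  # the "true" segmentation for each sequence
    return X_list, Z_list


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument('--nDocTotal', type=int, default=4)
    parser.add_argument('--T', type=int, default=100)
    args = parser.parse_args(argv)
    return generate_toy_data(nDocTotal=args.nDocTotal, T=args.T)
```

    Calling main(['--nDocTotal', '3', '--T', '50']) would return three sequences of 50 observations each, together with their true state labels.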

    2) You do not need to change anything in HDPHMM.py or create any allocation model at all. Instead, you will create a new observation model, named MixGaussObsModel.py. You should be able to use the existing HDPHMM.py unchanged.

    This will extend AbstractObsModel, with the following key attributes:

    Attributes

    • an int named "C", indicating the number of substates.
    • a ParamBag named "Post", which holds the substate global parameters (frequency parameters and emission parameters)

    Methods

    calc_local_params (should really be called calc_conditional_loglikelihood or similar)

    This function will compute, as other models do, E[ \log p( x_n | \phi_k) ]. This should involve summing over all possible substates of superstate k. This is the first step of your local inference procedure.
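
    The sum over substates can be sketched as follows. This is a simplified illustration that assumes point estimates for the substate weights and diagonal-covariance Gaussian emissions; a variational version would substitute the corresponding expectations. The function names and array layout are hypothetical.

```python
import numpy as np


def log_gauss_diag(X, mu, var):
    """log N(x_t | mu_kc, diag(var_kc)).
    X is (T, D); mu and var are (K, C, D). Returns (T, K, C)."""
    diff = X[:, None, None, :] - mu[None]
    return -0.5 * np.sum(np.log(2 * np.pi * var)[None] + diff ** 2 / var[None],
                         axis=-1)


def logsumexp(a, axis):
    """Numerically stable log(sum(exp(a))) along one axis."""
    m = np.max(a, axis=axis, keepdims=True)
    return np.squeeze(m, axis=axis) + np.log(np.sum(np.exp(a - m), axis=axis))


def calc_log_lik_over_substates(X, log_w, mu, var):
    """Return (log_lik, log_joint), where
    log_joint[t, k, c] = log w_kc + log p(x_t | substate c of state k)
    log_lik[t, k]      = log sum_c exp(log_joint[t, k, c])"""
    log_joint = log_w[None] + log_gauss_diag(X, mu, var)  # (T, K, C)
    return logsumexp(log_joint, axis=-1), log_joint
```

    With C=1 and a weight of 1, log_lik reduces to the ordinary single-Gaussian log-likelihood, which matches the original model.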

    calc_substate_local_params

    This function will take as input the local parameters dictionary LP computed by the HDPHMM.py allocation model. This dict will have a 'resp' field that holds the marginal assignment probabilities computed by the forward-backward algorithm.

    Given this 'resp', the code here will compute the substate marginal probabilities at each timestep t. You will put the computed values into the dict LP as a field named 'substate_resp'.
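
    One way to sketch this step: given the superstate, the substate at time t depends only on x_t, so the joint marginal factors as resp[t, k] times a per-timestep conditional over substates. The function below assumes a (T, K, C) array log_joint holding log w_kc + log p(x_t | k, c); that array and the function name are illustrative assumptions.

```python
import numpy as np


def calc_substate_resp(resp, log_joint):
    """resp is (T, K); log_joint is (T, K, C).
    Returns substate_resp of shape (T, K, C) with
    substate_resp[t, k, c] = resp[t, k] * p(c | state k, x_t)."""
    # Softmax over the substate axis, shifted by the max for stability
    shifted = log_joint - log_joint.max(axis=-1, keepdims=True)
    cond = np.exp(shifted)
    cond /= cond.sum(axis=-1, keepdims=True)
    return resp[:, :, None] * cond


# Usage with made-up numbers: T=2 timesteps, K=2 states, C=3 substates
resp = np.array([[0.7, 0.3], [0.1, 0.9]])
log_joint = np.log(np.array([
    [[0.2, 0.2, 0.6], [0.5, 0.25, 0.25]],
    [[0.1, 0.8, 0.1], [1.0, 1.0, 2.0]],
]))
LP = dict(resp=resp)
LP['substate_resp'] = calc_substate_resp(LP['resp'], log_joint)
```

    Summing substate_resp over its last axis recovers resp, which is a convenient sanity check.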

    calc_suff_stats

    This function will take the LP dict with 'substate_resp', and compute the relevant sufficient statistics for the emission parameters and substate frequencies.
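
    As a rough sketch, for diagonal-covariance Gaussian emissions the statistics are the expected counts and the weighted first and second moments per (state, substate) pair. This version returns a plain dict instead of a SuffStatBag, and the field names N, x, and xx are assumptions.

```python
import numpy as np


def calc_suff_stats(X, substate_resp):
    """X is (T, D); substate_resp is (T, K, C).
    Returns expected counts and weighted moments per (state, substate)."""
    N = substate_resp.sum(axis=0)                        # (K, C)
    x = np.einsum('tkc,td->kcd', substate_resp, X)       # (K, C, D)
    xx = np.einsum('tkc,td->kcd', substate_resp, X ** 2)  # (K, C, D)
    return dict(N=N, x=x, xx=xx)
```

    The counts N serve the substate frequencies, while x and xx serve the emission parameters.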

    update_global_params

    This function will take the SuffStatBag computed by calc_suff_stats, and update the global parameters as needed.
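
    A simplified sketch of the update, assuming a symmetric Dirichlet(alpha) prior on each state's substate weights and moment-matched point estimates for the emission parameters. A full conjugate treatment of the emissions would replace the mean and variance lines. The dict fields and the alpha, eps, and min_var arguments are hypothetical.

```python
import numpy as np


def update_global_params(SS, alpha=1.0, eps=1e-8, min_var=1e-6):
    """SS holds N (K, C), x (K, C, D), and xx (K, C, D)."""
    N = SS['N']
    # Dirichlet posterior parameters for substate weights within each state
    theta = alpha + N
    # Weighted mean and variance per substate; eps guards empty substates
    denom = N[..., None] + eps
    mean = SS['x'] / denom
    var = np.maximum(SS['xx'] / denom - mean ** 2, min_var)
    return dict(theta=theta, mean=mean, var=var)
```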

  2. Mike Hughes repo owner

    Note for Jake: To make the substate_resp calculation happen, add the following to HModel's calc_local_params:

    if hasattr(hmodel.obsModel, 'calc_substate_local_params'):
        LP = hmodel.obsModel.calc_substate_local_params(Data, LP)
    