Experiments on HDP-HMMs from NIPS 2015 paper
This repository holds Python scripts, plain-text settings files, and dataset binaries for reproducing the experiments in the published research paper:
"Scalable adaptation of state complexity for nonparametric hidden Markov models"
Michael C. Hughes, William Stephenson, and Erik B. Sudderth
Neural Information Processing Systems (NIPS), 2015.
- Repository Overview
- Installation Instructions
- How to run experiments
- How to use Fox et al.'s Matlab code for the Gibbs sampler
The code is organized into several directories:
Contains the published paper from NIPS 2015 and the supplementary material.
Contains scripts for launching published experiments on each dataset, as well as text files specifying settings of all model and algorithm hyperparameters for use with the bnpy inference engine. Executing scripts will save inference results to disk on your local machine.
Quick introduction for using these scripts:
Each experiment has a dedicated IPython notebook (and sometimes a helper Python script) for generating the final plots (.eps files).
Example notebooks (using in-browser web viewer):
- Toy dataset experiments (Fig. 3 of NIPS paper)
- Small-dataset Motion Capture experiments (Fig. 6 of NIPS paper)
- Large-dataset Motion Capture experiments (Fig. 7 of NIPS paper)
- Speaker diarization experiments (Fig. 5 of NIPS paper)
Contains MAT files with the raw data for all experiments (except those too big to easily share on the web).
Available: toy dataset, motion capture datasets, speaker diarization dataset.
Too big to share easily: whole-genome chromatin dataset (contact the first author).
Contains inference code for Fox et al.'s blocked Gibbs sampler.
Ignore this. Here be monsters.
Step 1: Clone this repository
To clone this repo into your current directory, execute this in a terminal:
git clone https://bitbucket.org/michaelchughes/x-hdphmm-nips2015
Step 2: Clone the bnpy git repository
To run experiments, we use bnpy, a Python module developed by our research group that performs inference for Bayesian nonparametric models.
Project website for bnpy: https://bitbucket.org/michaelchughes/bnpy-dev
To clone the bnpy repo into your current directory, execute this in a terminal:
git clone https://bitbucket.org/michaelchughes/bnpy-dev
TODO: make a tag "nips2015" so users can check out the exact version used for our experiments, rather than the latest version (which may behave slightly differently). For now, the master branch is OK.
Step 3: Set environment variables
In your terminal:
$ export BNPYROOT=/path/to/bnpy-dev/
This lets any future program executed in your terminal use the environment variable
$BNPYROOT to find the location of bnpy on your file system. Without it, the x-hdphmm-nips2015 scripts would not know where to find the right inference code.
For more about environment variables, you can read Configuring Environment Variables on the bnpy wiki.
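For intuition, here is a minimal sketch of how a launcher script can pick up this variable; `locate_bnpy` is a hypothetical helper, not a function from this repository:

```python
import os
import sys

def locate_bnpy():
    """Find the bnpy checkout via the BNPYROOT environment variable.

    Hypothetical sketch; the real launcher scripts may differ.
    """
    root = os.environ.get("BNPYROOT")
    if root is None:
        raise RuntimeError(
            "Please set BNPYROOT, e.g. export BNPYROOT=/path/to/bnpy-dev/")
    sys.path.insert(0, root)  # make `import bnpy` resolve to this checkout
    return root
```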
How to run experiments
We have pre-packaged executable scripts for each dataset/algorithm combination published in our paper. For example, to run stochastic variational inference on the toy dataset
DDToyHMM, you would just do:
$ cd experiments/
$ python Launch_DDToyHMM_stoch.py
These prepackaged scripts are thin wrappers that call the
LaunchRun.py script. They require no additional arguments, and reproduce exactly the experiments we ran.
The LaunchRun.py script is more flexible. It performs several key actions:
- Determines the dataset to load via the --dataName argument.
- Determines the algorithm to use via the --algName argument.
- Reads default settings for the chosen algorithm from its settings file.
- Reads default settings for the dataset from data-settings-<dataName>.txt, overriding algorithm settings as needed.
- Reads custom settings (as --keyword value pairs) from the command line, overriding any default settings as needed.
- Finally, calls the appropriate Run script (such as Run_foxHDPHMMsampler.py), providing all custom settings.
The Run___.py scripts are just thin wrappers that kick off the inference engines in bnpy (or Fox et al.'s MCMC toolbox) given specific dataset/algorithm settings.
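The keyword-override behavior described above can be sketched roughly like this; this is an assumed simplification, and the real parsing in LaunchRun.py and bnpy surely handles types and validation differently:

```python
def apply_overrides(defaults, argv):
    """Merge `--keyword value` pairs from argv over default settings.

    Hypothetical sketch of the override behavior: later sources
    (command-line pairs) win over earlier ones (settings files).
    """
    settings = dict(defaults)
    it = iter(argv)
    for token in it:
        if token.startswith("--"):
            settings[token[2:]] = next(it)  # value follows its keyword
    return settings

# Example: a custom nLap overrides the default read from a settings file,
# while unmentioned keys (here K) keep their defaults.
merged = apply_overrides({"nLap": "50", "K": "25"},
                         ["--nLap", "100"])
```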
Running experiments on the Sun Grid Engine
Many of these experiments would take hours or days if run serially. Instead, we used the Sun Grid Engine installation within the Brown CS department to parallelize each task.
To use the grid, simply set the environment variable
XHOST to 'grid', like so:
XHOST=grid python LaunchRun.py --dataName DDToyHMM --algName memo ...
XHOST stands for "experiment host"; it defaults to 'local', meaning the local machine.
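Conceptually, the launcher branches on XHOST something like this; this is a hypothetical sketch (the real grid-submission logic, including the exact submission command, is not reproduced here):

```python
import os

def choose_host():
    """Return the experiment host: 'grid' submits to Sun Grid Engine,
    'local' (the default) runs on the current machine.

    Hypothetical sketch of the dispatch described above.
    """
    return os.environ.get("XHOST", "local")

def build_command(script_cmd, xhost):
    if xhost == "grid":
        # On the grid, the job would be handed to a submission tool
        # such as qsub; the exact flags used in our setup are omitted.
        return ["qsub"] + script_cmd
    return script_cmd
```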
How to use Emily Fox's Matlab code for the Gibbs sampler (included)
To compare against Fox et al.'s blocked Gibbs sampler for the sticky HDP-HMM, we adapted their original Matlab code and included it in the code/ directory of this repository.
FYI: You do not need Matlab to run our algorithms, only to compare against this sampler.
Compiling MEX files for the sampler
Fox et al.'s code requires Tom Minka's lightspeed toolbox, which we have included in this repo. You just need to compile it:
$ cd /path/to/x-hdphmm-nips2015/
$ make lightspeed
Major changes to Fox et al.'s code
Hyperparameters are fixed (not resampled) to allow a fair comparison with our variational methods. This can be changed by simply uncommenting the relevant lines of the sampler code.
Snapshots of the Hamming distance, the number of effective states, and a segmentation aligned to ground truth are now saved every few iterations (see store_stats.m).
Hyperparameter settings (and keyword arguments) now exactly match those in bnpy, so calling
Launch_DDToyHMM_sampler.py or Launch_DDToyHMM_memo.py will by default produce inference results using the same prior distributions.