x-hdphmm-nips2015

Overview


Experiments on HDP-HMMs from NIPS 2015 paper

This repository holds Python scripts, plain-text settings files, and dataset binaries for reproducing the experiments in the published research paper:

"Scalable adaptation of state complexity for nonparametric hidden Markov models"

Michael C. Hughes, William Stephenson, and Erik B. Sudderth

Neural Information Processing Systems (NIPS), 2015.


Repository overview

The code is organized into several directories:

Root directory

Contains the published paper from NIPS 2015 and the supplementary material.

experiments/

Contains scripts for launching published experiments on each dataset, as well as text files specifying settings of all model and algorithm hyperparameters for use with the bnpy inference engine. Executing scripts will save inference results to disk on your local machine.

For a quick introduction to using these scripts, see the How to run experiments section below.

notebooks/

Each experiment has a dedicated IPython notebook (and sometimes a helper Python script) for generating the final plots (.eps files).

Example notebooks can be viewed directly in the browser via the repository's web viewer.

datasets/

Contains MAT files with raw data from all experiments (except those too large to share easily on the web).

Available: toy dataset, motion capture datasets, speaker diarization dataset.

Too large to share easily: whole-genome chromatin dataset (contact the first author).

code/

Contains inference code for Fox et al.'s blocked Gibbs sampler.

zzzdeprecated/

Ignore this. Here be monsters.

Installation Instructions

Step 1: Clone this repository

To clone this repo into your current directory, execute this in a terminal:

git clone https://bitbucket.org/michaelchughes/x-hdphmm-nips2015

Step 2: Clone the bnpy git repository

To run experiments, we use bnpy, a Python module developed by our research group that performs inference for Bayesian nonparametric models.

Project website for bnpy: https://bitbucket.org/michaelchughes/bnpy-dev

To clone the bnpy repo into your current directory, execute this in a terminal:

git clone https://bitbucket.org/michaelchughes/bnpy-dev

You'll need to follow the Installation and Configuration instructions from the bnpy wiki. These instructions will make sure that you have all the required dependencies (numpy, scipy, IPython, etc.).
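As a quick sanity check (optional, and not part of the official bnpy instructions), you can verify that the main dependencies import cleanly:

python -c "import numpy, scipy, IPython; print('dependencies OK')"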

TODO: make a "nips2015" tag so users can check out the exact version used for our experiments rather than the latest version (which may behave slightly differently). For now, the master branch is OK.

Step 3: Set environment variables

In your terminal:

$ export BNPYROOT=/path/to/bnpy-dev/

This line lets any future program executed in your terminal use the environment variable $BNPYROOT to find the location of bnpy on your file system. Without this, x-hdphmm-nips2015 scripts wouldn't know where to find the right inference code.
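As an illustration, here is roughly how a Python script can pick up this variable. This is a minimal sketch using the standard os module, not the exact lookup code in our scripts:

import os
import sys

# Locate the bnpy-dev clone; fail early with a clear message if unset.
BNPYROOT = os.environ.get('BNPYROOT')
if BNPYROOT is None:
    sys.exit('Please set BNPYROOT to the path of your bnpy-dev clone.')

# Make the bnpy package importable from that location.
sys.path.insert(0, BNPYROOT)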

For more about environment variables, you can read Configuring Environment Variables on the bnpy wiki.

How to run experiments

We have pre-packaged executable scripts for each dataset/algorithm combination we published in our paper. For example, to run stochastic variational inference on the toy dataset DDToyHMM, you would just do:

cd experiments/
python Launch_DDToyHMM_stoch.py

These prepackaged scripts are thin wrappers that call the LaunchRun.py script. Prepackaged scripts require no additional arguments, and reproduce exactly the experiments we ran.
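To give a sense of how thin these wrappers are, a prepackaged script looks roughly like the sketch below (illustrative only; the actual scripts in experiments/ may pass additional settings):

# Sketch of a prepackaged launcher such as Launch_DDToyHMM_stoch.py.
import subprocess

# Forward one fixed dataset/algorithm pair to the generic launcher.
subprocess.check_call([
    'python', 'LaunchRun.py',
    '--dataName', 'DDToyHMM',
    '--algName', 'stoch',
])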

The underlying LaunchRun.py script is more flexible. It performs several key actions:

  • Determines the dataset to load via --dataName keyword argument.

  • Determines the algorithm to use via --algName keyword argument.

  • Reads default settings for the algorithm from settings-<algName>.txt.

  • Reads default settings for the dataset from data-settings-<dataName>.txt, overriding algorithm settings as needed.

  • Reads custom settings (as --keyword value pairs) from the command line, overriding any default settings as needed.

  • Executes either Run_bnpy.py or Run_foxHDPHMMsampler.py, providing all custom settings.

The final Run___.py scripts are just thin wrappers that kick off the inference engines in bnpy and Fox et al.'s MCMC toolbox given specific dataset/algorithm settings.
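The layered defaults amount to a dictionary merge in which later sources win. Below is a minimal sketch of that precedence; the settings file names follow the patterns above, but the parsing details are our assumption, not the actual LaunchRun.py code:

# Sketch of LaunchRun.py's settings precedence (illustrative only).
import sys

def read_settings(path):
    # Parse a plain-text settings file of 'key value' lines.
    settings = {}
    with open(path) as f:
        for line in f:
            line = line.strip()
            if line and not line.startswith('#'):
                key, val = line.split(None, 1)
                settings[key] = val
    return settings

# Precedence: algorithm defaults < dataset defaults < command line.
settings = read_settings('settings-stoch.txt')
settings.update(read_settings('data-settings-DDToyHMM.txt'))

# --keyword value pairs from the command line override everything else.
args = sys.argv[1:]
settings.update((args[i].lstrip('-'), args[i + 1])
                for i in range(0, len(args) - 1, 2))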

Running experiments on Sun Grid Engine

Many of these experiments would take hours or days if run serially. Instead, we used the Sun Grid Engine (SGE) installation within the Brown CS department to parallelize each task.

To use the grid, simply set the environment variable XHOST to 'grid', like so:

XHOST=grid python LaunchRun.py --dataName DDToyHMM --algName memo ...

Note that XHOST stands for experiment host, and defaults to 'local', the local machine.
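To illustrate, a launcher might branch on XHOST as in the sketch below. The qsub invocation is a hypothetical stand-in for the actual Brown CS grid submission logic, which we do not reproduce here:

import os
import subprocess

cmd = ['python', 'LaunchRun.py', '--dataName', 'DDToyHMM', '--algName', 'memo']

# XHOST selects the experiment host; 'local' (this machine) is the default.
if os.environ.get('XHOST', 'local') == 'grid':
    # Hypothetical SGE submission: '-b y' tells qsub to run a plain command.
    subprocess.check_call(['qsub', '-b', 'y'] + cmd)
else:
    subprocess.check_call(cmd)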

How to use Emily Fox's Matlab code for the Gibbs sampler (included)

To compare against Fox et al.'s blocked Gibbs sampler for the sticky HDP-HMM, we have adapted Fox et al.'s original Matlab code and included it in the code/ directory of this repository.

FYI: you do not need Matlab to run our algorithms; it is only required to compare against this sampler.

Compiling MEX files for the sampler

Fox et al.'s code requires Tom Minka's lightspeed toolbox. We've included this toolbox in this repo. You just need to compile it.

$ cd /path/to/x-hdphmm-nips2015/
$ make lightspeed

Major changes to Fox et al.'s code

  • Hyperparameters are fixed (not resampled) to fairly compare with our variational methods. This can be changed by simply uncommenting the relevant lines of RunSampler.m.

  • Snapshots of hamming distance, number of effective states, and a segmentation aligned to ground truth are now saved every few iterations (see store_stats.m).

  • Hyperparameter settings (and keyword arguments) now exactly match those in bnpy, so calling Launch_DDToyHMM_sampler.py or Launch_DDToyHMM_memo.py will by default produce inference results using the same prior distributions.
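For example, the following pair of commands (following the launch-script pattern from earlier) should produce directly comparable sampler and variational runs on the toy dataset:

cd experiments/
python Launch_DDToyHMM_sampler.py
python Launch_DDToyHMM_memo.py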