Heldout Likelihood Computation

0) End-to-end demo

For folks at Brown, here's an end-to-end demo on a subset of the Yelp data:

$ cd /data/liv/textdatasets/yelp/bnpy_words_data/YelpD1000/
$ python RunTopicModelOnYelpWithHeldoutMetrics.py
$ python -m bnpy.viz.PlotELBO YelpD1000 testheldout # Plot training objective
$ python -m bnpy.viz.PlotHeldoutLik YelpD1000 testheldout # Plot heldout prediction quality
The script trains both a mixture model and a topic model on this small subset of Yelp data.

The final two lines make plots of training quality and test-set quality.

1) Get the code

Note: currently this functionality lives only on my feature branch (ENH-better-moves-with-relational), so you'll need to check that branch out. I hope to integrate it into master soon.

$ cd /path/to/bnpy-dev/
$ git fetch # Update your local repo with pointers to latest on bitbucket
$ git checkout ENH-better-moves-with-relational
$ make all # Rebuild C++ routines for fast inference. Requires EIGENPATH set appropriately (e.g. export EIGENPATH=/path/to/eigen).

2) Define your train/test datasets

I'll assume you are using a Python script to load your data. To refresh your memory on dataset scripts, see this wiki page: https://bitbucket.org/michaelchughes/bnpy-dev/wiki/Code/Data/DataFormat.md

Your script needs to define two functions: get_data() and get_test_data(), where the latter loads the heldout set instead of the training set.

Be sure your script lives on a path Python can import from; set the BNPYDATADIR environment variable if you keep your script in a custom location.
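
For concreteness, here is a minimal sketch of such a script, assuming bag-of-words data and bnpy's WordsData class. The .npz file names and array keys are hypothetical placeholders, and the WordsData keyword names should be double-checked against your bnpy version:

import numpy as np
from bnpy.data import WordsData  # bag-of-words data class; verify import path

def get_data(**kwargs):
    ''' Return the training set as a bnpy Data object. '''
    return _load('MyDataset_train.npz')  # hypothetical file name

def get_test_data(**kwargs):
    ''' Return the heldout set used for prediction metrics. '''
    return _load('MyDataset_test.npz')  # hypothetical file name

def _load(npzpath):
    ''' Build a WordsData object from saved sparse count arrays.

    Keyword names (word_id, word_count, doc_range, vocab_size) follow
    WordsData's docstring; confirm them for your bnpy version.
    '''
    Q = np.load(npzpath)
    return WordsData(word_id=Q['word_id'], word_count=Q['word_count'],
                     doc_range=Q['doc_range'], vocab_size=int(Q['vocab_size']))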

3) Run inference!

The big idea is that you train your model by calling bnpy.run as usual, but with an extra kwarg that enables heldout likelihood computation (via bnpy's shiny new callback interface). With this enabled, at every checkpoint (whenever we save a snapshot of global parameters) we also compute a snapshot of heldout likelihood.

Within Python, you run:

>>> bnpy.run('MyDataset', 'HDPTopicModel', 'Mult', 'VB',
...     customFuncPath='CBCalcHeldoutMetricsTopicModel',
...     # ... other bnpy keyword options ...
...     )
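
For example, a call mirroring the Yelp demo above might look like this (the keyword values here are illustrative, chosen to match the demo's 50 laps and 50 topics):

>>> bnpy.run('YelpD1000', 'HDPTopicModel', 'Mult', 'VB',
...     customFuncPath='CBCalcHeldoutMetricsTopicModel',
...     K=50, nLap=50, jobname='testheldout')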

You'll now see additional output lines reporting heldout performance:

    0.000/50 heldout metrics   | K   50 | avgLik -9.0205 avgAUC 0.7631 avgRPrec 0.0505
        1/50 after      8 sec. | K   50 | ev -8.779020188e+00 |  
    1.000/50 heldout metrics   | K   50 | avgLik -9.1561 avgAUC 0.8261 avgRPrec 0.0553
        2/50 after     34 sec. | K   50 | ev -8.581941245e+00 | Ndiff14619.818 
    2.000/50 heldout metrics   | K   50 | avgLik -9.3468 avgAUC 0.8369 avgRPrec 0.0548

Here, avgLik is the heldout predictive log likelihood, averaged over documents (the log-domain counterpart of perplexity). You can read more about it in our AISTATS '15 paper.

Additionally, for topic models on words data we monitor another prediction task: given a partial document where we withhold a subset of the vocabulary, how well can the algorithm rank which of the heldout vocabulary words are actually present in the document? This is essentially a ranked-retrieval task, so we can use area-under-the-curve (AUC) and R-precision to measure performance. For both metrics, performance near 1.0 is perfect; chance-level (awful) performance is near 0.5 for AUC and near 0.0 for R-precision.
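
To make these metrics concrete, here is a self-contained numpy sketch (not bnpy's internal implementation) of how AUC and R-precision can be computed for a single document, given a score for each heldout vocabulary word:

import numpy as np

def auc_and_rprecision(scores, is_present):
    ''' Compute ranked-retrieval metrics for one document.

    Assumes at least one present and one absent word, and no tied scores.

    Args
    ----
    scores : 1D array, model's score for each heldout vocab word
    is_present : 1D binary array, 1 if that word truly occurs in the doc

    Returns
    -------
    auc : float, area under the ROC curve (0.5 = chance)
    rprec : float, R-precision, where R = number of truly present words
    '''
    scores = np.asarray(scores, dtype=np.float64)
    y = np.asarray(is_present) > 0
    nPos = np.sum(y)
    nNeg = y.size - nPos
    # Rank all words by score: rank 1 = smallest score
    order = np.argsort(scores)
    ranks = np.empty(y.size)
    ranks[order] = np.arange(1, y.size + 1)
    # Mann-Whitney U formulation of the area under the ROC curve
    auc = (np.sum(ranks[y]) - nPos * (nPos + 1) / 2.0) / (nPos * nNeg)
    # R-precision: fraction of the R top-scored words that are truly present
    topR = order[::-1][:nPos]
    rprec = np.sum(y[topR]) / float(nPos)
    return auc, rprec

The avgAUC and avgRPrec columns above are these per-document scores averaged over the heldout set.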

4) Viewing the raw results

As usual, the heldout likelihood experiment results are dropped in the same folder as the inference results.

After every run, we produce the following files:

  • predlik-lapTrain.txt

List of the training laps used as checkpoints.
At each of these laps, we computed heldout metrics to assess the current model.

  • predlik-K.txt

List of the number of represented topics K in the model at each checkpoint.

  • predlik-avgLikScore.txt

List of heldout log likelihood scores at each checkpoint, averaged over all documents in the heldout set.

You can also find per-document results in various .mat files in the same folder.
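
The text files are easy to load directly. Here is a minimal sketch, assuming the default output layout of $BNPYOUTDIR/<DatasetName>/<jobname>/<taskid>/ (adjust the path to wherever your results landed):

import os
import numpy as np

# Path layout assumed, not guaranteed: BNPYOUTDIR/dataset/jobname/taskid
jobpath = os.path.join(os.environ['BNPYOUTDIR'], 'YelpD1000', 'testheldout', '1')
laps = np.loadtxt(os.path.join(jobpath, 'predlik-lapTrain.txt'))
Kvals = np.loadtxt(os.path.join(jobpath, 'predlik-K.txt'))
avgLik = np.loadtxt(os.path.join(jobpath, 'predlik-avgLikScore.txt'))
for lap, K, lik in zip(laps, Kvals, avgLik):
    print('lap %7.3f | K %3d | avgLik %.4f' % (lap, K, lik))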

5) Making pretty plots

The bnpy.viz package has a PlotHeldoutLik script, which functions much like other visualization tools in bnpy.

You can use it from the command line or from Python.

Example command-line usage:

To plot heldout likelihood vs. training laps:

python -m bnpy.viz.PlotHeldoutLik <DatasetName> <JobPattern> 

To plot the AUC scores vs. training laps:

python -m bnpy.viz.PlotHeldoutLik <DatasetName> <JobPattern> --yvar AUC
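
From within Python, if you want full control over styling, one option is to skip PlotHeldoutLik and plot the raw files from section 4 yourself with matplotlib (same path assumptions as the loading sketch above):

import os
import numpy as np
import matplotlib.pyplot as plt

# Assumed output layout: BNPYOUTDIR/dataset/jobname/taskid
jobpath = os.path.join(os.environ['BNPYOUTDIR'], 'YelpD1000', 'testheldout', '1')
laps = np.loadtxt(os.path.join(jobpath, 'predlik-lapTrain.txt'))
avgLik = np.loadtxt(os.path.join(jobpath, 'predlik-avgLikScore.txt'))
plt.plot(laps, avgLik, 'o-')
plt.xlabel('training laps')
plt.ylabel('avg heldout log lik')
plt.show()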
