TASK: Experiments on the NIPS dataset

Issue #25 closed
Mike Hughes repo owner created an issue

You should have a pending invite to my topic-model experiments repository: https://bitbucket.org/michaelchughes/x-topics/

I keep this separate from bnpy, because (1) it has lots of private things for a paper I'm writing, and (2) there's lots of third-party code from other groups that we cannot distribute with bnpy.

For now, you should only need to check out the datasets/ folder. It contains several datasets, including

  • nips : 1k articles from NIPS conference
  • science : 13k articles from the journal Science
  • wiki : 7k Wikipedia articles

Set up bnpy to work with this external data by running

export BNPYDATADIR=/path/to/x-topics/datasets/nips/
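
Before launching long runs, it can help to confirm that the variable is actually visible to Python. The following is a minimal sketch, not part of bnpy: the helper name resolve_data_dir is hypothetical, and it only checks that BNPYDATADIR is set and points to an existing directory.

```python
import os


def resolve_data_dir(environ=None):
    """Return the directory named by BNPYDATADIR, or raise if it is unusable.

    Hypothetical helper for illustration; bnpy itself does not provide it.
    """
    environ = os.environ if environ is None else environ
    path = environ.get("BNPYDATADIR")
    if not path:
        raise RuntimeError("BNPYDATADIR is not set; run the export command first")
    if not os.path.isdir(path):
        raise RuntimeError("BNPYDATADIR does not point to a directory: %s" % path)
    return path


if __name__ == "__main__":
    try:
        print("Using dataset directory: %s" % resolve_data_dir())
    except RuntimeError as err:
        print("Setup problem: %s" % err)
```

If this reports a setup problem, re-run the export in the same shell session that launches the experiments.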

For now, I'd suggest using MixModel (or DPMixFull) as the AllocModel and Mult as the ObsModel. This combination amounts to document clustering. Run experiments very similar to the earlier ones: look at sensitivity to the number of clusters K and to the initialization.

Num of Clusters/Topics K

For now, I'd try K=10 as a "low" value and K=100 as a "high" value.

Initializations

You can try

  • --initname randexamples
  • --initname kmeansplusplus
  • --initname randomlikewang
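
One way to organize the sweep is to enumerate every (K, initname) pair up front. The sketch below only builds command strings and does not run anything. The K values, the initname options, and the model names come from this issue; the launcher (python -m bnpy.Run), the positional argument order, the dataset name "nips", the algorithm name "VB", and the --K flag are assumptions that should be checked against your bnpy checkout.

```python
import itertools

# Values taken from this issue.
K_VALUES = [10, 100]
INIT_NAMES = ["randexamples", "kmeansplusplus", "randomlikewang"]

# ASSUMPTION: the launcher, argument order, and the --K flag are placeholders.
# Only --initname and the model names appear in the issue text.
COMMAND_TEMPLATE = (
    "python -m bnpy.Run {data} {alloc} {obs} {alg} --K {K} --initname {initname}"
)


def build_commands(data="nips", alloc="MixModel", obs="Mult", alg="VB"):
    """Return one command string per (K, initname) combination."""
    return [
        COMMAND_TEMPLATE.format(
            data=data, alloc=alloc, obs=obs, alg=alg, K=K, initname=initname
        )
        for K, initname in itertools.product(K_VALUES, INIT_NAMES)
    ]


if __name__ == "__main__":
    for cmd in build_commands():
        print(cmd)
```

With two K values and three initializations this yields six runs. Passing alloc="DPMixFull" to build_commands gives the same grid for the other suggested AllocModel.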

Making plots

Play around with bnpy.viz.PrintTopics to get print-outs of the topic-word parameters.
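
For orientation, a topic print-out usually lists the highest-weight words for each topic. The sketch below illustrates that idea in plain Python; it is not bnpy.viz.PrintTopics and does not use its interface. The function name, the vocabulary, and the weights are made up for illustration.

```python
def top_words_per_topic(topic_word, vocab, n_top=3):
    """Return, for each topic row, the n_top words with the largest weight."""
    result = []
    for weights in topic_word:
        # Rank word indices by weight, largest first.
        ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        result.append([vocab[i] for i in ranked[:n_top]])
    return result


if __name__ == "__main__":
    # Toy data, invented for this example only.
    vocab = ["neuron", "kernel", "spike", "margin"]
    topic_word = [
        [0.50, 0.10, 0.30, 0.10],
        [0.05, 0.50, 0.05, 0.40],
    ]
    for k, words in enumerate(top_words_per_topic(topic_word, vocab, n_top=2)):
        print("topic %d: %s" % (k, " ".join(words)))
```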
