TASK: Experiments on nips dataset

You should have a pending invite to my topic-model experiments repository: https://bitbucket.org/michaelchughes/x-topics/

I keep this separate from bnpy, because (1) it has lots of private things for a paper I'm writing, and (2) there's lots of third-party code from other groups that we cannot distribute with bnpy.

For now, you should only need to check out the datasets/ folder. It contains several options, including:

nips : 1k articles from NIPS conference
science : 13k articles from the journal Science
wiki : 7k Wikipedia articles

Set up bnpy to work with this external data by running

export BNPYDATADIR=/path/to/x-topics/datasets/nips/
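Before launching a long run, it can help to confirm that the variable is actually visible to Python. The snippet below is only a sanity-check sketch: the helper name find_data_dir is made up for illustration and is not part of bnpy; it just reads BNPYDATADIR and checks that it names an existing directory.

```python
import os

def find_data_dir(env_var="BNPYDATADIR"):
    """Return (path, status) for the directory named by env_var.

    Hypothetical helper for illustration; not part of bnpy.
    path is None whenever the variable is unset or does not name a directory.
    """
    path = os.environ.get(env_var)
    if not path:
        return None, f"{env_var} is not set"
    if not os.path.isdir(path):
        return None, f"{env_var} points to a missing directory: {path}"
    return path, "ok"

path, status = find_data_dir()
print(status)
```

If this prints anything other than "ok", re-run the export in the same shell session that starts Python.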

For now, I'd suggest using MixModel (or DPMixFull) as the AllocModel and Mult as the ObsModel. This amounts to document clustering. Run experiments very similar to the earlier ones: look at sensitivity to the number of clusters K and to the initialization.

Num of Clusters/Topics K

For now, I'd try K=10 as a "low" value and K=100 as a "high" value.

Initializations

You can try

--initname randexamples
--initname kmeansplusplus
--initname randomlikewang
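One way to keep the sensitivity experiments organized is to enumerate every (K, initname) combination up front. The sketch below only builds the option strings; it does not call bnpy. The --initname flag and its three values come from the list above, while the --K flag name is an assumption about how K is passed on the command line, and build_run_args is a hypothetical helper.

```python
from itertools import product

K_VALUES = [10, 100]  # the "low" and "high" settings suggested above
INIT_NAMES = ["randexamples", "kmeansplusplus", "randomlikewang"]

def build_run_args(k_values=K_VALUES, init_names=INIT_NAMES):
    """Return one option string per (K, initname) combination.

    Hypothetical helper; the --K flag name is an assumption.
    """
    return [
        f"--K {k} --initname {name}"
        for k, name in product(k_values, init_names)
    ]

for args in build_run_args():
    print(args)
```

This gives six settings in total (two values of K times three initializations), which makes it easy to check that no combination was skipped when comparing results.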

Making plots

Play around with bnpy.viz.PrintTopics to get print-outs of the topic-word parameters.
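To make clear what such a print-out shows, here is a rough standalone sketch of the idea: for each topic, rank the vocabulary by weight and print the top few words. This is not the bnpy.viz.PrintTopics implementation or its interface; the function name, the toy vocabulary, and the weights are all made up for illustration.

```python
def top_words_per_topic(topic_word, vocab, n_top=3):
    """For each topic (one row of weights), return its n_top heaviest words.

    Hypothetical helper for illustration; not bnpy's API.
    """
    results = []
    for weights in topic_word:
        # Word indices sorted from largest to smallest weight.
        ranked = sorted(range(len(weights)), key=lambda v: weights[v], reverse=True)
        results.append([vocab[v] for v in ranked[:n_top]])
    return results

# Toy data: 2 topics over a 5-word vocabulary (made-up numbers).
vocab = ["neuron", "kernel", "bayes", "spike", "margin"]
topic_word = [
    [0.40, 0.03, 0.07, 0.45, 0.05],
    [0.04, 0.50, 0.10, 0.06, 0.30],
]

for k, words in enumerate(top_words_per_topic(topic_word, vocab)):
    print(k, " ".join(words))
```

With real runs, the rows would be the learned topic-word parameters and the vocabulary would come from the dataset.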
