- changed status to resolved
TASK: Experiments on nips dataset
You should have a pending invite to my topic-model experiments repository: https://bitbucket.org/michaelchughes/x-topics/
I keep this separate from bnpy, because (1) it has lots of private things for a paper I'm writing, and (2) there's lots of third-party code from other groups that we cannot distribute with bnpy.
For now, you should only need to check out the datasets/ folder. It contains several datasets, including
- nips : 1k articles from the NIPS conference
- science : 13k articles from the journal Science
- wiki : 7k Wikipedia articles
Set up bnpy to work with this external data by running
export BNPYDATADIR=/path/to/x-topics/datasets/nips/
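Before launching anything, it may help to confirm that the variable is actually visible to Python. The following is only a minimal sketch: the helper name `check_data_dir` is hypothetical and is not part of bnpy; it just reads BNPYDATADIR and checks that it points to an existing directory.

```python
import os


def check_data_dir(env=None):
    """Return the path stored in BNPYDATADIR, or raise if it is unusable.

    `check_data_dir` is a hypothetical helper for illustration only;
    it is not part of bnpy.
    """
    env = os.environ if env is None else env
    path = env.get("BNPYDATADIR")
    if not path:
        raise RuntimeError("BNPYDATADIR is not set")
    if not os.path.isdir(path):
        raise RuntimeError("BNPYDATADIR is not a directory: %s" % path)
    return path


if __name__ == "__main__":
    # Prints the dataset directory, or fails with a clear message.
    print(check_data_dir())
```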
For now, I'd suggest using MixModel (or DPMixFull) as the AllocModel and Mult as the ObsModel. In effect, this does document clustering. Run experiments very similar to the earlier ones: look at sensitivity to the number of clusters K and to the initialization.
Number of Clusters/Topics K
For now, I'd try K=10 as a "low" value and K=100 as a "high" value.
Initializations
You can try
--initname randexamples
--initname kmeansplusplus
--initname randomlikewang
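The K values and initializations above form a small grid of 2 x 3 = 6 runs. A rough sketch of enumerating that grid is below. The function names and dictionary keys are hypothetical, and the `--K` flag in `format_args` is an assumption (only `--initname` appears in this task description), so check it against your bnpy version before using it.

```python
import itertools

# Values taken from the task description above.
K_VALUES = [10, 100]
INIT_NAMES = ["randexamples", "kmeansplusplus", "randomlikewang"]


def build_runs(alloc_model="MixModel", obs_model="Mult",
               k_values=K_VALUES, init_names=INIT_NAMES):
    """Enumerate every (K, initname) combination as a plain dict.

    Hypothetical helper: the dict keys are illustrative, not bnpy names.
    """
    runs = []
    for K, initname in itertools.product(k_values, init_names):
        runs.append({
            "allocModel": alloc_model,
            "obsModel": obs_model,
            "K": K,
            "initname": initname,
        })
    return runs


def format_args(run):
    """Format the per-run options as a command-line fragment.

    ASSUMPTION: "--K" is the flag for the number of clusters.
    """
    return "--K %d --initname %s" % (run["K"], run["initname"])


if __name__ == "__main__":
    for run in build_runs():
        print(run["allocModel"], run["obsModel"], format_args(run))
```

Passing `alloc_model="DPMixFull"` to `build_runs` gives the same grid for the other suggested AllocModel.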
Making plots
Play around with bnpy.viz.PrintTopics to get print-outs of the topic-word parameters.
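To make clear what such a print-out contains, here is a generic sketch of listing the highest-weight words per topic from a topic-word matrix. It is not the bnpy.viz.PrintTopics API: the function `top_words` and the toy matrix and vocabulary are hypothetical and exist only for illustration.

```python
def top_words(topic_word, vocab, n=3):
    """Return the n highest-weight vocabulary words for each topic.

    topic_word: list of rows, one per topic, each a list of word weights.
    vocab: list of words, aligned with the columns of topic_word.
    Hypothetical helper for illustration; not part of bnpy.
    """
    result = []
    for weights in topic_word:
        # Rank word indices by weight, largest first.
        ranked = sorted(range(len(weights)),
                        key=lambda i: weights[i], reverse=True)
        result.append([vocab[i] for i in ranked[:n]])
    return result


if __name__ == "__main__":
    # Toy example with made-up weights and words.
    toy_topic_word = [[0.1, 0.7, 0.2],
                      [0.5, 0.2, 0.3]]
    toy_vocab = ["neuron", "kernel", "bayes"]
    for k, words in enumerate(top_words(toy_topic_word, toy_vocab, n=2)):
        print("topic %d: %s" % (k, " ".join(words)))
```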
Comments (2)
reporter - changed status to closed
Experiments finished.