ENH-birth-selection

Author Commit Message Date Builds
44 commits behind master.
Mike Hughes
FIX Run.py is safer about setting up files specific to BrownCS grid
Mike Hughes
FIX Removed errant embed statements
Mike Hughes
FIX MultObsModel has less strict check for values near zero in min Div calculation
Mike Hughes
ENH Fixed test for learning rho/theta from fixed doctopiccounts, so it works in many cases now. Some util methods appropriately integrated into bnpy.util
Mike Hughes
FIX HDPTopicModel global update now allows sortorder=[1,2,0,4] kwarg
Mike Hughes
ENH Big improvements after experiments on sorting order of HDPtopic models. forceRhoInBounds avoids weird artifacts happening in beta space. Optimizer will always verify that final output improves on initialization. tests/allocmodel/topics/ contains several scripts for diagnosing issues in learning rho and theta given fixed DocTopicCounts.
Mike Hughes
ENH small readability fix for OptimRhoOmegaBetter
Mike Hughes
FIX OptimizerRhoOmega now uses None values as defaults for sumLogPi stats, not zeros.
Mike Hughes
ENH LearnAlg logic for callbacks improved.
Mike Hughes
ADD callbacks/ module. This is where we put code that we call via --customFuncHook <filename.py>. Should be useful.
Mike Hughes
ENH MultObsModel bregman divergence calculation is now way faster than before. Speedup comes from computing DivDataVec once and reusing it during plusplus initialization, and then in refinement iterations using --includeOnlyFastTerms, which avoids the data term DivDataVec.
Mike Hughes
MAINT Improved docs for NumericUtil inplaceLog
Mike Hughes
FIX profiler uses /path/to/profiler/, not /profile/, which makes css work correctly. Also fixed some issues so we can specify nLap=0 to just do initialization.
Mike Hughes
FIX wrong option specified in moves.conf. Fixed.
Mike Hughes
ENH MemoVBMovesAlg edited so shuffle uses sumLogPi vector as cue for ordering, *not* the counts directly. We've found some examples where shuffling by counts alone yields noticeably lower objective values.
Mike Hughes
FIX OptimizerRhoOmegaBetter had stupid mistake in computing f_safe (final objective value using safe rho/omega, guaranteed in bounds). Used omega where should have used rho. Corrected. Hopefully no more overflow errors.
Mike Hughes
ENH LocalStepManyDocs now avoids a copy operation to make the resp field for whole datasets.
Mike Hughes
FIX ioutil loadModelForLap fixed so even if only one Lap___.mat file exists on disk, saveLaps will be loaded as a 1-d array, not 0-dim.
Mike Hughes
ADD SLogger to track log for shuffle moves. Each run with shuffles enabled now prints logged messages to shuffle-transcript-verbose.txt
Mike Hughes
FIX edge case where bregman init encounters data with duplicate rows. Use only one cluster for each set of duplicates, shrinking from specified number of clusters if needed.
Mike Hughes
INPROGRESS Memo alg uses newer restricted local step
Mike Hughes
ENH Improved birthmoves. TryBirth now *correctly* reads float params from args-memoVB.txt, and tracks how long the local coordascent iters take.
Mike Hughes
ENH calc_local_params takes relevant args so localstep logging happens.
Mike Hughes
ENH added kwarg option --b_method_initCoordAscent that enables restarting each restricted local step from previous output, if set to 'fromprevious'
Mike Hughes
FIX wayward embed statements from obsmodels
Mike Hughes
ENH TryBirth now pretty functional, tested on nips/bars/asterisk. Also, added debugging step that computes ELBO for the initial hard assignments. Need to investigate what is going on here more.
Mike Hughes
FIX Removed wayward embed statement in Memo alg
Mike Hughes
FIX GaussViz now doesn't require a valid allocModel defined along with the obsModel, but will use if it can.
Mike Hughes
ENH Added TryBirth script that will perform birth move (including all HTML output and all logging, if desired) on demand for specific model or saved run.
Mike Hughes
ENH Added TryBirth.py, a script for interactively trying the birth move and diagnosing issues.
Mike Hughes
ENH Added DataReader.py, which allows an easy function to load the training data specifically used for a saved task.
Mike Hughes
ENH Added special file for HDPTopicRestrictedLocalStep. Improving BCreate by removing junk code.
Mike Hughes
ENH Added new and improved OptimizerRhoOmegaBetter. Now, we always fix the omega vector and just optimize the rho vector. Should be more stable and equally high quality. Tests look good so far...
Mike Hughes
ENH Updated memoVB alg to better track pieces of birth objective
Mike Hughes
INPROGRESS Updating rhoomega updating
Mike Hughes
ENH Added better logging for delete move Lterms. Also planted early stage code for using reconfigure-word type moves in the delete.
Mike Hughes
ENH BRefine (restricted step) can now take optional absorbingIDs, which bias the prior towards the document-topic distribution used by non-target atoms in the document.
Mike Hughes
ENH Improved deletemove logs so more easily searched for targetUID (no whitespace to deal with)
Mike Hughes
FIX HDP Topics sign error in calcHrespForMergePairs, where we didn't multiply by neg. one when should have.
Mike Hughes
FIX delete moves now use guaranteed never-before-used UIDs, instead of hack olduid+1000
Mike Hughes
FIX should not retain birth proposal at last batch of a lap.
Mike Hughes
FIX BRefine.py now properly handles case where no new comp has significant mass, by assigning all mass to the first comp.
Mike Hughes
FIX Delete move once again conforms to desired single-line-message summary per lap.
Mike Hughes
FIX BLogger once again creates a one-line summary entry after every lap.
Mike Hughes
FIX Two important fixes, which lead to correct recovery in BinaryBarsK20 dataset. First, we only penalize shortlisted UIDs as failures if they are ineligible at all batches. They may just not be chosen due to budget constraints which is fine. Second, we fix the way that failed local step proposals are tracked, so it is consistent even across batches.
Mike Hughes
ENH Improved birth planner to avoid uids actively used by merge or delete.
Mike Hughes
FIX Entropy for HDP with Gaussians is now properly accounted for.
Mike Hughes
FIX HDP restricted local step improved, by handling atoms only with significant mass (>0.01) explicitly, and leaving the rest as a lump placed with largest topic in that doc.
Mike Hughes
INPROGRESS Improving weird bug in refinement iterations
Mike Hughes
FIX Small issues that prevented births with Gaussian likelihoods
Mike Hughes
ENH Fixed bregman computations for Bern and Mult, added functionality for DiagGauss.
Mike Hughes
ENH RunBregmanKMeans.py tested and works with simple toy datasets
Mike Hughes
ENH Gauss bregman kmeans finally passes tests.
Mike Hughes
Merge branch 'ENH-birth-selection' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-birth-selection
Mike Hughes
ENH Improved tests for bregman kmeans for FixedVarGauss and Gauss
Mike Hughes
INPROGRESS Gauss kmeans with bregman
Mike Hughes
INPROGRESS testing gaussian computations.
Mike Hughes
Merge branch 'ENH-birth-selection' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-birth-selection
Mike Hughes
ENH Updated sanity checks for bregman
Mike Hughes
INPROGRESS Gaussian bregman. Objective func still needs help.
Mike Hughes
Merge branch 'ENH-birth-selection' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-birth-selection
Mike Hughes
ENH Bregman kmeans now works for ZeroMeanGauss. Added plotting scripts for visualizing divergences too.
Mike Hughes
FIX Init from K=1 now works with bregman kmeans
Mike Hughes
FIX MemoVBMovesAlg had error when run without births. Fixed it.
Mike Hughes
ENH Births now chosen within expansion step, immediately before execution. This lets us use the number of items in the current batch to make just-in-time decisions.
Mike Hughes
FIX Score calculation for birth now actually uses a vector for each cluster.
Mike Hughes
FIX Birth moves no longer remove UIDs from a list while iterating over it, which apparently caused problems.
Mike Hughes
FIX improved handling of edge case where we grow to fill budget of Kmax states. More informative log messages, better behavior.
Mike Hughes
FIX SuffStatBag's insertComps method now properly adjusts K field of _MergeTerms when new comps are inserted (and the new comps don't have additional merge terms)
Mike Hughes
ENH Improved logging and debug-to-html features of birth proposal, esp. local expansion step.
Mike Hughes
ENH Changed defaults in moves.conf to lower budget limits, so comps with one cluster can be easily created.
Mike Hughes
ENH Improved display of Bars topics in HTML debugging. Multiline xlabel indicates both atom count and doc count for HDP models.
Mike Hughes
FIX DPMixtureModel elbo term now called Lalloc instead of Lglobal. More specific and more standard with other models.
Mike Hughes
REFACTOR switch to simpler name plotCompsFromSS. Adjusted all dependencies accordingly. Can optionally save .png to disk if provide an output path.
Mike Hughes
ENH Add new log files 'log-elapsedtime-___.txt' that help track how long we've spent at each step of the algorithm (local/global/birthmove/etc)
Mike Hughes
ENH Support for initLPFromResp now added to bregmankmeans init.
Mike Hughes
ENH Confirmed --initname bregmankmeans works for Mult and Bern.
Mike Hughes
ENH Added initname bregmankmeans. Seems to work for Mult obsmodel.
Mike Hughes
ENH Standardize logging for delete move. delete-transcript-summary.txt now has one message per lap. delete-transcript-verbose.txt collects all the gory details.
Mike Hughes
ENH Better error message if BNPYOUTDIR is not found; will now attempt to create the dir if it does not exist
Mike Hughes
FIX Mult obs model could have prior hypers rewritten by edge case of bregman kmeans. Now fixed. Some improved logging messages too.
Mike Hughes
FIX bug in call to OptimizerRhoOmega for HDPHMM
Mike Hughes
ENH Improved birth logging readability, and adjusted cleanup phase to only accept one merge per iteration (--b_cleanupMaxNumAcceptPerIter), which avoids some local optima caused by merging too greedily.
Mike Hughes
ENH Improvements to birth move logging.
Mike Hughes
ENH Polished logging for merge move. Should be easy to read the short log and verbose log to reconstruct what happened.
Mike Hughes
ENH Improved birth logging, including separate log file for each uid. Standardized breg divergence init into new file FromScratchBreg.py
Mike Hughes
ENH Improve and standardize BregmanKMeans functions for Bern and Mult obsmodels. Now both work with weight vector W set to None or set to positive values.
Mike Hughes
ENH Support breg div clustering for Bernoulli likelihood. Add smoothFracInit option for runBregKMeans function, so that we smooth for initialization (so divergences equal zero when mu equals smoothed x), but don't smooth after that (so we have guaranteed objective func improvements).
Mike Hughes
ENH Updates to improved bregdiv init technique.
Mike Hughes
ENH Updated init module so bregman kmeans is available
Mike Hughes
FIX test script for calculating bregman divergence for zero-mean gaussian now always converts to posterior parameter \mu (adding in the prior)
Mike Hughes
ENH renamed DPlanner.py, improved logging for births and deletes.
Mike Hughes
INPROGRESS added breg-div calc for zmg
Mike Hughes
ENH HDP and DP now use the same createSuffStats function. easier and simpler that way
Mike Hughes
INPROGRESS improved logging of birth plans
Mike Hughes
REFACTOR cleaned up birth/merge/delete folders and moved deprecated content to zzzdeprecated/ folders. Some progress on births for Mult and DP likelihoods
Mike Hughes
INPROGRESS DPmixtures tracking merges with O(M) space instead of O(K*K).
Mike Hughes
FIX DelPlanner works without birth moves enabled
Mike Hughes
INPROGRESS birth/merge/delete seem to be cooperating
Mike Hughes
INPROGRESS delete as a combo of birth/merge
Mike Hughes
ENH switched HDPTopicModel over to tracking only O(M) merge terms, rather than O(K^2). Should be some huge savings in storage, etc.
Mike Hughes
INPROGRESS adding simple pruning to shuffle, and prepping for larger delete moves.
Mike Hughes
INPROGRESS now track proposals across batches, seems to work well. Also much better selection of which items to try next.
Mike Hughes
Merge branch 'ENH-birth-selection' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-birth-selection
Mike Hughes
ENH small scripts to debug breg div calculations
Mike Hughes
FIX MemoVBMovesAlg now has better convergence tracking, by aggregating over all batches and not just the most recent one.
Mike Hughes
FIX mergeComps method of suff stats now respects sumLogPiRemVec as a special field, and handles it differently (otherwise a naive merge of first and last comp would bring truncation down to K=1, which is silly). Shuffle, birth, and merge seem to work OK together.
Mike Hughes
INPROGRESS birth and merge revamped... working on shuffle
Mike Hughes
ENH added initHardCluster option for birth moves, and updated new learn alg that is simpler to extend
Mike Hughes
ENH big improvements to visualization pipeline for birth proposals
Mike Hughes
ADD new visualization routines for looking at proposals
Mike Hughes
ENH added RunAllBirthsFromFixedDatasetAndModel, as a way to benchmark birth success via HTML page for each comp of a given model
Mike Hughes
ENH WordsData always has a vocabList option (which is None when not provided), and HModel's calc_evidence method now does smart Ltotal calculation when given todict kwarg
Mike Hughes
ENH added new spt prop using breg div
Mike Hughes
ENH WordsData getDocTokenType matrix methods now take a weights kwarg
Mike Hughes
ADD new test file for trying multiple attempts on diff datasets
Mike Hughes
ADD some preliminary explorations of new init scheme, based on some good exponential family properties
Mike Hughes
ENH added functionality to make diagnostic plots of accept/reject status and various elbo terms as we see more data
Mike Hughes
ENH WordsData create toy data from LDA model now re-seeds the PRNG at each doc, so that we can create comparable datasets with different values of N_d (words per doc)
Mike Hughes
ENH default colormap for BarsViz is now white for zero, and darker shades of black for non-zero
Mike Hughes
ADD script to do big experiments for birth tests
Mike Hughes
INPROGRESS test script now takes command line args, and saves plots to file to help overall diagnostics
Mike Hughes
INPROGRESS big improvements to visualizations that illustrate how elbo changes as proposal evolves
Mike Hughes
INPROGRESS HDPTopicModel single pass now works for both truelabels and kmeans creation proposals. Added test for rho/omega optimizer that confirms it's using the same objective function we use to calculate the ELBO
Mike Hughes
INPROGRESS added verification by calculating propSS for HDPTopic in two ways, using transferMass operation on xSS, and directly
Mike Hughes
INPROGRESS changes to HDPTopicModel and SuffStatBag to allow expansions
Mike Hughes
INPROGRESS testing out sptmv for DPMixtures
Mike Hughes
Merge branch 'ENH-birth-selection' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-birth-selection
Mike Hughes
ENH added notebooks to demo split move details
Mike Hughes
INPROGRESS rcfgwrd proposal has both a makePlan() method to identify promising sets-of-words to target, and a makeProposal/evaluateProposal method pair that can create (from scratch) good proposals. This is topic-model specific.
Mike Hughes
INPROGRESS working on rcfgwrd move
Mike Hughes
ENH improved side-by-side viz of before/after configs, which can now show a picture at every step
Mike Hughes
INPROGRESS BViz tool works for DPMixtureModel and HDPTopicModel now. Very exciting.
Mike Hughes
INPROGRESS trueparams birth now drafted
Mike Hughes
INPROGRESS births for mixture models have infrastructure in place... just need more sophisticated proposal mechanisms
Mike Hughes
Merge branch 'ENH-birth-selection' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-birth-selection
Mike Hughes
INPROGRESS added BMain for primary runBirthMove function, BRefinery to hold refine step logic, BPlanner to hold selection logic, and BProposals to hold proposal logic
Mike Hughes
ENH GaussViz Data plots are now slightly transparent, to improve visibility
Mike Hughes
ENH added returnVec flag to all obsmodels, so we can get Ldata for each comp. Also added TestBSelector scripts: Usage: python TestBSelector_Bern --K 5 --Nk 25
Mike Hughes
ENH added preliminary testbed for checking out bound-based comp selection