bnpy-dev-slda
Author / Commit message
44 commits behind master.
leahdw
update
leahdw
sHDP updates
leahdw
sHDP updates
leahdw
updates
leahdw
most recent
leahdw
update
leahdw
updates
leahdw
updated slda with priors
leahdw
updates to vb on reg weights
leahdw
add test function
leahdw
Prior on regression slda orig
leahdw
updates
leahdw
updates to vb on Y
leahdw
put prior on regression weights and variance
leahdw
updates
leahdw
SupervisedLocalStepManyDocs2
leahdw
merge
leahdw
new init for theta orig slda
Mike Hughes
FIX Updated SupervisedRegressTopicDistributionLocalStepSingleDoc to do smarter init for theta_d_K, do more iterations, and have finer control over grad descent options
Mike Hughes
Merge branch 'bnpy-dev-slda' of https://bitbucket.org/michaelchughes/bnpy-dev into bnpy-dev-slda
leahdw
slda helper
Mike Hughes
Merge branch 'bnpy-dev-slda' of https://bitbucket.org/michaelchughes/bnpy-dev into bnpy-dev-slda
leahdw
add grid3x3 dataset
Mike Hughes
Merge branch 'bnpy-dev-slda' of https://bitbucket.org/michaelchughes/bnpy-dev into bnpy-dev-slda
Mike Hughes
FIX updated formatting on slda branch
leahdw
updates
leahdw
updates
leahdw
updates
leahdw
updates
leahdw
updates
leahdw
updates
leahdw
updates
leahdw
updates
leahdw
softmax theta transform
leahdw
gradient descent updates
leahdw
gradient descent
leahdw
gradient descent
leahdw
gradient descent
leahdw
updates
leahdw
updated regress-on-pi slda
leahdw
alt slda with my gradient descent
leahdw
edit decorate_for_profiling to allow nested functions
leahdw
hand coded gradient for L_theta
leahdw
faster resp update
leahdw
update
leahdw
update
leahdw
updates
leahdw
updates; slda regress on pi
leahdw
updates; slda regress on pi
leahdw
save eta parameter
leahdw
edit set_global_params
leahdw
edit set_global_params
leahdw
update
leahdw
optimized calc_EZTZ_one_doc
leahdw
working copy
leahdw
updates
leahdw
updates
leahdw
updates
leahdw
updates
leahdw
updates
leahdw
updated
leahdw
slightly edited and cleaner code
leahdw
sLDA per type update - working version with error in Elbo
leahdw
slda per type added
leahdw
slda per type added
leahdw
updated
leahdw
current working version
leahdw
Working slda version (fixes to come to bnpy interface part)
Leah Weiner
slda commit - no runtime errors, but other issues exist
Leah Weiner
updated WordData_slda and Bars2D
Leah Weiner
slda commit
Leah Weiner
Initial Commit
Leah Weiner
SLDA global step; fix in SLDA toy data
Leah Weiner
SLDA toy bars added
Leah Weiner
add BarsK10V900_supervised.py
Leah Weiner
first commit WordsData_slda
Leah Weiner
Added response variable to WordsData object to accommodate slda parameters
Mike Hughes
FIX WordsData .dim always set to .vocab_size. Also, InferHeldoutTopics prints useful info about local step params (which are always set to avoid restarts)
Mike Hughes
FIX Heldout metrics for topic models are working again. Fixed way-too-custom dependency on path name. Spruced up the logging functionality too, so it is saved to heldout-transcript-summary.txt and heldout-transcript-verbose.txt.
Mike Hughes
FIX Better error msg for LPkwargs access
Mike Hughes
FIX C++ code for local step doesn't do fixed active set inference when nnzPerRow==1
Mike Hughes
FIX Stupid bug where assertion was raised because we forgot to disregard the -1 state in count
Mike Hughes
ENH Improvements to speed-up sparse local step for HDP topics, esp with Mult likelihoods. Code now amortizes the cost of sifting out the top L topics per token, and does some precomputation to avoid first dense iteration at each doc.
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
FIX updated births to work with sparse resp
Mike Hughes
ENH Makefile consistent use of DNDEBUG now
Mike Hughes
ENH sparse activeonly now successfully can do restarts, and record them to file
Mike Hughes
ENH Updated Topic Model local step with assignments only to active topics
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
ENH Big speedup of reading .ldac files (plain text format for bag-of-words)
Mike Hughes
ENH Improved speed of sparse restarts
Mike Hughes
FIX added small assert and comment, to help explain weird edge case in Breg div
Mike Hughes
ENH initname bregmankmeansWithPriorMean will now insert a value at the prior's expected suff stats
Mike Hughes
FIX entropy calculation with cython code now works for restricted inference, when Resp.ndim==1 (not 2)
Mike Hughes
FIX anchorwords init now can do K=1 initialization
Mike Hughes
ENH Added switch --doSparseOnlyAtFinalLP 0, which is 0 by default, and 1 if we desire sparsity only after the final step of local inference.
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
FIX HDP local step will not do sparse if nnzPerRow is equal to K (formerly was useful for debugging)
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
FIX makefile has eigenpath dependency now
Mike Hughes
ENH Can now do merges with HDP and DP models with sparse local assignments.
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
ENH Updated HDPTopicModel inference tools so can do sparse assignments, esp with WordCounts
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
FIX DPMixture use np.sum(Hresp) not Hresp.sum(), because Hresp might be a float in 1-sparse case
Mike Hughes
FIX Bregman init can now specify num iters with --initname bregmankmeans+0, and obsmodels correctly handle HDPTopicModel's lack of an N field
Mike Hughes
ENH Added c++ library for doing fast sparse-assignments in topic model case
Mike Hughes
ENH Updated Bern lik so that it (1) does everything from class-level function calcLocalParams and calcSuffStats, and (2) can use sparse or dense assignments in calcSuffStats. TODO: deal with sparse assignments when data atoms are words
Mike Hughes
ENH DiagGauss lik now supports sparse vs dense summary calculation
Mike Hughes
FIX EMAlg works again
Mike Hughes
ENH Mult likelihood now supports (1) parallelism for both doc and word atom types, and (2) sparse or dense assignments
Mike Hughes
FIX improved performance for moderate nnz values (2, 3, ...) by avoiding subtracting max of each row from all K entries, only using the top nnz entries
Mike Hughes
FIX sparsifyLogResp needs to be able to safely take exp of any value. So, need to subtract the max in each row. TODO only subtract max from topL entries, not all values.
Mike Hughes
ENH Can now train DPMixtureModel with enforced sparsity (at most L of the K states can be non-zero for any data atom's assignments). Works with Gauss and ZeroMeanGauss currently. TODO: extend to Mult, Bern, and other likelihoods.
Mike Hughes
ENH Added SparseRespStatsUtil, for computing things like sum(r[n,k] * x[n]**2) or sum(r[n,k] * x[n] * x[n].T)
Mike Hughes
ENH added sparsifyResp_ functions, in util/SparseRespUtil.py. These will take a matrix and return a sparsified version of it (with at most nnzPerRow entries non-zero)
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
ENH Started work on trying sparser-versions of assignment distributions q(z)
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
FIX raised warning is now a proper warning that doesn't halt execution
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
ENH Improved error messaging for issue with NaN in sparse restarts... todo find better, faster long-term solution
Mike Hughes
ENH Updated feature branch to latest stable less-memory-hungry fix recently applied to master
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
ENH Improved PlotParamComparison by making connected line through top-ranked (by elbo) of all tasks
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
ENH Updated init so can do bregmankmeans+0 to specify 0 iterations (with same after effects as randexamples), bregmankmeans+1 to do 1 iter, bregmankmeans+50 to do 50 iterations.
Mike Hughes
ENH PlotTrace.py Added kwarg drawLineToXMax that will continue all lines to the same maximum x value, so it is easier to compare them
Mike Hughes
ENH Updated PlotParamComparison to study differences between local optima
Mike Hughes
ENH Bernoulli can now take custom mean and scale, instead of lam1 and lam0
Mike Hughes
ENH Updated PrintTopics to improve the display of topics under Bern lik
Mike Hughes
FIX PlotComps for a topicmodel with associated vocab list now shows the top words
Mike Hughes
ENH Cleanup log messages for BPlanner.py
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
ENH Updated BPlanner to use batch-specific stats to decide which births to try. We only disqualify a birth if all batches report a uid as ineligible (due to small size or past failures)
Mike Hughes
ENH Trying new BPlanner, with better criteria for choosing which comps to target
Mike Hughes
ADDED util function for sorting with respect to tiers
Mike Hughes
ENH Improved data reader to load from minibatch dataset
Mike Hughes
ENH Added capability to enable/disable birth proposal retention AND birth proposal merge cleanups
Mike Hughes
ENH Added avgPi option to init xPi
Mike Hughes
ENH Updated Eval statement in births to be more readable/searchable as a one-line summary
Mike Hughes
ENH Improved speed of ZeroMeanGauss (got rid of forloop for DivDataVec). Also simplified convergence logic for Birth proposal
Mike Hughes
ENH Upgraded ZeroMeanGauss to use triangular solver, which seems to give noticeable performance boost (see TestZeroMeanGaussLocalStepSpeed.py to try it out on new hardware)
Mike Hughes
ENH Improved logging message formatting and hopefully better flushing to disk. Added option to only retain stats for next lap if there are two comps with nontrivial mass.
Mike Hughes
ENH Added method in TryBirth that will find comp best targeted by a birth move
Mike Hughes
FIX shuffle move respects the different truncation limits aggregated across batches
Mike Hughes
ENH Added Letters dataset. Updated BPlanner to avoid trying the same comp too many times on the first lap. Updated TryBirth to load specific batch file from disk for the interactive trial.
Mike Hughes
FIX hdp restricted step uses HrespEmptyComp now, as required to do multi batch calculations correctly
Mike Hughes
FIX elbo tracking for Hresp in dp models across multiple batches. todo: same for hdp
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
FIX small bookkeeping errors related to multiple batches. Also smaller args-ALG.txt error that printed whole dictionaries when should have printed nothing.
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
ENH Updated InferHeldoutTopics callback to handle general hmodel-based inference, not specific to mult topic models
Mike Hughes
ENH Added ability to init obsmodel from an hmodel object. Also fixed saveEvery behavior to work even with saveEvery < 1
Mike Hughes
FIX hdp restricted step for sparse bernoulli now properly deals with 'on' words and 'off' words in lumped fashion
Mike Hughes
ENH Updated bregman init and HDP restricted steps to support Bern HDP on sparse wordct data
Mike Hughes
FIX DPMixtureRestrictedLocalStep now supports normalized_counts
Mike Hughes
FIX Small bug where used allclose(sum_minDiv, 0.0) when should have just tested if equal to zero, since sum_minDiv may be quite small but still positive (eg 1e-10)
Mike Hughes
ENH DiagGaussObsModel converted to use DivDataVec format. todo: why not use smoothFracInit??
Mike Hughes
ENH GaussObsModel converted to use DivDataVec format. todo: why not use smoothFracInit??
Mike Hughes
INPROGRESS Converting GaussObsModel to std format
Mike Hughes
ENH Improved ioutil to be robust for loading with K=1, and let InferHeldoutTopics know how to find heldout set from batches/Info.conf
Mike Hughes
MAINT cleanup errant print statements
Mike Hughes
ENH births and deletes seem to proceed without major bugs after reorg that does all restricted-local step work in allocmodel specific file.
Mike Hughes
INPROGRESS Defined BRestrictedLocalStep to unify functions that become allocmodel specific
Mike Hughes
ENH Birth moves now use random seed specific to current learn alg and the current lap. Before, just used the lap, which made tasks that used the same predefined batches yield the same output, which was lame.
Mike Hughes
ENH Updated TryBirth.py script to auto-load the exact kwargs for births specified by the saved job, and to update any kwarg options specified by command line, like --b_Kfresh 10
Mike Hughes
FIX BernObsModel had bad comparison of CompDims tuple to a string 'K' instead of tuple 'K,'... resulted in some bad ELBO computations. Now fixed.
Mike Hughes
ENH Updated RunBregKMeans testing script, so it can run desired test via stdin specification of N and K and D
Mike Hughes
ENH FromScratchBregman back to computing objective up to additive constant. Would need to recompute DataDivVec using smoothFrac=0... smoothFracInit value does not work.
Mike Hughes
ENH Updated to use DivDataVec in computation of Ldata objective in FromScratchBregman
Mike Hughes
ENH Updated test for bregman to look at zmg
Mike Hughes
ENH Updated test for bregman to look at zmg
Mike Hughes
ENH Updated test for bregman to look at zmg
Mike Hughes
INPROGRESS ZeroMeanGauss calcSmoothedBregDiv updated to do faster DivDataVec computations
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
ENH Bern bregmankmeans now uses faster options, reusing DivDataVec when provided
Mike Hughes
ENH Updated FromTruth init to handle the bernoulli hdp and bernoulli dp words data cases
Mike Hughes
FIX Small bug in hasMoreReasonableMoves when the list of latestLapAccepted is empty, raised error when calling np.max(). Now fixed.
Mike Hughes
ENH Updated bnpy/allocmodels so that we support Bernoulli-backed HDP and DP models. Relevant changes: every place with a per-doc for-loop needs to change some access indices (since doc_range isn't quite right when all words present and absent are atoms). Also, entropy calculation doesn't need Data.word_count
Mike Hughes
ENH Updated BernObsModel to support WordsData, for both HDP (treating words as atoms) and DP (treating docs as atoms)
Mike Hughes
FIX Births work with multiple batches again. Updated BPlanner to better use retained uids.
Mike Hughes
Merge branch 'ENH-better-moves-with-relational' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-better-moves-with-relational
Mike Hughes
INPROGRESS towards Bern observation model for WordsData
Mike Hughes
ADD CountReader util to ioutil, which makes it possible (and easy) to plot the counts of each unique component throughout training.
Mike Hughes
ENH added R_precision metric
Mike Hughes
ENH Monks dataset now plots prettily, defaults to the esteem relation (which has most obvious block structure).
Mike Hughes
ENH Improved readability and maintainability of code for relational models. Also improved visualizations (use bnpy.viz.RelationalViz, not PlotComps for now)
Mike Hughes
INPROGRESS updating initialization for relational models
Mike Hughes
MERGE improved merge/birth/delete moves with relational (no moves yet)
Mike Hughes
ENH Updated allocation models for relational. Use standard notation of NodeStateCount, and (for assortative) we use entropy vector that is size K called Hresp_fg, plus a scalar Hresp_bg
Mike Hughes
FIX Run.py is safer about setting up files specific to BrownCS grid
Mike Hughes
FIX Removed errant embed statements
Mike Hughes
FIX MultObsModel has less strict check for values near zero in min Div calculation
Mike Hughes
ENH Fixed test for learning rho/theta from fixed doctopiccounts, so it works in many cases now. Some util methods appropriately integrated into bnpy.util
Mike Hughes
FIX HDPTopicModel global update now allows sortorder=[1,2,0,4] kwarg
Mike Hughes
ENH Big improvements after experiments on sorting order of HDPtopic models. forceRhoInBounds avoids weird artifacts happening in beta space. Optimizer will always verify that final output improves on initialization. tests/allocmodel/topics/ contains several scripts for diagnosing issues in learning rho and theta given fixed DocTopicCounts.
Mike Hughes
ENH small readability fix for OptimRhoOmegaBetter
Mike Hughes
FIX OptimizerRhoOmega now uses None values as defaults for sumLogPi stats, not zeros.
Mike Hughes
ENH LearnAlg logic for callbacks improved.
Mike Hughes
ADD callbacks/ module. This is where we put code that we call via --customFuncHook <filename.py>. Should be useful.
Mike Hughes
ENH MultObsModel bregman divergence calculation is now way faster than before. Speedup comes from computing DivDataVec once and reusing it during plusplus initialization, and then in refinement iterations using --includeOnlyFastTerms, which avoids the data term DivDataVec.
Mike Hughes
MAINT Improved docs for NumericUtil inplaceLog
Mike Hughes
FIX profiler uses /path/to/profiler/, not /profile/, which makes css work correctly. Also fixed some issues so we can specify nLap=0 to just do initialization.
Mike Hughes
FIX wrong option specified in moves.conf. Fixed.
Mike Hughes
ENH MemoVBMovesAlg edited so shuffle uses sumLogPi vector as cue for ordering, *not* the counts directly. We've found some examples where shuffling by counts alone yields noticeably lower objective values.
Mike Hughes
FIX OptimizerRhoOmegaBetter had stupid mistake in computing f_safe (final objective value using safe rho/omega, guaranteed in bounds). Used omega where should have used rho. Corrected. Hopefully no more overflow errors.
Mike Hughes
ENH LocalStepManyDocs now avoids a copy operation to make the resp field for whole datasets.
Mike Hughes
FIX ioutil loadModelForLap fixed so even if only one Lap___.mat file exists on disk, saveLaps will be loaded as a 1-d array, not 0-dim.
Mike Hughes
ADD SLogger to track log for shuffle moves. Each run with shuffles enabled now prints logged messages to shuffle-transcript-verbose.txt
Mike Hughes
FIX edge case where bregman init encounters data with duplicate rows. Use only one cluster for each set of duplicates, shrinking from specified number of clusters if needed.
Mike Hughes
INPROGRESS Memo alg uses newer restricted local step
Mike Hughes
ENH Improved birthmoves. TryBirth now *correctly* reads float params from args-memoVB.txt, and tracks how long the local coordascent iters take.
Mike Hughes
ENH calc_local_params takes relevant args so localstep logging happens.
Mike Hughes
ENH added kwarg option --b_method_initCoordAscent that enables restarting each restricted local step from previous output, if set to 'fromprevious'
Mike Hughes
FIX wayward embed statements from obsmodels
Mike Hughes
ENH TryBirth now pretty functional, tested on nips/bars/asterisk. Also, added debugging step that computes ELBO for the initial hard assignments. Need to investigate what is going on here more.
Mike Hughes
FIX Removed wayward embed statement in Memo alg
Mike Hughes
FIX GaussViz now doesn't require a valid allocModel defined along with the obsModel, but will use if it can.
Mike Hughes
ENH Added TryBirth script that will perform birth move (including all HTML output and all logging, if desired) on demand for specific model or saved run.
Mike Hughes
ENH Added TryBirth.py, a script for interactively trying the birth move and diagnosing issues.
Mike Hughes
ENH Added DataReader.py, which allows an easy function to load the training data specifically used for a saved task.
Mike Hughes
ENH Added special file for HDPTopicRestrictedLocalStep. Improving BCreate by removing junk code.
Mike Hughes
ENH Added new and improved OptimizerRhoOmegaBetter. Now, we always fix the omega vector and just optimize the rho vector. Should be more stable and equally high quality. Tests look good so far...
Mike Hughes
ENH Updated memoVB alg to better track pieces of birth objective
Mike Hughes
INPROGRESS Updating rhoomega updating
Mike Hughes
ENH Added better logging for delete move Lterms. Also planted early stage code for using reconfigure-word type moves in the delete.
Mike Hughes
ENH BRefine (restricted step) can now take optional absorbingIDs, which bias the prior towards the document-topic distribution used by non-target atoms in the document.
Mike Hughes
ENH Improved deletemove logs so more easily searched for targetUID (no whitespace to deal with)
Mike Hughes
FIX HDP Topics sign error in calcHrespForMergePairs, where we didn't multiply by neg. one when should have.
Mike Hughes
FIX delete moves now use guaranteed never-before-used UIDs, instead of hack olduid+1000
Mike Hughes
FIX should not retain birth proposal at last batch of a lap.
Mike Hughes
FIX BRefine.py now properly handles case where no new comp has significant mass, by assigning all mass to the first comp.
Mike Hughes
FIX Delete move once again conforms to desired single-line-message summary per lap.
Mike Hughes
FIX BLogger once again creates a one-line summary entry after every lap.
Mike Hughes
FIX Two important fixes, which lead to correct recovery in BinaryBarsK20 dataset. First, we only penalize shortlisted UIDs as failures if they are ineligible at all batches. They may just not be chosen due to budget constraints which is fine. Second, we fix the way that failed local step proposals are tracked, so it is consistent even across batches.
Mike Hughes
ENH Improved birth planner to avoid uids actively used by merge or delete.
Mike Hughes
FIX Entropy for HDP with Gaussians is now properly accounted for.
Mike Hughes
FIX HDP restricted local step improved, by handling atoms only with significant mass (>0.01) explicitly, and leaving the rest as a lump placed with largest topic in that doc.
Mike Hughes
INPROGRESS Improving weird bug in refinement iterations
Mike Hughes
FIX Small issues that prevented births with Gaussian likelihoods
Mike Hughes
ENH Fixed bregman computations for Bern and Mult, added functionality for DiagGauss.
Mike Hughes
ENH RunBregmanKMeans.py tested and works with simple toy datasets
Mike Hughes
ENH Gauss bregman kmeans finally passes tests.
Mike Hughes
Merge branch 'ENH-birth-selection' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-birth-selection
Mike Hughes
ENH Improved tests for bregman kmeans for FixedVarGauss and Gauss
Mike Hughes
INPROGRESS Gauss kmeans with bregman
Mike Hughes
INPROGRESS testing gaussian computations.
Mike Hughes
Merge branch 'ENH-birth-selection' of https://bitbucket.org/michaelchughes/bnpy-dev into ENH-birth-selection
111 commits not shown.