FiniteHMT and Toy Data

#10 Open
Repository
Branch
bnpy-hmt
Repository
Branch
master

Bitbucket cannot automatically merge this request.

The commits that make up this pull request have been removed.

Bitbucket cannot automatically merge this request due to conflicts.

Review the conflicts on the Overview tab. You can then either decline the request or merge it manually on your local system using the following commands:

git checkout master
git merge --no-ff -m 'Merged in bnpy-hmt (pull request #10)' remotes/origin/bnpy-hmt
Author
  1. Mert Terzihan
Reviewers
Description

This pull request includes FiniteHMT (Sum-product algorithm for quad trees) and toy data to test it.

Comments (6)

  1. Mike Hughes repo owner

    Can you provide some example code that could be run to demonstrate this algorithm?

    Also, a verbal summary of the toy dataset would be good. Pictures / graphs would be better.

  2. Mike Hughes repo owner

    Thanks for the hard work Mert! We'll meet about the toy data early next week.

    After looking over the code, it seems that there are some redundancies... both QuadTreeUtil.py and HMTUtil have implementations of the sum-product algorithm.

    Ideally, for each type of tree, we should have a file called (Util<TreeType>.py), which contains a correct, fast pure Python implementation of sum-product, and a brute-force method for reference. This way, when we extend our implementation to general trees or other possible tree structures, it will be obvious how those fit in.

    Please clean up this code so that we have one Util file for Quad and one file for Binary trees.

    Also, please commit the unit tests you've used to verify that the sum-product algorithm is working in each case.

  3. Mert Terzihan author

    Thanks Mike for the comments. Before meeting today, I wanted to make some comments on your previous inquiries. HMTUtil module is the main module that I was working on for a general sum-product implementation on HMTs. On the other hand, QuadTreeUtil is a module where I did my unit tests. I didn't want to add brute force method to the actual module for avoiding any possible complication. The unit test that I used for testing QuadTree module resides in tests/allocmodel/tree/TestQuadTreeUtil.py, and the script to generate the toy data is in datasets/HMTK4.py.

  4. Mike Hughes repo owner

    OK, that all makes sense. Here's the structure I'd like to see before accepting a pull request (should be easy, just rename a few files and possibly copy-paste relevant methods from one file to another):

    Within bnpy/allocmodel/tree/

    • UtilQuadTree.py has a brute-force and a sum-product implementation for quad trees
    • UtilBinTree.py has a brute-force and a sum-product implementation for binary trees
    • UtilGeneralTree.py has a sum-product implementation for general trees (do you really have a general implementation?)

    Within tests/allocmodel/tree/

    • TestUtilQuadTree.py has any code that compares brute-force and a sum-product from UtilQuadTree.py to verify same output
    • TestUtilBinTree.py has any code that does the same for UtilBinTree.py

    I'm trying hard to make it very obvious to a person new to the project to look at the code and understand exactly where each piece of the algorithm is located, and how different special tree structures (binary vs general) are handled. This reorganization will really help with that.

  5. Mike Hughes repo owner

    From the current code, it does look like there is no "general" implementation, just a quad tree implementation. So, I'd say there's no need for two separate files QuadTreeUtil and HMTUtil... merging them into one file that does the right thing will be very useful.

  6. Mert Terzihan author

    Sure, Mike, I will try to structure the modules based on your comments. Currently, I don't have a general tree implementation. I had some ideas on how to implement it, but we can discuss it today. It would be easy to implement a general tree structure, however with the current setup (data is stored as a numpy array), it can be inefficient to apply the current code to a general case with incomplete trees where number of children of each node varies.