NS_Node, do not cache lft/rgt on insert

Issue #29 open
Sardar Yumatov
created an issue

When men tries to create a tree: {{{ - create root -- keep the object - add child 1 -- root became non-leaf - add sub-child -- root has now out-dated rgt

 - add child 2         -- will go wrong from this point
    - add sub-child


This scenario occurs when men tries to insert a whole subtree at once.

The problem is in NS_Node.get_tree(), which uses outdated lft/rgt. The method is used by get_last_child() > get_children() > get_descendants(), which is used by add_child when root node is not leaf anymore (after first insert, we update rgt + 2). Because rgt is outdated we will not get the whole tree, so get_last_child() will return the wrong node.

Easy fix: explain this in the documentation and mention explicitly that every operation that modifies the tree must fetch a fresh parent node just before the operation (fetch parent, add child, throw reference to parent away).

Better fix: introduce builder object, used to create larger subtrees at once. {{{ node.builder().add_child(params)\ .add_child(params).pop()\ .add_child(**params)\ .commit() }}} Every method returns a builder instance with only add_child(), add_sibling(), pop() (return parent builder) and commit() methods. The commit() will shift rgt's at once and create the complete list of objects. Builder covers only tree creation, updates/deletions are perfect as they are right now. The documentation should mention that whole (sub)trees must be created via builders only.

Even better fix: remove all individual add_*() methods from nodes, inserts should be performed through builders only. At commit use fresh lft/rgt/depth/tree_id. Again, move and delete are individual operations on nodes.

The worst possible fix: keep track of all instantiated objects and update lft/rgt's. Please, do not do this.

If you like the idea, I can implement this and send the patch.

Comments (7)

  1. Gustavo Picón repo owner


    Interesting proposal. I actually went for the "Easy fix", you can see a big red warning in https://tabo.pe/projects/django-treebeard/docs/1.61/api.html and an explanation with code in https://tabo.pe/projects/django-treebeard/docs/1.61/intro.html#basic-usage

    About the builder, I think it adds confusion to the API. The common case isn't actually code like in your example, but something like this:

    new_node = node.add_chile(**parms)

    That's why I added the warning, since 1) there is no identity mapper in django and 2) keeping track of all instantiated objects would suck (that would involve pretty much building an identity mapper for treebeard).

    The functional approach you propose looks interesting for methods like load_bulk. I wonder how do you propose to implement it? Keeping all the node data in memory and then commiting (harder/faster)? Or commit every single step as part of a larger transaction and let the DBMS do the heavy lifting (easier/slower)?

    Note that this would need to work with MP and AL trees as well, at least with a generic/naive/slow implementation (like the stuff you can find in models.py).

  2. Sardar Yumatov reporter

    Keeping all the node data in memory and then commiting (harder/faster)?

    Yes. Builder is a simple object (collections.namedtuple) that is instantiated on an existing node. Men can add left/right siblings or children, this will create new builder node. Navigation methods:

       pop() -- get parent builder/node
       left()/right() -- get left/right sibling builder/node, raising exception if no
                      -- such note was created

    commit() is only available on the initial level (the builder created with the real node or sibling builder). Men obtains the node, get_builder() -> new builder instance. Builds the subtree and/or sibling trees and pop() to the initial level. Then commit(). The builder will insert all the nodes relative to the initial node.

    It is not possible to commit part of the builder tree:

          - node -> builder:
              - new_child
                 - new_sub_node
              - new_child1  -- here commit will raise exception
          * pop() until initial level, then commit

    It is not possible to jump over existing nodes:

          - node -> builder:
          - new right sibling
            - new sub-node  * pop()
          * commit()
          - node2  * will be shifted right

    Multiple builders do not care about each other, they can not conflict. The same builder committed twice will insert subtrees twice.

    So: the builder is just add_* recorded and executed efficiently (if possible). No shared state, builder is just like a wedge inserts itself adjacent to the node.

  3. Gustavo Picón repo owner

    I've been thinking about this and came to the conclusion that it's a great idea. Would you be willing to write a patch for this that is tree-independent? (meaning that it works with AL/NS/MP).

  4. Log in to comment