Issue #7 resolved

FastRBTree Error using Tuple-Keys

Peter Mell
created an issue

This is likely the same underlying error as in issue #6. The error code of issue #6 can be fixed by using tuple-keys instead of list-keys. However, attached is code that causes use of FastRBTree with Python to crash while using tuple-keys. As with issue #6, the problem only exhibits itself when using the "fast" variant of the trees. I tried hard to reduce my error code down as small as possible while still demonstrating the bug (see attached 54 lines of code). Note that the code contains two tree creation lines (8 and 31). For some reason, it doesn't seem to matter whether or not line 31 uses the "fast" trees or not. Toggling line 8 between FastRBTree and RBTree will demonstrate the error when using the fast variant. This was tested on Windows with Python 2.7.3 and bintrees 1.0.3. Thanks in advance for your help!!

Comments (13)

  1. Manfred Moitzi repo owner

    Hi, I can not reproduce the error on Windows with Python 2.7.5 or Linux Mint with Python 2.7.4. The program completes all 10 iterations without crash. Sorry!

    Edit: It crashed with Python 3.3 after the last iteration.

  2. Manfred Moitzi repo owner

    I don't know if I will find the bug, because it seems to be somewhere at the C level. It seems that the crash happens at tree destroying, which is implemented by recursive function calls, so maybe it is just a stackoverflow.

    Perhaps you can try another RBTree implementation, implemented in C and pyrex: https://bitbucket.org/bcsaller/rbtree/

  3. Peter Mell reporter

    Thanks for looking into this. I've tried reusing trees as opposed to creating and deleting tens of thousands of trees in each iteration. This seems to help but the code still does crash sometimes. For now, using the C library for my one main tree and the Python library for my tens of thousands of usually smaller trees works well (with a penalty to runtime of course).

  4. Manfred Moitzi repo owner

    I started refactoring of the Cython code, removing duplicate code and reducing complexity (Walkers exposes the internal C data types - which could be a source of problems). The tree functions are fixed for a long time, so I can optimize the implementation, from a lack of time I can not say when this process will be finished.

  5. Manfred Moitzi repo owner

    A first little step: insert @bintrees_FastRBTree_error2.py

    import gc  # <--------
    
    # rest of your code
    
        while not hypersize_list.is_empty():
            (key,hyperedge)=hypersize_list.pop_max()
            hyperedge.alerts.foreach(treeloop)
        gc.collect()  # force garbage collection <--------
        print "Completed iteration",z,"of 10"
    

    No crash happens, but I don't know why?

    Edit: It also works at the end of the file. All tests with Python 3.3 x86 on Win 7 x64.

  6. Peter Mell reporter

    I tried gc.collect() in your specified location and I still have crashes. However, if I put the gc.collect slightly deeper into the loop nesting, it eliminates the crashes (while increasing runtime significantly). You had a nice idea to try gc.collect(). Fascinating!!

        while not hypersize_list.is_empty():
            (key,hyperedge)=hypersize_list.pop_max()
            gc.collect() # force garbage collection <--------
            hyperedge.alerts.foreach(treeloop)
        print "Completed iteration",z,"of 10"
    
  7. Peter Mell reporter

    I also found another approach that works (with an interesting side-effect). Instead of using gc.collect(), begin the program with gc.disable(). The code will run without crashing. The side-effect is that if, using the same Windows Python shell, you run the program a second time it will crash. It is as if garbage collection gets turned on between runs, messing things up.

    Edit: While it won't crash this way during execution, when I go to close the Python shell window, I then sometimes get the error message even after just a single run.

    import gc
    from random import randint
    import bintrees
    
    gc.disable()
    
  8. Manfred Moitzi repo owner

    Next step - wrapped low level stack functions into a Cython extension object, which guarantees correct deallocation in a generator function (iter_items). No crashes with Python 3.3 and 2.7 until now ;-). Uploaded source and binary install packages for Windows (Version 2.0.0b2). For consistent names, I renamed some methods like itemslice() to item_slice(). Please test it.

  9. Peter Mell reporter

    I tried out FastRBTree with the new bintrees 2.0.0b2 on both Python 2.7 and 3.3. The code we've been discussing on this thread doesn't crash. I then tried it out with my full program on Python 2.7 and had no crashes. Furthermore, I tried to heavily load bintrees with 100 iterations of creating, using, and deleting up to 1.5 million trees at a time. It works perfectly!! I also can't detect any slow down from the previous version (if there is any). Thank you so much!! I'm very excited about this!!

  10. Log in to comment