crash - can not reproduce - and ASAN experience

Issue #80 resolved
Danny Heijl created an issue

After some editing on a single RAW CR3 image, I played a bit with area masks, and then went to perspective correction and boom. Unfortunately not in the debug build and I can’t reproduce the crash.

So I did a debug build with the address sanitizer (WITH_ASAN=address), but ART didn’t even start because of a delete type mismatch. So I ignored that type of error but then ART did not start with a use after free in the thumbnail viewer.

I attach the ASAN results.

Question: do you find it useful if I use ASAN to try to solve these memory allocation/addressing errors and report my findings back here?

Comments (18)

  1. agriggio repo owner

    surely useful! I don't have much experience with it though, does it give false positives or is it precise? I tried running on valgrind recently but it did not find anything worth investigating…

  2. Danny Heijl reporter

    In my (limited) previous experience with ASAN I did not get any false positives. If I find anything useful I will report back here.

  3. Danny Heijl reporter

    The problem is in line 489 in rtengine/labmasks.cc:

        if (show_mask_idx >= n || !masks[show_mask_idx].enabled) {
            show_mask_idx = -1;
        }
    

    on entry n = 1 and show_mask_idx = -1, so masks[-1] gets hit, causing the ASAN alert.

    When I change this to:

        if (show_mask_idx != -1) {
            if (show_mask_idx >= n || !masks[show_mask_idx].enabled) {
                show_mask_idx = -1;
            }
        }
    

    ASAN no longer complains. and ART starts up. I’m going to test some more with ASAN enabled now.

    I’m going to ignore the “delete of inherited class through base class pointer where the base class destructor is not virtual” for now, it might be harmless (but the standard says the behaviour is undefined).

    I also disabled tcmalloc in build-art because I read that using tcmalloc or jemalloc is not allowed when using ASAN, and I don’t know if this still true.

    EDIT: this has apparently been fixed in current builds.

  4. Danny Heijl reporter

    Did some testing with ASAN and so far the only error I got was a heap buffer overrun when closing down ART (see ASAN2.pdf for the stacktrace and ASAN report).

    This looks more difficult to figure out… It looks like the EditDataProvider has already been destroyed (ImageArea destructor) when it gets referenced further on in the Spot::on_hide() callback set in the Spot constructor.

    0x6140000cf270 is located 48 bytes inside of 440-byte region [0x6140000cf240,0x6140000cf3f8)
    freed by thread T0 here:
    #0 0x7ffff6efb2c0 in operator delete(void*) (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xe12c0)
    #1 0x55555645384b in ImageArea::~ImageArea() /home/danny/programs/code-art/rtgui/imagearea.cc:72
    #2 0x55555645df28 in ImageAreaPanel::~ImageAreaPanel() /home/danny/programs/code-art/rtgui/imageareapanel.cc:43
    #3 0x55555645dfab in ImageAreaPanel::~ImageAreaPanel() /home/danny/programs/code-art/rtgui/imageareapanel.cc:44
    #4 0x555556156678 in EditorPanel::~EditorPanel() /home/danny/programs/code-art/rtgui/editorpanel.cc:857
    #5 0x555556157633 in EditorPanel::~EditorPanel() /home/danny/programs/code-art/rtgui/editorpanel.cc:932
    #6 0x7ffff58c899a in g_datalist_clear (/usr/lib/x86_64-linux-gnu/libglib-2.0.so.0+0x2e99a)

    previously allocated by thread T0 here:
    #0 0x7ffff6efa448 in operator new(unsigned long) (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xe0448)
    #1 0x55555645d88f in ImageAreaPanel::ImageAreaPanel() /home/danny/programs/code-art/rtgui/imageareapanel.cc:24
    #2 0x55555614c0d6 in EditorPanel::EditorPanel(FilePanel*) /home/danny/programs/code-art/rtgui/editorpanel.cc:574
    #3 0x5555567980eb in RTWindow::createSetmEditor() /home/danny/programs/code-art/rtgui/rtwindow.cc:1145
    #4 0x55555678b5b3 in RTWindow::RTWindow() /home/danny/programs/code-art/rtgui/rtwindow.cc:370
    #5 0x555556504c1e in create_rt_window /home/danny/programs/code-art/rtgui/main.cc:257
    #6 0x555556507fb0 in main /home/danny/programs/code-art/rtgui/main.cc:566
    #7 0x7ffff11a5b96 in __libc_start_main (/lib/x86_64-linux-gnu/libc.so.6+0x21b96)

    Not sure I’m up to figuring this one out.

  5. Danny Heijl reporter

    The crash might be related, but the use after free in the onhide callbacks when closing ART is totally unrelated…

  6. Danny Heijl reporter

    Although I think a better solution could be to check out why setProvider is used in the way it is, causing the problem in the first place, and fix that instead.

    I fixed it the bottom up way I’m afraid.

  7. agriggio repo owner

    Hi @Danny Heijl , why can’t you just call EditSubscriber::setEditProvider(nullptr) from ~EditDataProvider instead of adding the extra flag and method?

  8. Danny Heijl reporter

    That was the first thing I tried and it didn’t work because the currSubscriber in EditDataProvider was already gone. Then I added the flag and it worked. The clear is still there in ~EditDataProvider. I didn’t try to work out how it all ties together, never liked GUI programming.

  9. Danny Heijl reporter

    I suppose I could clear the provider in EditSubscriber:: unSubscribe instead of using the flag if I was sure that Subscribe is not going to be called again without setting the provider again.

  10. agriggio repo owner

    nope. that wouldn’t work. But I’ve already simplified your patch a little bit, I’ll push shortly. Thanks!

  11. Log in to comment