Snippets

Damien Caliste Crashes with lipstick-ui-security

Created by Damien Caliste
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0xb05303a0 (LWP 16741)]
0xb1f4a398 in WaylandNativeWindow::finishSwap() () from /usr/lib/libhybris/eglplatform_wayland.so
(gdb) bt
#0  0xb1f4a398 in WaylandNativeWindow::finishSwap() () from /usr/lib/libhybris/eglplatform_wayland.so
#1  0xb21a8d62 in _my_eglSwapBuffersWithDamageEXT () from /usr/lib/libEGL.so.1
#2  0xb21aa138 in eglSwapBuffers () from /usr/lib/libEGL.so.1
#3  0xb21c4916 in ?? () from /usr/lib/qt5/plugins/wayland-graphics-integration-client/libwayland-egl.so
#4  0xb21c4916 in ?? () from /usr/lib/qt5/plugins/wayland-graphics-integration-client/libwayland-egl.so
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

Comments (24)

  1. Damien Caliste

    The bug is not reproductible at each time. Here were the steps that I follow before having the issue:

    • stop the service with systemctl --user stop lipstick-ui-security
    • rerun it in gdb with gdb /usr/libexec/lipstick-ui-security
    • (iterate the following steps a lot of time, some time bug happens, sometimes not I had it four times today within a short period of time and then nothing anymore)
    • write an email and sign it, password system window should open when sending it. Or sign a file with gpg2 -s foo
    • enter the password and validate, the mail should be sent, or the signature created for the file foo.
    • check gdb, the process may have crashed with the above (partial) backtrace.

    I had once also, when the process finished cleanly after the system window disappeared, with the following warning: [W] unknown:0 - The Wayland connection broke. Did the Wayland compositor die?

  2. Damien Caliste

    I got also the same issue from time to time using simply secrets-tool --get-user-input

  3. Damien Caliste

    Doing repeated secrets-tool --get-user-input calls from command line and typing cancel button allow to make the lipstick-security-ui process crash or hang. Here is the thread profile from gdb of a hanging process:

    (gdb) info threads
      Id   Target Id         Frame 
      10   Thread 0xafef83a0 (LWP 14890) "GL updater" __libc_do_syscall () at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:43
      9    Thread 0xb05893a0 (LWP 14876) "QSGRenderThread" 0xb602e170 in poll () at ../sysdeps/unix/syscall-template.S:81
      7    Thread 0xb09853a0 (LWP 14830) "GL updater" __libc_do_syscall () at ../ports/sysdeps/unix/sysv/linux/arm/libc-do-syscall.S:43
      6    Thread 0xb23ff3a0 (LWP 14827) "gdbus" 0xb602e170 in poll () at ../sysdeps/unix/syscall-template.S:81
      5    Thread 0xb27eb3a0 (LWP 14826) "gmain" 0xb602e170 in poll () at ../sysdeps/unix/syscall-template.S:81
      4    Thread 0xb2c163a0 (LWP 14825) "dconf worker" 0xb602e170 in poll () at ../sysdeps/unix/syscall-template.S:81
      3    Thread 0xb32bf3a0 (LWP 14820) "QDBusConnection" 0xb602e170 in poll () at ../sysdeps/unix/syscall-template.S:81
      2    Thread 0xb3e673a0 (LWP 14819) "QQmlThread" 0xb602e170 in poll () at ../sysdeps/unix/syscall-template.S:81
    * 1    Thread 0xb421e000 (LWP 14818) "lipstick-securi" __libc_do_syscall ()
    

    Then, looking at the backtrace of thread 9:

    (gdb) bt
    #0  0xb602e170 in poll () at ../sysdeps/unix/syscall-template.S:81
    #1  0xb3ee78f8 in wl_display_dispatch_queue () from /usr/lib/libwayland-client.so.0
    #2  0xb1e81bb8 in WaylandNativeWindow::readQueue(bool) () from /usr/lib/libhybris/eglplatform_wayland.so
    #3  0xb1e8236e in WaylandNativeWindow::finishSwap() () from /usr/lib/libhybris/eglplatform_wayland.so
    #4  0xb25d8d62 in _my_eglSwapBuffersWithDamageEXT () from /usr/lib/libEGL.so.1
    #5  0xb25da138 in eglSwapBuffers () from /usr/lib/libEGL.so.1
    #6  0xb291c916 in ?? () from /usr/lib/qt5/plugins/wayland-graphics-integration-client/libwayland-egl.so
    #7  0xb291c916 in ?? () from /usr/lib/qt5/plugins/wayland-graphics-integration-client/libwayland-egl.so
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    

    The system window has disappeared but when calling secrets-tool --get-user-input again, nothing happen on the screen. Mainly due, IMHO, because the previous dialog is still no finalised and hanging in a transient state in the finishSwap call.

  4. Damien Caliste

    I may add that on normal operation, when the system window is shown, there is only one GL updater thread.

  5. Chris Adams
    1) if you attempt to reproduce slowly, e.g.:
     - run `secrets-tool --get-user-input` from command line
     - wait for the ui dialog to show
     - press "cancel" once exactly
     - wait for the secrets-tool process to complete
     -> repeat above 4 steps over and over again
    
    I could NOT reproduce.  BUT:
    
    2) if you attempt to reproduce quickly, e.g.:
     - run `while true ; do secrets-tool --get-user-input; done` from command line
     - wait for the ui dialog to show
     - look to see where the "cancel" button is
     - quickly press that area of the screen repeatedly for a few seconds
    
    then I could reproduce almost immediately.
    It was much simpler with valgrind due to the slowness.
    
    What I suspect is happening is that some touch event is being processed which triggers the error condition, although I'm not entirely sure what / why / how
    
  6. Chris Adams

    With valgrind I get:

    ==8078== Mismatched free() / delete / delete []
    ==8078==    at 0x48418EC: operator delete[](void*) (vg_replace_malloc.c:621)
    ==8078==    by 0x7B16EFD: ??? (in /usr/libexec/droid-hybris/system/lib/libEGL.so)
    ==8078==    by 0x7B16F73: ??? (in /usr/libexec/droid-hybris/system/lib/libEGL.so)
    ==8078==    by 0x7B16C83: ??? (in /usr/libexec/droid-hybris/system/lib/libEGL.so)
    ==8078==    by 0x7B18F83: eglDestroySurface (in /usr/libexec/droid-hybris/system/lib/libEGL.so)
    ==8078==    by 0xBC610C5: eglDestroySurface (in /usr/lib/libEGL.so.1.0.0)
    ==8078==    by 0x9A66105: QWaylandGLContext::updateGLFormat() (qwaylandglcontext.cpp:352)
    ==8078==    by 0x9A6640F: QWaylandGLContext::QWaylandGLContext(void*, QWaylandDisplay*, QSurfaceFormat const&, QPlatformOpenGLContext*) (qwaylandglcontext.cpp:297)
    ==8078==    by 0x9A65D15: QWaylandEglClientBufferIntegration::createPlatformOpenGLContext(QSurfaceFormat const&, QPlatformOpenGLContext*) const (qwaylandeglclientbufferintegration.cpp:121)
    ==8078==    by 0x7D61AAB: QWaylandIntegration::createPlatformOpenGLContext(QOpenGLContext*) const (qwaylandintegration.cpp:179)
    ==8078==    by 0x4D59A0B: QOpenGLContext::create() (in /usr/lib/libQt5Gui.so.5.6.3)
    ==8078==  Address 0xb1f7b40 is 0 bytes inside a block of size 28 alloc'd
    ==8078==    at 0x483FD5C: operator new(unsigned int) (vg_replace_malloc.c:328)
    ==8078==    by 0x7B1815D: eglCreateWindowSurface (in /usr/libexec/droid-hybris/system/lib/libEGL.so)
    ==8078==    by 0xBC60F99: eglCreateWindowSurface (in /usr/lib/libEGL.so.1.0.0)
    ==8078==    by 0x9A660ED: QWaylandGLContext::updateGLFormat() (qwaylandglcontext.cpp:313)
    ==8078==    by 0x9A6640F: QWaylandGLContext::QWaylandGLContext(void*, QWaylandDisplay*, QSurfaceFormat const&, QPlatformOpenGLContext*) (qwaylandglcontext.cpp:297)
    ==8078==    by 0x9A65D15: QWaylandEglClientBufferIntegration::createPlatformOpenGLContext(QSurfaceFormat const&, QPlatformOpenGLContext*) const (qwaylandeglclientbufferintegration.cpp:121)
    ==8078==    by 0x7D61AAB: QWaylandIntegration::createPlatformOpenGLContext(QOpenGLContext*) const (qwaylandintegration.cpp:179)
    ==8078==    by 0x4D59A0B: QOpenGLContext::create() (in /usr/lib/libQt5Gui.so.5.6.3)
    

    Note that to use valgrind i had to chmod -s /usr/libexec/lipstick-security-ui*, then run valgrind with: devel-su XDG_RUNTIME_DIR=/run/user/100000 valgrind /usr/libexec/lipstick-security-ui

    1. Chris Adams

      That was maybe a red herring… Although it would be good to figure out precisely what is happening and fix that, it turns out that the same is often seen in non-crashing applications etc. Maybe just a bug causing a leak.

      My next plan of attack was to use helgrind to figure out what is wrong, but I get:

      [root@Sailfish nemo]# XDG_RUNTIME_DIR=/run/user/100000 valgrind --tool=helgrind -v /usr/libexec/lipstick-security-ui
      ==15697== Helgrind, a thread error detector
      ==15697== Copyright (C) 2007-2017, and GNU GPL'd, by OpenWorks LLP et al.
      ==15697== Using Valgrind-3.13.0 and LibVEX; rerun with -h for copyright info
      ==15697== Command: /usr/libexec/lipstick-security-ui
      ==15697==
      --15697-- Valgrind options:
      --15697--    --tool=helgrind
      --15697--    -v
      --15697-- Contents of /proc/version:
      --15697--   Linux version 3.10.49+0.0.86 (abuild@phost8) (gcc version 4.8 (GCC) ) #1 SMP PREEMPT Tue Oct 2 15:00:13 UTC 2018
      --15697--
      --15697-- Arch and hwcaps: ARM, LittleEndian, ARMv7-neon-vfp
      --15697-- Page sizes: currently 4096, max supported 4096
      --15697-- Valgrind library directory: /usr/lib/valgrind
      --15697-- Reading syms from /usr/libexec/lipstick-security-ui
      --15697--   Considering /usr/lib/debug/.build-id/e7/cc947d94b643532d1e91bcfcef7ad482a23bab.debug ..
      --15697--   .. build-id is valid
      ... random stuff
      --15142-- REDIR: 0x58d5250 (libc.so.6:memset) redirected to 0x484b3d0 (memset)
      --15142-- REDIR: 0x58db480 (libc.so.6:memcpy) redirected to 0x4849ea0 (memcpy)
      --15142-- REDIR: 0x58d4091 (libc.so.6:rindex) redirected to 0x48483a0 (rindex)
      --15142-- REDIR: 0x58d3d41 (libc.so.6:strlen) redirected to 0x48488fc (strlen)
      --15142-- REDIR: 0x58d3f49 (libc.so.6:strncmp) redirected to 0x4848d14 (strncmp)
      --15142-- REDIR: 0x58cfbc5 (libc.so.6:malloc) redirected to 0x4840670 (malloc)
      --15142-- REDIR: 0x58d0445 (libc.so.6:calloc) redirected to 0x4842e94 (calloc)
      ==15142== 
      ==15142== Process terminating with default action of signal 11 (SIGSEGV): dumping core
      ==15142==  Bad permissions for mapped region at address 0xD4EE48A0
      ==15142==    at 0xD4EE48A0: ???
      ==15142== 
      ==15142== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
      Segmentation fault (core dumped)
      

      I tried to attach gdb to the valgrind instance to figure out what was happening:

      [root@Sailfish nemo]# gdb /usr/libexec/lipstick-security-ui
      (gdb) target remote | /usr/lib/valgrind/../../bin/vgdb
      Remote debugging using | /usr/lib/valgrind/../../bin/vgdb
      relaying data between gdb and process 16362
      Program received signal SIGSEGV, Segmentation fault.0x5dad7da4 in ?? ()
      (gdb) bt
      #0  0x5dad7da4 in ?? ()
      #1  0x0589746e in set_binding_values (codesetp=0x0, dirnamep=<synthetic pointer>, domainname=0x5a67a9c "libgpg-error") at bindtextdom.c:117
      #2  __bindtextdomain (domainname=0x5a67a9c "libgpg-error", dirname=0x5a67aac "/usr/share/locale") at bindtextdom.c:348
      #3  0x05a5f42e in ?? () from /usr/lib/libgpg-error.so.0
      #4  0x0400c8c6 in call_init (env=<optimized out>, argv=<optimized out>, argc=<optimized out>, l=<optimized out>) at dl-init.c:78
      #5  _dl_init (main_map=0x402a908, argc=1, argv=0xbdc95864, env=0xbdc9586c) at dl-init.c:126
      #6  0x04000bf2 in _dl_start_user () from /lib/ld-linux-armhf.so.3
      #7  0x04000bf2 in _dl_start_user () from /lib/ld-linux-armhf.so.3
      Backtrace stopped: previous frame identical to this frame (corrupt stack?)
      

      Interestingly, I only hit this issue when using helgrind, not when using “normal” valgrind…

      It looks like libgpg-error.so.0 has a .init section which does something bad (e.g. relies on some other .so’s .init running before it, perhaps?) but I’m not really sure how to investigate this further..

      1. Chris Adams

        Actually it seems like the issue is due to __libc_rwlock_unlock(), I was able to continue execution in gdb via jump but hit a few other cases in e.g. QtCore etc.

        Scarily enough, helgrind reported a bunch of lock order violations in QtDBus, like:

        ==16954== Observed (incorrect) order is: acquisition of lock at 0x7817BBC
        ==16954==    at 0x4846080: QMutex_lock_WRK (hg_intercepts.c:3043)
        ==16954==    by 0x4FF7841: ??? (in /usr/lib/libQt5DBus.so.5.6.3)
        

        When I finally managed to trigger the UI via a secrets-tool request, I hit another issue:

        Program received signal SIGSEGV, Segmentation fault.
        0xed7b822c in ?? ()
        (gdb) bt
        #0  0xed7b822c in ?? ()
        #1  0x0588dd2c in __gconv_find_transform (toset=toset@entry=0xbdde0ae8 "UTF-8//", fromset=fromset@entry=0xbdde0ad0 "ANSI_X3.4-1968//", 
            handle=handle@entry=0xbdde0a90, nsteps=nsteps@entry=0xbdde0a98, flags=0) at gconv_db.c:726
        #2  0x0588ccd8 in __gconv_open (toset=toset@entry=0xbdde0ae8 "UTF-8//", fromset=fromset@entry=0xbdde0ad0 "ANSI_X3.4-1968//", handle=handle@entry=0xbdde0af8, 
            flags=flags@entry=0) at gconv_open.c:172
        #3  0x0588c9be in iconv_open (tocode=0xbdde0ae8 "UTF-8//", tocode@entry=0x5b26220 "UTF-8", fromcode=0xbdde0ad0 "ANSI_X3.4-1968//", 
            fromcode@entry=0x78cf4b0 "ANSI_X3.4-1968") at iconv_open.c:71
        #4  0x05ab76dc in try_conversion (cd=<synthetic pointer>, from_codeset=0x78cf4b0 "ANSI_X3.4-1968", to_codeset=0x5b26220 "UTF-8") at gconvert.c:175
        #5  g_iconv_open (to_codeset=to_codeset@entry=0x5b26220 "UTF-8", from_codeset=from_codeset@entry=0x78cf4b0 "ANSI_X3.4-1968") at gconvert.c:224
        #6  0x05ab7994 in open_converter (error=0x0, from_codeset=0x78cf4b0 "ANSI_X3.4-1968", to_codeset=0x5b26220 "UTF-8") at gconvert.c:318
        #7  g_convert (str=str@entry=0x786b858 "/root/.config/dconf/user", len=len@entry=-1, to_codeset=to_codeset@entry=0x5b26220 "UTF-8", 
            from_codeset=0x78cf4b0 "ANSI_X3.4-1968", bytes_read=bytes_read@entry=0x0, bytes_written=bytes_written@entry=0x0, error=error@entry=0x0) at gconvert.c:577
        #8  0x05ab8a24 in g_filename_display_name (filename=filename@entry=0x786b858 "/root/.config/dconf/user") at gconvert.c:1964
        #9  0x05ad8afc in g_mapped_file_new (filename=0x786b858 "/root/.config/dconf/user", writable=0, error=0x0) at gmappedfile.c:257
        #10 0x08f6b854 in ?? () from /usr/lib/libdconf.so.1
        #11 0x08f6ad22 in ?? () from /usr/lib/libdconf.so.1
        #12 0x08f6ab4e in ?? () from /usr/lib/libdconf.so.1
        #13 0x08f69746 in ?? () from /usr/lib/libdconf.so.1
        #14 0x08f6a252 in ?? () from /usr/lib/libdconf.so.1
        #15 0x08e3e2d2 in MGConfItemPrivate::MGConfItemPrivate(QString, MGConfItem*) () from /usr/lib/libmlite5.so.0
        #16 0x08e3e7d4 in MGConfItem::MGConfItem(QString const&, QObject*) () from /usr/lib/libmlite5.so.0
        #17 0x08e0a84e in Silica::ThemePrivate::ThemePrivate (this=0x8e29aa4 <(anonymous namespace)::Q_QGS_themePrivateInstance::innerFunction()::holder>, 
            parent=<optimized out>) at silicatheme.cpp:398
        #18 0x08e0c738 in Holder (this=0x8e29aa4 <(anonymous namespace)::Q_QGS_themePrivateInstance::innerFunction()::holder>) at silicatheme.cpp:272
        #19 innerFunction () at silicatheme.cpp:272
        #20 operator() (this=<optimized out>) at /usr/include/qt5/QtCore/qglobalstatic.h:129
        #21 Silica::ThemePrivate::instance () at silicatheme.cpp:279
        #22 0x08e11a1c in Silica::Private::initializeEngine (engine=engine@entry=0xbdde1514, uri=uri@entry=0x78cd178 "Sailfish.Silica") at silicaimports.cpp:43
        #23 0x08d7540c in SailfishSilicaPlugin::initializeEngine (this=<optimized out>, engine=0xbdde1514, uri=0x78cd178 "Sailfish.Silica") at src/plugin.cpp:160
        ---Type <return> to continue, or q <return> to quit---
        #24 0x0521bef0 in QQmlThread::internalCallMethodInThread (this=0x7751b28, message=<optimized out>) at qml/ftw/qqmlthread.cpp:315
        #25 0x051d8c5e in callMethodInThread<QQmlDataBlob*, QQmlDataBlob*, QQmlTypeLoaderThread> (arg=<synthetic pointer>, Member=
            (void (QQmlTypeLoaderThread::*)(QQmlTypeLoaderThread * const, QQmlDataBlob *)) 0x51dc525 <QQmlTypeLoaderThread::loadThread(QQmlDataBlob*)>, 
            this=<optimized out>) at ../../include/QtQml/5.6.3/QtQml/private/../../../../../src/qml/qml/ftw/qqmlthread_p.h:159
        #26 QQmlTypeLoaderThread::load (this=<optimized out>, b=<optimized out>) at qml/qqmltypeloader.cpp:754
        #27 0x051e20ce in load (this=0xbdde0e94, loader=0x7726480, blob=0x78358e8) at qml/qqmltypeloader.cpp:929
        #28 QQmlTypeLoader::doLoad<PlainLoader> (this=this@entry=0x7726480, loader=..., blob=blob@entry=0x78358e8, mode=mode@entry=QQmlTypeLoader::PreferSynchronous)
            at qml/qqmltypeloader.cpp:993
        #29 0x051dc566 in QQmlTypeLoader::load (this=this@entry=0x7726480, blob=blob@entry=0x78358e8, mode=mode@entry=QQmlTypeLoader::PreferSynchronous)
            at qml/qqmltypeloader.cpp:1016
        #30 0x051dc9e0 in QQmlTypeLoader::getType (this=0x7726480, url=..., mode=mode@entry=QQmlTypeLoader::PreferSynchronous) at qml/qqmltypeloader.cpp:1659
        #31 0x051ca9ee in QQmlComponentPrivate::loadUrl (this=this@entry=0x781c0d0, newUrl=..., mode=mode@entry=QQmlComponent::PreferSynchronous)
            at qml/qqmlcomponent.cpp:675
        #32 0x051cad54 in QQmlComponent::QQmlComponent (this=0xbdde0fb0, engine=0xbdde1514, fileName=..., parent=<optimized out>) at qml/qqmlcomponent.cpp:548
        #33 0x0001f78a in ensureWindow (this=0xbdde15b8) at passwordagent.cpp:277
        #34 PasswordAgent::queueRequest (this=0xbdde15b8, request=request@entry=0x78a0f10) at passwordagent.cpp:207
        #35 0x00021678 in queueRequest (properties=..., message=..., cookie=..., actions=..., this=0xbdde153c) at passwordagentadaptor.cpp:228
        #36 PasswordAgentAdaptor::QueryPassword (this=0xbdde153c, cookie=..., message=..., properties=...) at passwordagentadaptor.cpp:184
        #37 0x0002c5de in PasswordAgentAdaptor::qt_metacall (this=0xbdde153c, _c=QMetaObject::InvokeMetaMethod, _id=2, _a=0xbdde10b8)
            at moc_passwordagentadaptor.cpp:120
        #38 0x04ff37b8 in ?? () from /usr/lib/libQt5DBus.so.5
        #39 0x04ff37b8 in ?? () from /usr/lib/libQt5DBus.so.5
        Backtrace stopped: previous frame identical to this frame (corrupt stack?)
        (gdb) up
        #1  0x0588dd2c in __gconv_find_transform (toset=toset@entry=0xbdde0ae8 "UTF-8//", fromset=fromset@entry=0xbdde0ad0 "ANSI_X3.4-1968//", 
            handle=handle@entry=0xbdde0a90, nsteps=nsteps@entry=0xbdde0a98, flags=0) at gconv_db.c:726
        726   __libc_once (once, __gconv_read_conf);
        (gdb) list
        721   const char *fromset_expand;
        722   const char *toset_expand;
        723   int result;
        724 
        725   /* Ensure that the configuration data is read.  */
        726   __libc_once (once, __gconv_read_conf);
        727 
        728   /* Acquire the lock.  */
        729   __libc_lock_lock (__gconv_lock);
        

        Attempting to jump and continue from here led to sigill and program termination. I’ll keep trying… I wish I knew how to fix the root cause of the issue, though..

        1. Chris Adams

          well, using the drd tool instead of helgrind is proving more useful. Now I just need to suppress about a billion cases of noise…

  7. Damien Caliste

    Hello Chris, thank you for looking at it. Indeed, I am not able to reproduce it with the --get-user-input only. I need to run a gpg2 -s foo operation, which in turn destroy and recreate a window quickly.

    Your Valgrind log is interesting but I cannot link it to any of my findings from wayland_window.cpp in hybris. Next week, I’m going to propose patches in hybris for race conditions on unlocked members. But even with these patches, something nasty happen later in some other parts in QtWayland this time.

    I’m going to try your way also with Valgrind, but I need to make space on device first, I had to remove Valgrind before last upgrade for lack of space on /

    1. Chris Adams

      symlinking some big directories so that they actually get stored on /home partition really helps, especially /usr/lib/debug, /var/cache/pk-zypp-cache, and /var/cache/zypp

      [root@Sailfish ~]# ls -al /usr/lib | grep home
      lrwxrwxrwx  1 root root       19 Jan  8 06:05 debug -> /home/.usrlibdebug/
      [root@Sailfish ~]# ls -al /var/cache | grep home
      lrwxrwxrwx  1 root root   33 Jan  9 11:18 pk-zypp-cache -> /home/.pk-zypp-dist-upgrade-cache
      lrwxrwxrwx  1 root root   11 Jan  8 03:42 zypp -> /home/zypp/
      [root@Sailfish ~]# ls -al /root | grep home
      lrwxrwxrwx  1 root root    29 Aug 14 14:09 sailfishos-rpm-cache -> /home/.rootsailfishosrpmcache
      
  8. Damien Caliste

    Hello Chris, for when you come back from holidays.

    I guess I understand the process of the crash, it’s quite simple:

    • the render thread is peacefully in a makeCurrent() + swapBuffers() procedure;
    • the main thread is destroying the window object during this process is performing.

    What is quite shocking is that I cannot figure out why there is no locking mechanism in QtQuick to prevent this? I mean for all normal cases. There is no debuginfo packages for Qt5Gui and Qt5Quick and I’m not really eager to recompile all Qt and change Qt libs on my device to better see what’s going on. Here is the backtrace:

    #0  0xb3ec32c2 in QWaylandWindow::doResize() () from /usr/lib/libQt5WaylandClient.so.5
    #1  0xb3ec3404 in QWaylandWindow::setCanResize(bool) () from /usr/lib/libQt5WaylandClient.so.5
    #2  0xb1b2291e in QWaylandGLContext::swapBuffers (this=0x7d43d0, surface=<optimized out>) at ../../../../hardwareintegration/client/wayland-egl/qwaylandglcontext.cpp:517
    #3  0xb6969d12 in QOpenGLContext::swapBuffers(QSurface*) () from /usr/lib/libQt5Gui.so.5
    #4  0xb6cf277e in ?? () from /usr/lib/libQt5Quick.so.5
    

    The swapBuffers() here comes from QtWayland code, but frame 3 and 4 comes from Qt5Gui and Qt5Quick. This backtrace comes from the render thread. It’s clear from the debug output I’ve added (and not shown here) that the main thread has just called the destroy function before this happens. Valgrind provides the same kind of information, saying that the rendering thread is accessing with read/write operations some memory that have already been free’d.

    I fact, to obtain this trace, you need to patch libhybris first, otherwise the crash will happen there instead of being in doResize(). But the calling procedure will be the same: swapBuffers()is called on a still valid window, but during the time this operation is performing, the main thread is freeing the memory related to this window and boom.

    So I can patch libhybris not to crash or hang if the main thread is destroying the window while doing swapBuffers() call, but I’m a bit dry on what to do in QtWayland…

    1. Chris Adams

      qtquick debuginfo should be available: qt5-qtdeclarative-debuginfo from mer-qt repository, and qtbase/qtgui should be available in qt5-debuginfo package from mer-qt repository also.

      I’ve cc’d denexter and frajo into the libhybris PR for more discussion. I don’t know enough about the rendering pipeline to comment, but I recall that Andrew was looking into buffer operation ordering issues: https://github.com/mer-hybris/qt5-qpa-hwcomposer-plugin/pull/75/files might be somehow related?

  9. Damien Caliste

    With qtdeclarative-debuginfo, the backtrace provides a bit more information for the code path:

    0xb61e7556 in QMutex::lock() () from /usr/lib/libQt5Core.so.5
    (gdb) bt
    #0 0xb61e7556 in QMutex::lock() () from /usr/lib/libQt5Core.so.5
    #1 0xb3ed03c6 in QWaylandWindow::setCanResize(bool) () from /usr/lib/libQt5WaylandClient.so.5
    #2 0xb287291e in QWaylandGLContext::swapBuffers (this=0x72aee0, surface=<optimized out>) at ../../../../hardwareintegration/client/wayland-egl/qwaylandglcontext.cpp:517
    #3 0xb6976d12 in QOpenGLContext::swapBuffers(QSurface*) () from /usr/lib/libQt5Gui.so.5
    #4 0xb6cff77e in QSGRenderThread::syncAndRender (this=this@entry=0x5c59d8) at scenegraph/qsgthreadedrenderloop.cpp:636
    #5 0xb6cfff76 in QSGRenderThread::run (this=0x5c59d8) at scenegraph/qsgthreadedrenderloop.cpp:715
    #6 0xb61ec490 in ?? () from /usr/lib/libQt5Core.so.5
    #7 0xb61ec490 in ?? () from /usr/lib/libQt5Core.so.5
    Backtrace stopped: previous frame identical to this frame (corrupt stack?)
    

    I’ll investigate and report here if I find anything usefull.

  10. Damien Caliste
    Breakpoint 1, QWaylandWindow::~QWaylandWindow (this=0xedd460, __in_chrg=<optimized out>) at qwaylandwindow.cpp:139
    139     delete mWindowDecoration;
    (gdb) bt
    #0  QWaylandWindow::~QWaylandWindow (this=0xedd460, __in_chrg=<optimized out>) at qwaylandwindow.cpp:139
    #1  0xb2552d6a in QWaylandEglWindow::~QWaylandEglWindow (this=0xedd460, __in_chrg=<optimized out>) at ../../../../hardwareintegration/client/wayland-egl/qwaylandeglwindow.cpp:85
    #2  0xb2552d8c in QWaylandEglWindow::~QWaylandEglWindow (this=0xedd460, __in_chrg=<optimized out>) at ../../../../hardwareintegration/client/wayland-egl/qwaylandeglwindow.cpp:85
    #3  0xb693e67a in QWindow::destroy() () from /usr/lib/libQt5Gui.so.5
    #4  0xb693f64a in QWindow::event(QEvent*) () from /usr/lib/libQt5Gui.so.5
    #5  0xb6d0e168 in QQuickWindow::event (this=0xed3a10, e=0xbec4383c) at items/qquickwindow.cpp:1502
    #6  0xb630222a in QCoreApplication::notify(QObject*, QEvent*) () from /usr/lib/libQt5Core.so.5
    #7  0xb63021ae in QCoreApplication::notifyInternal2(QObject*, QEvent*) () from /usr/lib/libQt5Core.so.5
    #8  0xb6935866 in QGuiApplicationPrivate::processCloseEvent(QWindowSystemInterfacePrivate::CloseEvent*) () from /usr/lib/libQt5Gui.so.5
    #9  0xb69382dc in QGuiApplicationPrivate::processWindowSystemEvent(QWindowSystemInterfacePrivate::WindowSystemEvent*) () from /usr/lib/libQt5Gui.so.5
    #10 0xb69221ec in QWindowSystemInterface::sendWindowSystemEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQt5Gui.so.5
    #11 0xb6922dc2 in QWindowSystemInterface::flushWindowSystemEvents(QFlags<QEventLoop::ProcessEventsFlag>) () from /usr/lib/libQt5Gui.so.5
    #12 0xb693c6c2 in QWindow::close() () from /usr/lib/libQt5Gui.so.5
    #13 0xb693ef38 in ?? () from /usr/lib/libQt5Gui.so.5
    #14 0xb693ef38 in ?? () from /usr/lib/libQt5Gui.so.5
    

    New interesting backtrace, gathered via a break point i the main thread, when the QWaylandWindow is destroyed (and thus not usable anymore from the rendering thread).

    At the same time, when we are entering this function, the rendering thread is still alive and updating the view… I’m currently looking at why QQuickWindow::event from frame 5 is not blocking on the rendering thread and finishing it. I’m testing some patch to allow this and put the backtrace here for reference.

  11. Damien Caliste

    Another interesting discovery is the events stack of the QQuickWindow at hand:

    [D] unknown:0 - QEvent::Type(Close) SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    [D] unknown:0 - QEvent::Type(PlatformSurface) SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    [D] unknown:0 - QEvent::Type(FocusAboutToChange) SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    [D] unknown:0 - QEvent::Type(FocusOut) SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    [D] unknown:0 - QEvent::Type(PlatformSurface) SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    [D] unknown:0 - QEvent::Type(WindowStateChange) SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    [D] unknown:0 - QEvent::Type(Resize) SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    [D] unknown:0 - QEvent::Type(Expose) SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    [D] unknown:0 - exposureChanged() SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    [D] unknown:0 - handleObscurity() SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    [D] unknown:0 - QEvent::Type(Hide) SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    [D] unknown:0 - hide() SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    [D] unknown:0 - releaseResources() in api-call SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    [D] unknown:0 -  - waiting for render thread to exit SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    [D] unknown:0 -  - render thread finished SecurityCodeDialog_QMLTYPE_57(0x1878860, name="ApplicationWindow")
    

    We can see that the close event (that trigger the destruction of the underlying wayland window, see https://code.woboq.org/qt5/qtbase/src/gui/kernel/qwindow.cpp.html#1920) is followed by a lot of events, and only the hide() one will finally stop the rendering thread.

  12. Damien Caliste

    Some additional thoughts and facts:

    • calling QWindow::close()will delete the underlying platform window, without notifying the QSGRenderLoop of anything, if the threaded render loop is used, the thread is left with deleted resources;
    • calling QWindow::hide() will actually release the resources and lock for the thread to finish, this scheme is well spred as it is also used in silica/components/src/declarativewindow.cpp:554;
    • lipstick-security-ui code seems to call close() first and then hide(), but not sure, don’t have access to code.
  13. Damien Caliste

    Some partial solutions:

    • call hide() in the QEvent::Close case in QQuickWindow::event()this will effectively lock the rendering thread and release resources before the QWindow::event(Close) parent event will actually call QWindow::destroy(). There is even a comment in qsgthreadedrenderloop.cpp:850 about the hide() method saying that it should be called on close.
    • since close event will also call the hide routines, there should be a fix in silica/components/src/declarativewindow.cpp because it is listening to hide event only to add a listener on openglContext destruction, while calling close() will now delete this context sooner than hide. The fix is trivial, it should add the listener on close event also.

    These two fixes seem to solve the problem. I still need to try and retry again, since it was a race condition, to see if it appears again or not on longer periods.

    There are still race conditions in my opinion in the qsgthreadedrenderloop code, not crashing ones but some that may eat batery. One that I can trigger some time or not, and visible with the export QT_LOGGING_RULES="*.debug=true" is the fact that when the window is closed, some animation was still running (like the button highlight animation I guess) and the startOrStopAnimationTimer is then creating a timer to keep the animation running in background for the case the window will reappear again. Of course, in our case, nothing will reappear and the timer is just waking up the device for nothing. I need to further investigate (notice that this behaviour was already observed some times before the above fixes, just that I didn’t pay attention at that time and crash was much more frequent).

  14. Damien Caliste

    I don’t have write access to silica, here is the patch:

    commit 5b77a82ccf7fab6ab38b19420b73a8e0ac85c7f4
    Author: Damien Caliste <dcaliste@free.fr>
    Date:   Fri Feb 1 15:41:54 2019 +0100
    
        [sailfish-silica] Listen to GlContext destruction in case of window closing.
    
    diff --git a/components/src/declarativewindow.cpp b/components/src/declarativewindow.cpp
    index e166684..91ba86a 100644
    --- a/components/src/declarativewindow.cpp
    +++ b/components/src/declarativewindow.cpp
    @@ -541,7 +541,8 @@ bool DeclarativeWindow::eventFilter(QObject *object, QEvent *event)
         if (event->type() == QEvent::PlatformSurface
                     && static_cast<QPlatformSurfaceEvent *>(event)->surfaceEventType() == QPlatformSurfaceEvent::SurfaceCreated) {
             updateWindowProperties();
    -    } else if (event->type() == QEvent::Hide) {
    +    } else if (event->type() == QEvent::Hide
    +               || event->type() == QEvent::Close) {
             if (QOpenGLContext *context = object == m_window
                         && !m_persistentSceneGraph
                         && !m_persistentOpenGLContext
    

    It is mandatory if https://git.merproject.org/mer-core/qtdeclarative/merge_requests/25 is judged valuable.

    1. Andrew den Exter

      The next else statement in that block handles close events and this will prevent that from being evaluated. Simply removing the else should be enough, it only exists to prevent evaluating previously impossible conditions.

      1. Damien Caliste

        Thank you Andrew for looking at this. You’re right, the previous patch was missing the else part in the if. But removing the else as is is not enough since the else part was containing some QEvent::Show filtering that is not present in the else if (QEvent::Close and QEvent::Hide). So I need to move the topLevelWindow aliasing and make a specific els if for the show case. Here is the updated patch:

        commit 9211e2211ca2a3b08b74aafa61566441ca9f2e8f
        Author: Damien Caliste <dcaliste@free.fr>
        Date:   Fri Feb 1 15:41:54 2019 +0100
        
            [sailfish-silica] Listen to GlContext destruction in case of window closing.
        
        diff --git a/components/src/declarativewindow.cpp b/components/src/declarativewindow.cpp
        index e166684..d2699fc 100644
        --- a/components/src/declarativewindow.cpp
        +++ b/components/src/declarativewindow.cpp
        @@ -538,10 +538,12 @@ void DeclarativeWindow::screenGeometryChanged(const QRect &geometry)
        
         bool DeclarativeWindow::eventFilter(QObject *object, QEvent *event)
         {
        +    QWindow * const topLevelWindow = m_mainWindow ? m_mainWindow : m_window;
             if (event->type() == QEvent::PlatformSurface
                         && static_cast<QPlatformSurfaceEvent *>(event)->surfaceEventType() == QPlatformSurfaceEvent::SurfaceCreated) {
                 updateWindowProperties();
        -    } else if (event->type() == QEvent::Hide) {
        +    } else if (event->type() == QEvent::Hide
        +               || event->type() == QEvent::Close) {
                 if (QOpenGLContext *context = object == m_window
                             && !m_persistentSceneGraph
                             && !m_persistentOpenGLContext
        @@ -553,8 +555,6 @@ bool DeclarativeWindow::eventFilter(QObject *object, QEvent *event)
                     // to release the buffers for that window.
                     connect(context, &QOpenGLContext::destroyed, this, &DeclarativeWindow::glContextDestroyed, Qt::DirectConnection);
                 }
        -    } else {
        -        QWindow * const topLevelWindow = m_mainWindow ? m_mainWindow : m_window;
                 if (event->type() == QEvent::Close) {
                     if (object == topLevelWindow) {
                         m_windowVisible = false;
        @@ -564,11 +564,11 @@ bool DeclarativeWindow::eventFilter(QObject *object, QEvent *event)
                     } else if (object == m_cover) {
                         topLevelWindow->close();
                     }
        -        } else if (event->type() == QEvent::Show && object == topLevelWindow) {
        -            m_windowVisible = true;
        -            if (!m_explicitCoverVisible) {
        -                setCoverVisible(true, false);
        -            }
        +        }
        +    } else if (event->type() == QEvent::Show && object == topLevelWindow) {
        +        m_windowVisible = true;
        +        if (!m_explicitCoverVisible) {
        +            setCoverVisible(true, false);
                 }
             }
             return false;
        

        What do you think ?