persona_example hangs on PPC/SPARC

Issue #90 resolved
Dan Bonachea created an issue

The persona_example test is experiencing intermittent hangs on PPC + SPARC:

http://upc-bugs.lbl.gov/upc_tests/detail.php?testname=external-upcxx%2Fpersona-example-par&date=2017-10-06&branch=ALL&config=&date_shift=-1+Day

The nightly test history shows that only these two platforms exhibit the problem:

http://upc-bugs.lbl.gov/upc_tests/history.php?date=newest&start_date=2017-09-18&branch=ALL&cr=run&testname=external-upcxx%2Fpersona-example-par

It's plausible this is due to an OpenMP issue or other system misconfiguration/load problem.

However, given that these two platforms are exactly the ones using a big-endian ABI, it's also possible our runtime has an endianness assumption bug.

Comments (5)

  1. Paul Hargrove

    [...] system misconfiguration/load problem.

    We do not control these systems and they are unscheduled/free-for-all systems. So, load could be a factor. I am planning to test manually (when I see low load) to rule this out.

    However, given that these two platforms are exactly the ones using a big-endian ABI, it's also possible our runtime has an endianness assumption bug.

    We have little-endian PPC builds which do NOT display this hang.
    This is consistent with the hypothesis that endianness is related, but not conclusive, since the compiler versions (and thus the OpenMP libs) are not identical.
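
    When comparing the big-endian and little-endian builds, it may help to record the byte order and compiler version of each host side by side. The following is only a sketch: the function names `host_byte_order` and `report_build_env` are hypothetical, and it assumes the compiler is reachable as `$CC` (falling back to `gcc`), which may not match how the nightly builds are actually configured.

```shell
# host_byte_order: print "little", "big", or "unknown" for the current host.
host_byte_order() {
    # od decodes the bytes 01 00 00 00 as one 4-byte integer in host byte
    # order: 1 on a little-endian host, 16777216 on a big-endian host.
    v=$(printf '\001\000\000\000' | od -An -td4 | tr -d ' \n')
    case "$v" in
        1) echo little ;;
        16777216) echo big ;;
        *) echo unknown ;;
    esac
}

# report_build_env: print the byte order and, if found, the compiler version.
# Assumption: the compiler is $CC, or gcc when CC is unset.
report_build_env() {
    echo "byte order: $(host_byte_order)"
    cc=${CC:-gcc}
    if command -v "$cc" >/dev/null 2>&1; then
        echo "compiler: $("$cc" --version | head -n 1)"
    fi
}
```

    Running `report_build_env` on each test host gives two lines per platform that can be compared directly.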

  2. Paul Hargrove

    The PPC64 system has 64 cores, and I observe a load below 0.6.

    I run as follows:

    env OMP_NUM_THREADS=2  GASNET_PSHM_NODES=2 ./persona-example
    

    and it hangs on some fraction (not measured) of my attempts.
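
    The hang fraction could be measured with a small loop like the one below. This is a minimal sketch, not something that was run for this report: `count_hangs` is a hypothetical helper, the 50 attempts and 60-second limit are arbitrary, and it assumes GNU coreutils `timeout` is installed and that any run exceeding the limit is a hang.

```shell
# count_hangs TRIES LIMIT CMD [ARGS...]
# Runs CMD TRIES times, kills any run that exceeds LIMIT seconds, and
# prints the number of runs that were killed (presumed hung).
count_hangs() {
    tries=$1
    limit=$2
    shift 2
    hung=0
    i=0
    while [ "$i" -lt "$tries" ]; do
        # GNU timeout exits with status 124 when the time limit expires.
        timeout "$limit" "$@" >/dev/null 2>&1
        if [ "$?" -eq 124 ]; then
            hung=$((hung + 1))
        fi
        i=$((i + 1))
    done
    echo "$hung"
}

# Example invocation (attempt count and limit are arbitrary choices):
# count_hangs 50 60 env OMP_NUM_THREADS=2 GASNET_PSHM_NODES=2 ./persona-example
```

    Dividing the printed count by the number of attempts gives an estimate of the per-run hang rate.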

    Here is a transcript with three backtraces:

    Script started on Wed 11 Oct 2017 09:40:39 PM UTC
    phargrov@gcc1-power7:~/upcxx/example/prog-guide]$ env GASNET_BACKTRACE_SIGNAL=int GASNET_PSHM_NODES=2 OMP_NUM_THREADS=2 ./persona-example
    ^CCaught GASNET_BACKTRACE_SIGNAL: signal SIGINT(2)
    Caught GASNET_BACKTRACE_SIGNAL: signal SIGINT(2)
    [0] Invoking GDB for backtrace...
    [1] Invoking GDB for backtrace...
    [0] /opt/at11.0/bin/gdb -nx -batch -x /tmp/gasnet_oaHPB5 '/home/phargrov/upcxx/example/prog-guide/./persona-example' 20475
    [0] [New LWP 20477]
    [0] To enable execution of this file add
    [0]     add-auto-load-safe-path /opt/at11.0/lib64/libthread_db-1.0.so
    [0] line to your configuration file "/home/phargrov/.gdbinit".
    [0] To completely disable this security protection add
    [0]     set auto-load safe-path /
    [0] line to your configuration file "/home/phargrov/.gdbinit".
    [0] For more information about this security protection see the
    [0] "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    [0]     info "(gdb)Auto-loading safe path"
    [0] 0x00003fff991beb20 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [0]   Id   Target Id         Frame
    [0] * 1    LWP 20475 "persona-example" 0x00003fff991beb20 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [0]   2    LWP 20477 "persona-example" 0x000000001000600c in std::find_if<std::_List_iterator<upcxx::future1<upcxx::detail::future_kind_shref<upcxx::detail::future_header_ops_general>, long int> >, main(int, char**)::<lambda(upcxx::future<long int>&)> >(std::_List_iterator<upcxx::future1<upcxx::detail::future_kind_shref<upcxx::detail::future_header_ops_general>, long> >, std::_List_iterator<upcxx::future1<upcxx::detail::future_kind_shref<upcxx::detail::future_header_ops_general>, long> >, <lambda(upcxx::future<long int>&)>) (__first=..., __last=..., __pred=...) at /opt/at11.0/include/c++/7.2.1/bits/stl_algo.h:3923
    [0]
    [0] Thread 2 (LWP 20477):
    [0] #0  0x000000001000600c in std::find_if<std::_List_iterator<upcxx::future1<upcxx::detail::future_kind_shref<upcxx::detail::future_header_ops_general>, long int> >, main(int, char**)::<lambda(upcxx::future<long int>&)> >(std::_List_iterator<upcxx::future1<upcxx::detail::future_kind_shref<upcxx::detail::future_header_ops_general>, long> >, std::_List_iterator<upcxx::future1<upcxx::detail::future_kind_shref<upcxx::detail::future_header_ops_general>, long> >, <lambda(upcxx::future<long int>&)>) (__first=..., __last=..., __pred=...) at /opt/at11.0/include/c++/7.2.1/bits/stl_algo.h:3923
    [0] #1  0x000000001000e010 in .main._omp_fn () at persona-example.cpp:79
    [0]
    [0] Thread 1 (LWP 20475):
    [0] #0  0x00003fff991beb20 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [0] #1  0x00003fff9911ce90 in ?? () from /opt/at11.0/lib64/power7/libc.so.6
    [0] #2  0x000000001007f61c in gasneti_system_redirected (cmd=0x10319090 <cmd> "/opt/at11.0/bin/gdb -nx -batch -x /tmp/gasnet_oaHPB5 '/home/phargrov/upcxx/example/prog-guide/./persona-example' 20475", stdout_fd=3) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:971
    [0] #3  0x000000001008017c in gasneti_bt_gdb (fd=3) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:1218
    [0] #4  0x0000000010080d58 in gasneti_print_backtrace (fd=2) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:1487
    [0] #5  0x000000001007f0e4 in gasneti_ondemandHandler (sig=2) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:847
    [0] #6  <signal handler called>
    [0] #7  0x00003fff99346ef4 in ?? () from /opt/at11.0/lib64/power7/libgomp.so.1
    [0] #8  0x00003fff99344ae8 in ?? () from /opt/at11.0/lib64/power7/libgomp.so.1
    [0] #9  0x0000000000000001 in ?? ()
    [0] #10 0x00003fff99338fe8 in .GOMP_parallel_end () from /opt/at11.0/lib64/power7/libgomp.so.1
    [0] #11 0x00003fff99339b84 in .GOMP_parallel_sections () from /opt/at11.0/lib64/power7/libgomp.so.1
    [0] #12 0x0000000010005ba4 in main (argc=1, argv=0x3fffee4857d8) at persona-example.cpp:61
    [1] /opt/at11.0/bin/gdb -nx -batch -x /tmp/gasnet_WDJU3d '/home/phargrov/upcxx/example/prog-guide/./persona-example' 20476
    [1] To enable execution of this file add
    [1]     add-auto-load-safe-path /opt/at11.0/lib64/libthread_db-1.0.so
    [1] line to your configuration file "/home/phargrov/.gdbinit".
    [1] To completely disable this security protection add
    [1]     set auto-load safe-path /
    [1] line to your configuration file "/home/phargrov/.gdbinit".
    [1] For more information about this security protection see the
    [1] "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    [1]     info "(gdb)Auto-loading safe path"
    [1] 0x00003fff991bea78 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [1]   Id   Target Id         Frame
    [1] * 1    process 20476 "persona-example" 0x00003fff991bea78 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [1]
    [1] Thread 1 (process 20476):
    [1] #0  0x00003fff991bea78 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [1] #1  0x00003fff9911ce90 in ?? () from /opt/at11.0/lib64/power7/libc.so.6
    [1] #2  0x000000001007f61c in gasneti_system_redirected (cmd=0x10319090 <cmd> "/opt/at11.0/bin/gdb -nx -batch -x /tmp/gasnet_WDJU3d '/home/phargrov/upcxx/example/prog-guide/./persona-example' 20476", stdout_fd=4) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:971
    [1] #3  0x000000001008017c in gasneti_bt_gdb (fd=4) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:1218
    [1] #4  0x0000000010080d58 in gasneti_print_backtrace (fd=2) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:1487
    [1] #5  0x000000001007f0e4 in gasneti_ondemandHandler (sig=2) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:847
    [1] #6  <signal handler called>
    [1] #7  0x00003fff991e5078 in .__sched_yield () from /opt/at11.0/lib64/power7/libc.so.6
    [1] #8  0x00000000100258e0 in upcxx::progress (level=upcxx::progress_level::user) at /home/phargrov/upcxx/src/backend/gasnet/runtime.cpp:437
    [1] #9  0x0000000010025098 in upcxx::barrier () at /home/phargrov/upcxx/src/backend/gasnet/runtime.cpp:191
    [1] #10 0x0000000010024eac in upcxx::finalize () at /home/phargrov/upcxx/src/backend/gasnet/runtime.cpp:170
    [1] #11 0x0000000010005ca0 in main (argc=1, argv=0x3fffee4857d8) at persona-example.cpp:98
    
    
    ^CCaught GASNET_BACKTRACE_SIGNAL: signal SIGINT(2)
    Caught GASNET_BACKTRACE_SIGNAL: signal SIGINT(2)
    [0] Invoking GDB for backtrace...
    [1] Invoking GDB for backtrace...
    [1] /opt/at11.0/bin/gdb -nx -batch -x /tmp/gasnet_n2ezF1 '/home/phargrov/upcxx/example/prog-guide/./persona-example' 20476
    [1] To enable execution of this file add
    [1]     add-auto-load-safe-path /opt/at11.0/lib64/libthread_db-1.0.so
    [1] line to your configuration file "/home/phargrov/.gdbinit".
    [1] To completely disable this security protection add
    [1]     set auto-load safe-path /
    [1] line to your configuration file "/home/phargrov/.gdbinit".
    [1] For more information about this security protection see the
    [1] "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    [1]     info "(gdb)Auto-loading safe path"
    [1] 0x00003fff991bea78 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [1]   Id   Target Id         Frame
    [1] * 1    process 20476 "persona-example" 0x00003fff991bea78 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [1]
    [1] Thread 1 (process 20476):
    [1] #0  0x00003fff991bea78 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [1] #1  0x00003fff9911ce90 in ?? () from /opt/at11.0/lib64/power7/libc.so.6
    [1] #2  0x000000001007f61c in gasneti_system_redirected (cmd=0x10319090 <cmd> "/opt/at11.0/bin/gdb -nx -batch -x /tmp/gasnet_n2ezF1 '/home/phargrov/upcxx/example/prog-guide/./persona-example' 20476", stdout_fd=4) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:971
    [1] #3  0x000000001008017c in gasneti_bt_gdb (fd=4) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:1218
    [1] #4  0x0000000010080d58 in gasneti_print_backtrace (fd=2) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:1487
    [1] #5  0x000000001007f0e4 in gasneti_ondemandHandler (sig=2) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:847
    [1] #6  <signal handler called>
    [1] #7  0x00003fff991e5078 in .__sched_yield () from /opt/at11.0/lib64/power7/libc.so.6
    [1] #8  0x00000000100258e0 in upcxx::progress (level=upcxx::progress_level::user) at /home/phargrov/upcxx/src/backend/gasnet/runtime.cpp:437
    [1] #9  0x0000000010025098 in upcxx::barrier () at /home/phargrov/upcxx/src/backend/gasnet/runtime.cpp:191
    [1] #10 0x0000000010024eac in upcxx::finalize () at /home/phargrov/upcxx/src/backend/gasnet/runtime.cpp:170
    [1] #11 0x0000000010005ca0 in main (argc=1, argv=0x3fffee4857d8) at persona-example.cpp:98
    [0] /opt/at11.0/bin/gdb -nx -batch -x /tmp/gasnet_ks5tdT '/home/phargrov/upcxx/example/prog-guide/./persona-example' 20475
    [0] [New LWP 20477]
    [0] To enable execution of this file add
    [0]     add-auto-load-safe-path /opt/at11.0/lib64/libthread_db-1.0.so
    [0] line to your configuration file "/home/phargrov/.gdbinit".
    [0] To completely disable this security protection add
    [0]     set auto-load safe-path /
    [0] line to your configuration file "/home/phargrov/.gdbinit".
    [0] For more information about this security protection see the
    [0] "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    [0]     info "(gdb)Auto-loading safe path"
    [0] 0x00003fff991beb20 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [0]   Id   Target Id         Frame
    [0] * 1    LWP 20475 "persona-example" 0x00003fff991beb20 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [0]   2    LWP 20477 "persona-example" 0x00003fff991e5078 in .__sched_yield () from /opt/at11.0/lib64/power7/libc.so.6
    [0]
    [0] Thread 2 (LWP 20477):
    [0] #0  0x00003fff991e5078 in .__sched_yield () from /opt/at11.0/lib64/power7/libc.so.6
    [0] #1  0x00000000100258e0 in upcxx::progress (level=upcxx::progress_level::user) at /home/phargrov/upcxx/src/backend/gasnet/runtime.cpp:437
    [0] #2  0x000000001000dfbc in .main._omp_fn () at persona-example.cpp:77
    [0]
    [0] Thread 1 (LWP 20475):
    [0] #0  0x00003fff991beb20 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [0] #1  0x00003fff9911ce90 in ?? () from /opt/at11.0/lib64/power7/libc.so.6
    [0] #2  0x000000001007f61c in gasneti_system_redirected (cmd=0x10319090 <cmd> "/opt/at11.0/bin/gdb -nx -batch -x /tmp/gasnet_ks5tdT '/home/phargrov/upcxx/example/prog-guide/./persona-example' 20475", stdout_fd=3) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:971
    [0] #3  0x000000001008017c in gasneti_bt_gdb (fd=3) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:1218
    [0] #4  0x0000000010080d58 in gasneti_print_backtrace (fd=2) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:1487
    [0] #5  0x000000001007f0e4 in gasneti_ondemandHandler (sig=2) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:847
    [0] #6  <signal handler called>
    [0] #7  0x00003fff99346ef4 in ?? () from /opt/at11.0/lib64/power7/libgomp.so.1
    [0] #8  0x00003fff99344ae8 in ?? () from /opt/at11.0/lib64/power7/libgomp.so.1
    [0] #9  0x0000000000000001 in ?? ()
    [0] #10 0x00003fff99338fe8 in .GOMP_parallel_end () from /opt/at11.0/lib64/power7/libgomp.so.1
    [0] #11 0x00003fff99339b84 in .GOMP_parallel_sections () from /opt/at11.0/lib64/power7/libgomp.so.1
    [0] #12 0x0000000010005ba4 in main (argc=1, argv=0x3fffee4857d8) at persona-example.cpp:61
    
    
    ^CCaught GASNET_BACKTRACE_SIGNAL: signal SIGINT(2)
    Caught GASNET_BACKTRACE_SIGNAL: signal SIGINT(2)
    [1] Invoking GDB for backtrace...
    [0] Invoking GDB for backtrace...
    [0] /opt/at11.0/bin/gdb -nx -batch -x /tmp/gasnet_UOfJfb '/home/phargrov/upcxx/example/prog-guide/./persona-example' 20475
    [0] [New LWP 20477]
    [0] To enable execution of this file add
    [0]     add-auto-load-safe-path /opt/at11.0/lib64/libthread_db-1.0.so
    [0] line to your configuration file "/home/phargrov/.gdbinit".
    [0] To completely disable this security protection add
    [0]     set auto-load safe-path /
    [0] line to your configuration file "/home/phargrov/.gdbinit".
    [0] For more information about this security protection see the
    [0] "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    [0]     info "(gdb)Auto-loading safe path"
    [0] 0x00003fff991beb20 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [0]   Id   Target Id         Frame
    [0] * 1    LWP 20475 "persona-example" 0x00003fff991beb20 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [0]   2    LWP 20477 "persona-example" 0x00003fff991e5078 in .__sched_yield () from /opt/at11.0/lib64/power7/libc.so.6
    [0]
    [0] Thread 2 (LWP 20477):
    [0] #0  0x00003fff991e5078 in .__sched_yield () from /opt/at11.0/lib64/power7/libc.so.6
    [0] #1  0x00000000100258e0 in upcxx::progress (level=upcxx::progress_level::user) at /home/phargrov/upcxx/src/backend/gasnet/runtime.cpp:437
    [0] #2  0x000000001000dfbc in .main._omp_fn () at persona-example.cpp:77
    [0]
    [0] Thread 1 (LWP 20475):
    [0] #0  0x00003fff991beb20 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [0] #1  0x00003fff9911ce90 in ?? () from /opt/at11.0/lib64/power7/libc.so.6
    [0] #2  0x000000001007f61c in gasneti_system_redirected (cmd=0x10319090 <cmd> "/opt/at11.0/bin/gdb -nx -batch -x /tmp/gasnet_UOfJfb '/home/phargrov/upcxx/example/prog-guide/./persona-example' 20475", stdout_fd=3) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:971
    [0] #3  0x000000001008017c in gasneti_bt_gdb (fd=3) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:1218
    [0] #4  0x0000000010080d58 in gasneti_print_backtrace (fd=2) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:1487
    [0] #5  0x000000001007f0e4 in gasneti_ondemandHandler (sig=2) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:847
    [0] #6  <signal handler called>
    [0] #7  0x00003fff99346ef4 in ?? () from /opt/at11.0/lib64/power7/libgomp.so.1
    [0] #8  0x00003fff99344ae8 in ?? () from /opt/at11.0/lib64/power7/libgomp.so.1
    [0] #9  0x0000000000000001 in ?? ()
    [0] #10 0x00003fff99338fe8 in .GOMP_parallel_end () from /opt/at11.0/lib64/power7/libgomp.so.1
    [0] #11 0x00003fff99339b84 in .GOMP_parallel_sections () from /opt/at11.0/lib64/power7/libgomp.so.1
    [0] #12 0x0000000010005ba4 in main (argc=1, argv=0x3fffee4857d8) at persona-example.cpp:61
    [1] /opt/at11.0/bin/gdb -nx -batch -x /tmp/gasnet_RWrOHj '/home/phargrov/upcxx/example/prog-guide/./persona-example' 20476
    [1] To enable execution of this file add
    [1]     add-auto-load-safe-path /opt/at11.0/lib64/libthread_db-1.0.so
    [1] line to your configuration file "/home/phargrov/.gdbinit".
    [1] To completely disable this security protection add
    [1]     set auto-load safe-path /
    [1] line to your configuration file "/home/phargrov/.gdbinit".
    [1] For more information about this security protection see the
    [1] "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    [1]     info "(gdb)Auto-loading safe path"
    [1] 0x00003fff991bea78 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [1]   Id   Target Id         Frame
    [1] * 1    process 20476 "persona-example" 0x00003fff991bea78 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [1]
    [1] Thread 1 (process 20476):
    [1] #0  0x00003fff991bea78 in .__waitpid () from /opt/at11.0/lib64/power7/libc.so.6
    [1] #1  0x00003fff9911ce90 in ?? () from /opt/at11.0/lib64/power7/libc.so.6
    [1] #2  0x000000001007f61c in gasneti_system_redirected (cmd=0x10319090 <cmd> "/opt/at11.0/bin/gdb -nx -batch -x /tmp/gasnet_RWrOHj '/home/phargrov/upcxx/example/prog-guide/./persona-example' 20476", stdout_fd=4) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:971
    [1] #3  0x000000001008017c in gasneti_bt_gdb (fd=4) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:1218
    [1] #4  0x0000000010080d58 in gasneti_print_backtrace (fd=2) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:1487
    [1] #5  0x000000001007f0e4 in gasneti_ondemandHandler (sig=2) at /home/phargrov/upcxx/.nobs/art/7cfd9bb63b8ab05dc711179a18ebc5552610d6ec/GASNet-EX-collaborator-snapshot/gasnet_tools.c:847
    [1] #6  <signal handler called>
    [1] #7  0x000000001003c9dc in upcxx::backend::gasnet::rpc_inbox::burst (this=0x3fff99817188, burst_n=1) at /home/phargrov/upcxx/src/backend/gasnet/rpc_inbox.cpp:8
    [1] #8  0x000000001002561c in upcxx::<lambda()>::operator()(void) const (__closure=0x3fffee485068) at /home/phargrov/upcxx/src/backend/gasnet/runtime.cpp:404
    [1] #9  0x0000000010026220 in upcxx::detail::persona_as_top<upcxx::progress(upcxx::progress_level)::<lambda()> >(upcxx::persona &, upcxx::<lambda()> &&) (p=..., fn=...) at /home/phargrov/upcxx/.nobs/art/bbec865dd5c381d3e9f0be22df752ca2c6f93884/upcxx/persona.hpp:321
    [1] #10 0x0000000010025830 in upcxx::progress (level=upcxx::progress_level::user) at /home/phargrov/upcxx/src/backend/gasnet/runtime.cpp:403
    [1] #11 0x0000000010025098 in upcxx::barrier () at /home/phargrov/upcxx/src/backend/gasnet/runtime.cpp:191
    [1] #12 0x0000000010024eac in upcxx::finalize () at /home/phargrov/upcxx/src/backend/gasnet/runtime.cpp:170
    [1] #13 0x0000000010005ca0 in main (argc=1, argv=0x3fffee4857d8) at persona-example.cpp:98
    
  3. Paul Hargrove

    With the changes @jbachan made yesterday (426c04f and b6a2f40), the problem appears to be resolved.
    However, given the probabilistic nature of the failure, I'd like to wait for a few more nightly runs to be certain the problem is fixed rather than just less frequent.
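
    How many clean runs count as "enough" can be estimated roughly. Assuming runs are independent and the unfixed code hung with probability p per run, the chance of seeing n clean runs in a row without a fix is (1 - p)^n. The sketch below finds the smallest n that pushes this below a chosen threshold. The helper name `runs_needed` and the example values (p = 0.2, threshold 0.05) are hypothetical, since the hang rate was not measured in this report.

```shell
# runs_needed P ALPHA
# Prints the smallest integer n such that (1 - P)^n <= ALPHA, i.e. the
# number of consecutive clean runs after which a still-present bug with
# per-run hang probability P would have shown up with probability
# at least 1 - ALPHA. Assumes 0 < P < 1 and 0 < ALPHA < 1.
runs_needed() {
    awk -v p="$1" -v a="$2" 'BEGIN {
        n = log(a) / log(1 - p)   # real-valued solution of (1-p)^n = a
        c = int(n)
        if (c < n) c++            # round up to a whole number of runs
        print c
    }'
}

# Example with assumed values: a 20% hang rate and a 5% threshold.
runs_needed 0.2 0.05   # prints 14
```

    The lower the original hang rate, the more clean runs are needed before "fixed" can be told apart from "less frequent".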
