extend-add: should error-check filesystem calls

Issue #252 resolved
Dan Bonachea created an issue

Currently (as of 666a014), running extend-add with no arguments, or with a first argument that names a non-existent file, produces an ugly crash:

{cori[1]} srun -n 1 extend-add_upcxx-seq
nprow is 1
npcol is 1
timer frontal_matrix_creation maximum value: 0.000358297 s
*** Caught a fatal signal (proc 0): SIGSEGV(11)
[0] Invoking GDB for backtrace...
[0] /usr/bin/gdb -nx -batch -x /tmp/gasnet_ZEjIyJ '/global/u1/b/bonachea/UPC/bupcr-icc-hsw/dbg/gasnet/tests/upcr-harness/external-upcxx/./extend-add_upcxx-seq' 6597
[0] [Thread debugging using libthread_db enabled]
[0] Using host libthread_db library "/lib64/libthread_db.so.1".
[0] 0x000000002033fa3a in __waitpid (pid=6600, stat_loc=stat_loc@entry=0x7fffffff2948, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
[0] #0  0x000000002033fa3a in __waitpid (pid=6600, stat_loc=stat_loc@entry=0x7fffffff2948, options=options@entry=0) at ../sysdeps/unix/sysv/linux/waitpid.c:29
[0] #1  0x00000000205d6d6f in do_system (line=<optimized out>) at ../sysdeps/posix/system.c:148
[0] #2  0x00000000200af114 in gasneti_system_redirected (cmd=0x40851e20 <cmd> "/usr/bin/gdb -nx -batch -x /tmp/gasnet_ZEjIyJ '/global/u1/b/bonachea/UPC/bupcr-icc-hsw/dbg/gasnet/tests/upcr-harness/external-upcxx/./extend-add_upcxx-seq' 6597", stdout_fd=10) at /global/u1/b/bonachea/UPC/upcr/gasnet/gasnet_tools.c:1275
[0] #3  0x00000000200af7f2 in gasneti_bt_gdb (fd=10) at /global/u1/b/bonachea/UPC/upcr/gasnet/gasnet_tools.c:1531
[0] #4  0x00000000200b0541 in gasneti_print_backtrace (fd=2) at /global/u1/b/bonachea/UPC/upcr/gasnet/gasnet_tools.c:1806
[0] #5  0x00000000200b0d9a in _gasneti_print_backtrace_ifenabled (fd=2) at /global/u1/b/bonachea/UPC/upcr/gasnet/gasnet_tools.c:1938
[0] #6  0x000000002024a650 in gasneti_defaultSignalHandler (sig=11) at /global/u1/b/bonachea/UPC/upcr/gasnet/gasnet_internal.c:704
[0] #7  <signal handler called>
[0] #8  0x0000000020007681 in main (argc=1, argv=0x7fffffff63b8) at src/main.cpp:243
[0] [Inferior 1 (process 6597) detached]
srun: error: nid00188: task 0: Segmentation fault
srun: Terminating job step 23946682.4

The code at src/main.cpp:48 currently assumes, without checking, that the first argument names a valid input file. We should add at least minimal error checking that the user provided an argument naming a file that can be opened successfully, and otherwise print an explanatory error message.
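
A minimal sketch of such a check, assuming the input is read through a `std::ifstream`. The helper name `open_input_file` and the message wording are hypothetical and not taken from the extend-add sources:

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper (not from the extend-add sources).
// Returns true and leaves `in` open on success; otherwise prints a
// diagnostic to stderr and returns false.
bool open_input_file(int argc, char **argv, std::ifstream &in) {
  // No file name was given on the command line.
  if (argc < 2) {
    std::cerr << "Usage: " << (argc > 0 ? argv[0] : "extend-add")
              << " <input-file>\n";
    return false;
  }
  // The file name was given, but the file may be missing or unreadable.
  in.open(argv[1], std::ios::in | std::ios::binary);
  if (!in.is_open()) {
    std::cerr << "Error: cannot open input file '" << argv[1] << "'\n";
    return false;
  }
  return true;
}
```

The caller in `main` would test the return value before touching the stream and exit with a non-zero status on failure. How to shut down the parallel job cleanly in that case is not shown here.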

Ideally the input file parsing logic would also detect early EOF or other forms of invalid/truncated input file and similarly issue an error message instead of a crash.

Similarly, it appears that src/main.cpp:40 unconditionally opens log files in $CWD without checking for success; the open could easily fail if the directory or filesystem is read-only.
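
A comparable check for the log files could look like the sketch below, assuming they are written through `std::ofstream`. The helper name `open_log_file` is hypothetical, and whether a failure should be fatal or only a warning is a design choice left open here:

```cpp
#include <fstream>
#include <iostream>
#include <string>

// Hypothetical helper (not from the extend-add sources).
// Tries to open `path` for writing. On failure it prints a warning and
// returns false, so the caller can decide whether to abort or run unlogged.
bool open_log_file(const std::string &path, std::ofstream &log) {
  log.open(path, std::ios::out | std::ios::trunc);
  if (!log.is_open()) {
    std::cerr << "Warning: cannot open log file '" << path
              << "' for writing; is the directory writable?\n";
    return false;
  }
  return true;
}
```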

We want this code to be "exemplary", so it should check for plausible errors when interacting with the file system.

Comments (4)

  1. Dan Bonachea reporter

    On a closely related topic, a common error mode is to pass the wrong input file, which also results in an ugly crash with no indication of what the user did wrong. For example:

    {pcp-d-5} upcxx-run -shared-heap=1GB -np 2 bin/extend-add_upcxx /home/data2/upcnightly/extend-add/audikw_1/audikw_1_1.dmp
    nprow is 2
    npcol is 1
    *** FATAL ERROR (proc 0): Assertion failure in gasneti_TM_Split() at 2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tm.c:102: addr >= ep->_segment->_addr
       op1 : 0x0000000000000000 == addr
       op2 : 0x00007f03349c9000 == ep->_segment->_addr
    [0] Invoking GDB for backtrace...
    *** FATAL ERROR (proc 1): Assertion failure in gasneti_TM_Split() at 2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tm.c:102: addr >= ep->_segment->_addr
       op1 : 0x0000000000000000 == addr
       op2 : 0x00007f592941d000 == ep->_segment->_addr
    [1] Invoking GDB for backtrace...
    [1] /usr/local/pkg/gdb/newest/bin/gdb -nx -batch -x /tmp/gasnet_8IDLbb '/home/pcp1/bonachea/UPC/upcxx-extras/examples/extend-add/bin/extend-add_upcxx' 23365
    [1] [Thread debugging using libthread_db enabled]
    [1] Using host libthread_db library "/lib64/libthread_db.so.1".
    [1] 0x00007f596c4d4a3c in waitpid () from /lib64/libc.so.6
    [1] To enable execution of this file add
    [1]     add-auto-load-safe-path /usr/local/pkg/gcc/9.1.0/lib64/libstdc++.so.6.0.26-gdb.py
    [1] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
    [1] To completely disable this security protection add
    [1]     set auto-load safe-path /
    [1] line to your configuration file "/home/pcp1/bonachea/.gdbinit".
    [1] For more information about this security protection see the
    [1] "Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    [1]     info "(gdb)Auto-loading safe path"
    [1] #0  0x00007f596c4d4a3c in waitpid () from /lib64/libc.so.6
    [1] #1  0x00007f596c452de2 in do_system () from /lib64/libc.so.6
    [1] #2  0x000000000052d7df in gasneti_system_redirected (cmd=0xb37640 <cmd> "/usr/local/pkg/gdb/newest/bin/gdb -nx -batch -x /tmp/gasnet_8IDLbb '/home/pcp1/bonachea/UPC/upcxx-extras/examples/extend-add/bin/extend-add_upcxx' 23365", stdout_fd=8) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:1271
    [1] #3  0x000000000052e1e0 in gasneti_bt_gdb (fd=8) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:1518
    [1] #4  0x000000000052ea20 in gasneti_print_backtrace (fd=2) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:1793
    [1] #5  0x000000000052f002 in _gasneti_print_backtrace_ifenabled (fd=2) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:1925
    [1] #6  0x000000000052c8ae in gasneti_error_abort () at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:739
    [1] #7  0x000000000052cc37 in _gasneti_assert_fail (funcname=0x89f7a0 <__func__.11121> "gasneti_TM_Split", filename=0x89e2a4 "2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tm.c", linenum=102, fmt=0x89b500 "%s %s %s\n   op1 : 0x%0*lx == %s\n   op2 : 0x%0*lx == %s\n") at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tools.c:804
    [1] #8  0x000000000079aca7 in gasneti_TM_Split (new_tm_p=0x7ffd13c98708, e_parent=0xfffffffffd0ce6bf, color=1, key=1, addr=0x0, len=2097152, flags=0) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/.nobs/art/d1310ccf9120fa6830b666156636faf8440b8cc4/GASNet-2019.3.2/gasnet_tm.c:102
    [1] #9  0x000000000044f9c2 in upcxx::team::split (this=0x7ffd13c98830, color=1, key=1) at /tmp/upcxx-nightly-dirac-gcc/bld/upcxx_install/upcxx-2019.3.2/src/team.cpp:72
    [1] #10 0x000000000040a43e in strumpack::FrontalMatrixMPI<double, int>::FrontalMatrixMPI (this=0x2f31990, _sep=117451, _sep_begin=919254, _sep_end=919257, _dim_upd=114, _upd=0x257bee0, _front_team=..., aactive=true, _total_procs=2) at ./include/FrontalMatrixMPI.hpp:180
    [1] #11 0x00000000004079ba in main (argc=2, argv=0x7ffd13c98fc8) at src/main.cpp:200
    