intermittent nobs error messages at exit

Issue #4 resolved
Dan Bonachea created an issue

The error output handling in nobs does not correctly support output redirection.

Here's an example of an intentionally-created error with default console output :

$ export GASNET=/noexist ; nobs exe test/dist_object.cpp        
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Uncaught exception

Traceback (most recent call last):
  File "/home/bonachea/UPC/upcxx/nobs/nobs/tool_main.py", line 85, in main
    printed.wait()
  File "/home/bonachea/UPC/upcxx/nobs/nobs/async.py", line 403, in wait
    return me.wait_futurized().value()
  File "/home/bonachea/UPC/upcxx/nobs/nobs/async.py", line 707, in fire
    arg = gen_throw(arg.exception, None, arg.traceback)
  File "/home/bonachea/UPC/upcxx/nobs/nobs/memodb.py", line 458, in memo_execute
    yield result
<trim>
  File "/usr/lib/python2.7/tarfile.py", line 1667, in open
    return func(name, "r", fileobj, **kwargs)
  File "/usr/lib/python2.7/tarfile.py", line 1732, in gzopen
    fileobj = gzip.GzipFile(name, mode, compresslevel, fileobj)
  File "/usr/lib/python2.7/gzip.py", line 94, in __init__
    fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
IOError: [Errno 2] No such file or directory: '/noexist'

Now the same command piped to a process or file:

$ export GASNET=/noexist ; nobs exe test/dist_object.cpp 2>&1 | head 
<no output>

$ export GASNET=/noexist ; nobs exe test/dist_object.cpp > log 2>&1 ; cat log
<no output>

Note the stderr output from the nobs crash has been completely lost.

The results are the same for develop-nobs (5c84c2c) and develop (c186938).

Marking this as critical because it affects the quality of automated CI test results.

Comments (9)

  1. Dan Bonachea reporter

    Still seeing an intermittent message at exit:
    Exception in thread Thread-6 (most likely raised during interpreter shutdown):
    although the build seems to complete normally. I suspect there's some kind of race involved because the same command will sometimes show the failure and sometimes not.

    cd /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx && source ./sourceme && export CC=/opt/cray/pe/craype/2.5.12.3/bin/cc CXX=/opt/cray/pe/craype/2.5.12.3/bin/CC DBGSYM=1 OPTLEV=0 GASNET="/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include" GASNET_CONDUIT=aries  && nobs exe test/hello_gasnet.cpp
    
    /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -D_GNU_SOURCE=1 -DGASNET_SEQ -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include/aries-conduit -DGASNET_CONDUIT_ARIES -MM -MT x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/test/hello_gasnet.cpp
    
    /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -D_GNU_SOURCE=1 -DGASNET_SEQ -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include/aries-conduit -DGASNET_CONDUIT_ARIES -O0 -g -Wall -g3 -Wall -Wpointer-arith -Wwrite-strings -Wmissing-format-attribute -Wno-unused -Wno-unused-parameter -Wno-address -c /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/test/hello_gasnet.cpp -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/1cc01d9b85d589859c307a3b49043eca23c19fea.hello_gasnet.cpp.o
    
    /opt/cray/pe/craype/2.5.12.3/bin/CC -g3 -Wall -Wpointer-arith -Wnested-externs -Wwrite-strings -Wmissing-format-attribute -Wno-unused -Wno-unused-parameter -Wno-address -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/0d295a194ba5f1c0c571149bf1c34eec2c39353e.x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/1cc01d9b85d589859c307a3b49043eca23c19fea.hello_gasnet.cpp.o -L/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/lib -lgasnet-aries-seq -L/opt/cray/pe/pmi/5.0.12/lib64 -L/opt/cray/ugni/6.0.14-6.0.4.0_14.1__ge7db4a2.ari/lib64 -lpmi -lugni -L/opt/cray/udreg/2.3.2-6.0.4.0_12.2__g2f9c3ee.ari/lib64 -ludreg -L/opt/cray/xpmem/2.2.2-6.0.4.0_3.1__g43b0535.ari/lib64 -lxpmem -lhugetlbfs -L/opt/gcc/6.3.0/snos/lib/gcc/x86_64-suse-linux/6.3.0 -lgcc -lm
    
    /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/0d295a194ba5f1c0c571149bf1c34eec2c39353e.x
    cd /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx && source ./sourceme && export CC=/opt/cray/pe/craype/2.5.12.3/bin/cc CXX=/opt/cray/pe/craype/2.5.12.3/bin/CC DBGSYM=1 OPTLEV=0 GASNET="/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include" GASNET_CONDUIT=aries  && nobs exe test/hello_upcxx.cpp
    /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -DUPCXX_BACKEND=gasnet1_seq -D_GNU_SOURCE=1 -DGASNET_SEQ -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include/aries-conduit -DGASNET_CONDUIT_ARIES -MM -MT x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/test/hello_upcxx.cpp
    
    /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -DUPCXX_BACKEND=gasnet1_seq -D_GNU_SOURCE=1 -DGASNET_SEQ -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include/aries-conduit -DGASNET_CONDUIT_ARIES -O0 -g -Wall -g3 -Wall -Wpointer-arith -Wwrite-strings -Wmissing-format-attribute -Wno-unused -Wno-unused-parameter -Wno-address -c /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/test/hello_upcxx.cpp -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/9a6e036434c99dc59bdbb895863c09c0885385c2.hello_upcxx.cpp.o
    
    /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -MM -MT x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/future/core.cpp
    
    /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -MM -MT x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/diagnostic.cpp
    
    /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -MM -MT x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/packing.cpp
    
    /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -DUPCXX_BACKEND=gasnet1_seq -D_GNU_SOURCE=1 -DGASNET_SEQ -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include/aries-conduit -DGASNET_CONDUIT_ARIES -MM -MT x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/backend/gasnet1_seq/backend.cpp
    
    /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -O0 -g -Wall -c /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/future/core.cpp -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/99cfc0e5a17d62ceb2546fb420ec9243b6616004.core.cpp.o
    
    /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -O0 -g -Wall -c /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/diagnostic.cpp -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/63188edbe17ecbc912f915010464a5c89f5e56ce.diagnostic.cpp.o
    
    /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -O0 -g -Wall -c /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/packing.cpp -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/837beb9569f416dc9202474db0f7cfb8ed202d3f.packing.cpp.o
    
    /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -DUPCXX_BACKEND=gasnet1_seq -D_GNU_SOURCE=1 -DGASNET_SEQ -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include/aries-conduit -DGASNET_CONDUIT_ARIES -O0 -g -Wall -g3 -Wall -Wpointer-arith -Wwrite-strings -Wmissing-format-attribute -Wno-unused -Wno-unused-parameter -Wno-address -c /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/backend/gasnet1_seq/backend.cpp -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/d43a01bc6d62360cf8727cac572163d02af6ce31.backend.cpp.o
    
    /opt/cray/pe/craype/2.5.12.3/bin/cc -std=c11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -MM -MT x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/dl_malloc.c
    
    /opt/cray/pe/craype/2.5.12.3/bin/cc -std=c11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -O0 -g -Wall -c /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/dl_malloc.c -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/9d0e67d6b0c330938bc75e7b3608802263df1067.dl_malloc.c.o
    
    /opt/cray/pe/craype/2.5.12.3/bin/CC -g3 -Wall -Wpointer-arith -Wnested-externs -Wwrite-strings -Wmissing-format-attribute -Wno-unused -Wno-unused-parameter -Wno-address -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/888b34fe563ed09b3d6c55dcb35ba9258c25b03e.x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/9d0e67d6b0c330938bc75e7b3608802263df1067.dl_malloc.c.o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/63188edbe17ecbc912f915010464a5c89f5e56ce.diagnostic.cpp.o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/9a6e036434c99dc59bdbb895863c09c0885385c2.hello_upcxx.cpp.o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/99cfc0e5a17d62ceb2546fb420ec9243b6616004.core.cpp.o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/837beb9569f416dc9202474db0f7cfb8ed202d3f.packing.cpp.o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/d43a01bc6d62360cf8727cac572163d02af6ce31.backend.cpp.o -L/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/lib -lgasnet-aries-seq -L/opt/cray/pe/pmi/5.0.12/lib64 -L/opt/cray/ugni/6.0.14-6.0.4.0_14.1__ge7db4a2.ari/lib64 -lpmi -lugni -L/opt/cray/udreg/2.3.2-6.0.4.0_12.2__g2f9c3ee.ari/lib64 -ludreg -L/opt/cray/xpmem/2.2.2-6.0.4.0_3.1__g43b0535.ari/lib64 -lxpmem -lhugetlbfs -L/opt/gcc/6.3.0/snos/lib/gcc/x86_64-suse-linux/6.3.0 -lgcc -lm
    
    /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/888b34fe563ed09b3d6c55dcb35ba9258c25b03e.xException in thread Thread-6 (most likely raised during interpreter shutdown):
    

    Mostly harmless, unless the user is relying on the final line of the merged output to be the executable path.

  2. Former user Account Deleted

    This is an unfortunate race condition with using daemon threads in the python interpreter. I'll have to switch to nondaemon threads.

  3. Dan Bonachea reporter

    Just randomly saw this intermittent error at nobs exit, for a build of 8ac58d2 on Linux that otherwise succeeded. Running the same command a few more times did not reproduce the error.

    /home/pcp1/bonachea/UPC/upcxx/.nobs/art/42faf70ac4b59bf76e122d0010f5a7035290e753.xTraceback (most recent call last):
      File "/home/pcp1/bonachea/UPC/upcxx/nobs/tool.py", line 21, in <module>
        shutdown()
      File "/home/pcp1/bonachea/UPC/upcxx/nobs/nobs/async.py", line 322, in shutdown
        pool.join()
      File "/home/pcp1/bonachea/UPC/upcxx/nobs/nobs/async.py", line 238, in join
        for t in me._threads_live:
    RuntimeError: Set changed size during iteration
    
  4. Dan Bonachea reporter

    And another instance of the same intermittent exit failure on sierra

    It appears this failure not only prints an ugly error but also causes nobs to returns a non-zero exit code, which makes this more serious.

  5. Log in to comment