- changed status to resolved
intermittent nobs error messages at exit
The error output handling in nobs does not correctly support output redirection.
Here's an example of an intentionally-created error with default console output :
$ export GASNET=/noexist ; nobs exe test/dist_object.cpp
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Uncaught exception
Traceback (most recent call last):
File "/home/bonachea/UPC/upcxx/nobs/nobs/tool_main.py", line 85, in main
printed.wait()
File "/home/bonachea/UPC/upcxx/nobs/nobs/async.py", line 403, in wait
return me.wait_futurized().value()
File "/home/bonachea/UPC/upcxx/nobs/nobs/async.py", line 707, in fire
arg = gen_throw(arg.exception, None, arg.traceback)
File "/home/bonachea/UPC/upcxx/nobs/nobs/memodb.py", line 458, in memo_execute
yield result
<trim>
File "/usr/lib/python2.7/tarfile.py", line 1667, in open
return func(name, "r", fileobj, **kwargs)
File "/usr/lib/python2.7/tarfile.py", line 1732, in gzopen
fileobj = gzip.GzipFile(name, mode, compresslevel, fileobj)
File "/usr/lib/python2.7/gzip.py", line 94, in __init__
fileobj = self.myfileobj = __builtin__.open(filename, mode or 'rb')
IOError: [Errno 2] No such file or directory: '/noexist'
Now the same command piped to a process or file:
$ export GASNET=/noexist ; nobs exe test/dist_object.cpp 2>&1 | head
<no output>
$ export GASNET=/noexist ; nobs exe test/dist_object.cpp > log 2>&1 ; cat log
<no output>
Note the stderr output from the nobs crash has been completely lost.
The results are the same for develop-nobs (5c84c2c) and develop (c186938).
Marking this as critical because it affects the quality of automated CI test results.
Comments (9)
-
Account Deleted -
reporter Still seeing an intermittent message at exit:
Exception in thread Thread-6 (most likely raised during interpreter shutdown):
although the build seems to complete normally. I suspect there's some kind of race involved because the same command will sometimes show the failure and sometimes not.cd /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx && source ./sourceme && export CC=/opt/cray/pe/craype/2.5.12.3/bin/cc CXX=/opt/cray/pe/craype/2.5.12.3/bin/CC DBGSYM=1 OPTLEV=0 GASNET="/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include" GASNET_CONDUIT=aries && nobs exe test/hello_gasnet.cpp /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -D_GNU_SOURCE=1 -DGASNET_SEQ -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include/aries-conduit -DGASNET_CONDUIT_ARIES -MM -MT x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/test/hello_gasnet.cpp /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -D_GNU_SOURCE=1 -DGASNET_SEQ -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include/aries-conduit -DGASNET_CONDUIT_ARIES -O0 -g -Wall -g3 -Wall -Wpointer-arith -Wwrite-strings -Wmissing-format-attribute -Wno-unused -Wno-unused-parameter -Wno-address -c /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/test/hello_gasnet.cpp -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/1cc01d9b85d589859c307a3b49043eca23c19fea.hello_gasnet.cpp.o /opt/cray/pe/craype/2.5.12.3/bin/CC -g3 -Wall -Wpointer-arith -Wnested-externs -Wwrite-strings -Wmissing-format-attribute -Wno-unused -Wno-unused-parameter -Wno-address -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/0d295a194ba5f1c0c571149bf1c34eec2c39353e.x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/1cc01d9b85d589859c307a3b49043eca23c19fea.hello_gasnet.cpp.o -L/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/lib -lgasnet-aries-seq -L/opt/cray/pe/pmi/5.0.12/lib64 -L/opt/cray/ugni/6.0.14-6.0.4.0_14.1__ge7db4a2.ari/lib64 -lpmi -lugni -L/opt/cray/udreg/2.3.2-6.0.4.0_12.2__g2f9c3ee.ari/lib64 -ludreg -L/opt/cray/xpmem/2.2.2-6.0.4.0_3.1__g43b0535.ari/lib64 -lxpmem -lhugetlbfs -L/opt/gcc/6.3.0/snos/lib/gcc/x86_64-suse-linux/6.3.0 -lgcc -lm /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/0d295a194ba5f1c0c571149bf1c34eec2c39353e.x cd /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx && source ./sourceme && export CC=/opt/cray/pe/craype/2.5.12.3/bin/cc CXX=/opt/cray/pe/craype/2.5.12.3/bin/CC DBGSYM=1 OPTLEV=0 GASNET="/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include" GASNET_CONDUIT=aries && nobs exe test/hello_upcxx.cpp /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -DUPCXX_BACKEND=gasnet1_seq -D_GNU_SOURCE=1 -DGASNET_SEQ -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include/aries-conduit -DGASNET_CONDUIT_ARIES -MM -MT x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/test/hello_upcxx.cpp /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -DUPCXX_BACKEND=gasnet1_seq -D_GNU_SOURCE=1 -DGASNET_SEQ -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include/aries-conduit -DGASNET_CONDUIT_ARIES -O0 -g -Wall -g3 -Wall -Wpointer-arith -Wwrite-strings -Wmissing-format-attribute -Wno-unused -Wno-unused-parameter -Wno-address -c /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/test/hello_upcxx.cpp -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/9a6e036434c99dc59bdbb895863c09c0885385c2.hello_upcxx.cpp.o /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -MM -MT x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/future/core.cpp /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -MM -MT x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/diagnostic.cpp /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -MM -MT x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/packing.cpp /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -DUPCXX_BACKEND=gasnet1_seq -D_GNU_SOURCE=1 -DGASNET_SEQ -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include/aries-conduit -DGASNET_CONDUIT_ARIES -MM -MT x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/backend/gasnet1_seq/backend.cpp /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -O0 -g -Wall -c /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/future/core.cpp -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/99cfc0e5a17d62ceb2546fb420ec9243b6616004.core.cpp.o /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -O0 -g -Wall -c /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/diagnostic.cpp -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/63188edbe17ecbc912f915010464a5c89f5e56ce.diagnostic.cpp.o /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -O0 -g -Wall -c /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/packing.cpp -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/837beb9569f416dc9202474db0f7cfb8ed202d3f.packing.cpp.o /opt/cray/pe/craype/2.5.12.3/bin/CC -std=c++11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -DUPCXX_BACKEND=gasnet1_seq -D_GNU_SOURCE=1 -DGASNET_SEQ -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/include/aries-conduit -DGASNET_CONDUIT_ARIES -O0 -g -Wall -g3 -Wall -Wpointer-arith -Wwrite-strings -Wmissing-format-attribute -Wno-unused -Wno-unused-parameter -Wno-address -c /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/backend/gasnet1_seq/backend.cpp -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/d43a01bc6d62360cf8727cac572163d02af6ce31.backend.cpp.o /opt/cray/pe/craype/2.5.12.3/bin/cc -std=c11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -MM -MT x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/dl_malloc.c /opt/cray/pe/craype/2.5.12.3/bin/cc -std=c11 -D_GNU_SOURCE=1 -I/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/7e30dad76e40045058f10f5b376e727a60f7a3e9 -O0 -g -Wall -c /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/src/dl_malloc.c -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/9d0e67d6b0c330938bc75e7b3608802263df1067.dl_malloc.c.o /opt/cray/pe/craype/2.5.12.3/bin/CC -g3 -Wall -Wpointer-arith -Wnested-externs -Wwrite-strings -Wmissing-format-attribute -Wno-unused -Wno-unused-parameter -Wno-address -o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/888b34fe563ed09b3d6c55dcb35ba9258c25b03e.x /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/9d0e67d6b0c330938bc75e7b3608802263df1067.dl_malloc.c.o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/63188edbe17ecbc912f915010464a5c89f5e56ce.diagnostic.cpp.o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/9a6e036434c99dc59bdbb895863c09c0885385c2.hello_upcxx.cpp.o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/99cfc0e5a17d62ceb2546fb420ec9243b6616004.core.cpp.o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/837beb9569f416dc9202474db0f7cfb8ed202d3f.packing.cpp.o /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/d43a01bc6d62360cf8727cac572163d02af6ce31.backend.cpp.o -L/global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/gasnet/lib -lgasnet-aries-seq -L/opt/cray/pe/pmi/5.0.12/lib64 -L/opt/cray/ugni/6.0.14-6.0.4.0_14.1__ge7db4a2.ari/lib64 -lpmi -lugni -L/opt/cray/udreg/2.3.2-6.0.4.0_12.2__g2f9c3ee.ari/lib64 -ludreg -L/opt/cray/xpmem/2.2.2-6.0.4.0_3.1__g43b0535.ari/lib64 -lxpmem -lhugetlbfs -L/opt/gcc/6.3.0/snos/lib/gcc/x86_64-suse-linux/6.3.0 -lgcc -lm /global/u1/b/bonachea/UPC/bupcr-ex-edison-gcc/dbg/upcxx/.nobs/art/888b34fe563ed09b3d6c55dcb35ba9258c25b03e.xException in thread Thread-6 (most likely raised during interpreter shutdown):
Mostly harmless, unless the user is relying on the final line of the merged output to be the executable path.
-
Account Deleted This is an unfortunate race condition with using daemon threads in the python interpreter. I'll have to switch to nondaemon threads.
-
reporter Just randomly saw this intermittent error at nobs exit, for a build of 8ac58d2 on Linux that otherwise succeeded. Running the same command a few more times did not reproduce the error.
/home/pcp1/bonachea/UPC/upcxx/.nobs/art/42faf70ac4b59bf76e122d0010f5a7035290e753.xTraceback (most recent call last): File "/home/pcp1/bonachea/UPC/upcxx/nobs/tool.py", line 21, in <module> shutdown() File "/home/pcp1/bonachea/UPC/upcxx/nobs/nobs/async.py", line 322, in shutdown pool.join() File "/home/pcp1/bonachea/UPC/upcxx/nobs/nobs/async.py", line 238, in join for t in me._threads_live: RuntimeError: Set changed size during iteration
-
reporter - changed status to open
The CI tester also just caught an occurrence of this exit-time error on cori on abf40145a4d3df4a4bab9973833b4bd06c3675ce:
Re-opening because this platform matters and it's triggering test failures.
-
reporter - marked as minor
- changed title to intermittent nobs error messages at exit
-
reporter - marked as critical
And another instance of the same intermittent exit failure on sierra
It appears this failure not only prints an ugly error but also causes nobs to returns a non-zero exit code, which makes this more serious.
-
Account Deleted - changed status to resolved
Fixed issue 4: https://bitbucket.org/berkeleylab/upcxx/issues/4/intermittent-nobs-error-messages-at-exit
Fixed missing include in src/collectives.hpp
→ <<cset 33f296798453>>
-
-
assigned issue to
-
assigned issue to
- Log in to comment
Implemented "install" script for generating and installing libupcxx to a given directory. Needs documenting.
Fixed issue
#4.→ <<cset c306072676fe>>