Multiple test failures on Titan

Issue #171 wontfix
Paul Hargrove created an issue

Multiple tests (the non-GEX ones) are failing on OLCF's Titan.
The failures occur with both the PrgEnv-gnu and PrgEnv-intel environment modules.
They also occur with each of the gcc/5.3.0, gcc/6.3.0 and gcc/7.3.0 environment modules.

So far this has not been reproduced on Edison, Cori or Theta.
However, a similar (but distinct) failure has been seen on Theta, and only with gcc/5.3.0.

The stack trace of a failing lpc_barrier with PrgEnv-gnu and gcc/5.3.0:

Core was generated by `./lpc_barrier-par'.
Program terminated with signal 11, Segmentation fault.
#0  0x0000000000000000 in ?? ()
(gdb) where
#0  0x0000000000000000 in ?? ()
#1  0x000000000042cda3 in __gthread_join (__value_ptr=0x0, __threadid=<optimized out>)
    at /b/tmp/peint/build-cray-gcc-20180126.202153.829775000/cray-gcc/BUILD/snos_objdir/x86_64-suse-linux/libstdc++-v3/include/x86_64-suse-linux/bits/gthr-default.h:668
#2  std::thread::join (this=0x7b36e0)
    at ../../../../../cray-gcc-7.3.0-201801270210.d61239fc6000b/libstdc++-v3/src/c++11/thread.cc:136
#3  0x0000000000401d90 in main ()
    at /lustre/atlas2/csc296/scratch/hargrove/upcnightly-titan/EX-titan-gemini-gcc/runtime/work/dbg/upcxx/test/lpc_barrier.cpp:164

However, the following non-UPC++ code fails in the same manner:

#include <atomic>
#include <iostream>
#include <thread>

#include <sched.h>

const int thread_n = 8;

int main() {
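  // Spin barrier: each thread increments the counter, then yields
  // until all thread_n participants have arrived.
  std::atomic<int> setup_bar{0};
  auto thread_fn = [&](int me) {
    setup_bar.fetch_add(1);
    while(setup_bar.load(std::memory_order_relaxed) != thread_n)
      sched_yield();
  };

  // Slot 0 is unused: the main thread participates as thread 0.
  std::thread* threads[thread_n];
  for(int t=1; t < thread_n; t++)
    threads[t] = new std::thread{thread_fn, t};
  thread_fn(0);

  // The lpc_barrier trace above shows the crash inside std::thread::join.
  for(int t=1; t < thread_n; t++) {
    threads[t]->join();
    delete threads[t];
  }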

  std::cout << "Done.\n";

  return 0;
}
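
In the trace above, frame #0 is at address 0 and is reached from __gthread_join. A possible next step for narrowing this down is to run the same spin barrier through the POSIX threads API directly, without std::thread, to see whether the failure follows libstdc++'s thread wrapper or the underlying pthread library. The following is only a sketch under that assumption. The names BarrierArgs, barrier_fn and run_pthread_barrier are hypothetical and are not part of UPC++ or of the original test. It also waits with `<` rather than `!=`, so that waiters can be released if thread creation fails.

```cpp
#include <atomic>
#include <vector>

#include <pthread.h>
#include <sched.h>

namespace {

// Hypothetical helper: state shared by all participants of the spin barrier.
struct BarrierArgs {
  std::atomic<int>* arrived;
  int thread_n;
};

// Same logic as thread_fn in the reproducer: announce arrival, then
// yield until all thread_n participants have arrived.
void* barrier_fn(void* p) {
  auto* args = static_cast<BarrierArgs*>(p);
  args->arrived->fetch_add(1);
  while (args->arrived->load(std::memory_order_relaxed) < args->thread_n)
    sched_yield();
  return nullptr;
}

}  // namespace

// Runs the barrier with thread_n participants (the caller plus thread_n - 1
// pthreads). Returns the final arrival count, or -1 if a pthread call failed.
int run_pthread_barrier(int thread_n) {
  std::atomic<int> arrived{0};
  BarrierArgs args{&arrived, thread_n};
  std::vector<pthread_t> ids;
  bool ok = true;

  for (int t = 1; t < thread_n; t++) {
    pthread_t id;
    if (pthread_create(&id, nullptr, barrier_fn, &args) != 0) {
      // Push the counter past thread_n so already-started threads exit.
      arrived.fetch_add(thread_n);
      ok = false;
      break;
    }
    ids.push_back(id);
  }

  if (ok)
    barrier_fn(&args);  // the calling thread participates as thread 0

  // Joining through pthread_join directly bypasses std::thread::join.
  for (pthread_t id : ids)
    if (pthread_join(id, nullptr) != 0)
      ok = false;

  return ok ? arrived.load() : -1;
}
```

Calling run_pthread_barrier(8) from main and printing the result mirrors the reproducer above. If this variant runs cleanly in a configuration where the std::thread version crashes, that would point toward the libstdc++ layer; if it also crashes, the pthread library or the way it is linked would be the more likely suspect.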

Comments (2)

  1. Paul Hargrove reporter

    This is a bug in non-UPC++ code, reproducible only on an unsupported (gemini-conduit) configuration.
