Following up on Issue #636 and discussion in the corresponding PR #508, this
is a report of a number of developer-only tests which exhibit "bad" behavior
when run on a system lacking a GPU for an enabled memory kind.
The following tests all fail "ungracefully", with an abort(), when UPC++ has been configured for a kind for which no device is detected. A kind appearing
in parentheses indicates a test which is only run when that kind is enabled:
- test-bad-segment-alloc
- test-copy
- test-gpu_microbenchmark
- test-h-d-remote
- test-rpc-ctor-trace
- test-h-d(cuda)
- test-cuda-context(cuda)
- test-cuda_vecadd(cuda)
- test-hip_vecadd(hip)
- test-sycl_vecadd(ze)
- test-ze_device(ze)
There is also an unfortunate case for test-memory_kinds when using the CUDA memory kind: it fails "gracefully" with an exit code of 0.  However, since the
output includes the string cuInit() failed: CUDA_ERROR_NO_DEVICE, a make
dev-check (or similar, including make dev-run-tests) will report the test as
FAILED due to a match on "ERROR".
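To make the false alarm concrete, here is a minimal sketch of the kind of check described above. It is not the actual harness logic: the `classify` function and the bare `ERROR` pattern are assumptions for illustration, based only on the observation that a run is reported as FAILED when its output matches "ERROR".

```python
import re

# Assumption: the harness flags a run when this pattern appears in its output.
ERROR_PATTERN = re.compile(r"ERROR")

def classify(exit_code: int, output: str) -> str:
    """Hypothetical stand-in for the harness verdict on one test run."""
    if exit_code != 0 or ERROR_PATTERN.search(output):
        return "FAILED"
    return "PASSED"

# test-memory_kinds exits with 0, but its output still contains "ERROR"
# as a substring of CUDA_ERROR_NO_DEVICE.
sample_output = "cuInit() failed: CUDA_ERROR_NO_DEVICE\n"
print(classify(0, sample_output))
```

Under these assumptions the run is classified as FAILED even though the exit code is 0, because the match is on a substring of the CUDA error name.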
If the crashes of the eleven tests were the only issue, I would not hesitate to resolve this issue as WONTFIX, since these are developer-only tests in which we are willing to accept UB and/or lax error checking. In some cases, in fact, we probably want to see failures (for example, as a means to detect regressions in the non-trivial ZE device enumeration logic).
@Dan Bonachea Your thoughts on addressing the "false alarm" attributable to
CUDA_ERROR_NO_DEVICE in the output from test-memory_kinds? Options which come to my mind include, ordered from most to least effort:
WARNING_BANLIST to filter stdout and stderr from runs of tests test-memory_kinds as a distinct issue, to remain open, and then close this issue as WONTFIX
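As a rough illustration of the output-filtering idea, the sketch below drops known-benign lines from a test's output before scanning for "ERROR". Everything here is hypothetical: `BENIGN_PATTERNS`, `strip_benign`, and `has_error` are names invented for this example, and nothing about how WARNING_BANLIST is actually implemented is assumed.

```python
import re

# Hypothetical list of output lines considered benign on a GPU-less system.
BENIGN_PATTERNS = [
    re.compile(r"cuInit\(\) failed: CUDA_ERROR_NO_DEVICE"),
]

def strip_benign(output: str) -> str:
    """Remove every line that matches one of the benign patterns."""
    kept = [
        line for line in output.splitlines()
        if not any(p.search(line) for p in BENIGN_PATTERNS)
    ]
    return "\n".join(kept)

def has_error(output: str) -> bool:
    """Scan for "ERROR" only in the lines that survive filtering."""
    return "ERROR" in strip_benign(output)

print(has_error("cuInit() failed: CUDA_ERROR_NO_DEVICE\nSUCCESS"))
print(has_error("ERROR: unexpected failure in copy()"))
```

One trade-off of a filter like this is that it would also hide the message on a system where a device is expected to be present, so it may need to be limited to runs known to lack a GPU.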