- changed status to resolved
Kokkos_3dhalo example crashes when using multiple HIP streams
The kokkos_3dhalo example has a different Kokkos::ExecutionSpace for the computation kernel of each surface in the domain. Unless KOKKOS_ENABLE_DEBUG is defined, when the application is compiled for GPU execution it creates each of these spaces within a separate CUDA/HIP stream. On OLCF’s Crusher, we’ve noticed that the application crashes with the following call stack:
[1] #0 0x00007fffe6e5dbc9 in ?? () from /opt/rocm-5.1.0/lib/libhsa-runtime64.so.1
[1] #1 0x00007fffe6e5da9a in ?? () from /opt/rocm-5.1.0/lib/libhsa-runtime64.so.1
[1] #2 0x00007fffe6e51a69 in ?? () from /opt/rocm-5.1.0/lib/libhsa-runtime64.so.1
[1] #3 0x00007fffe955e03b in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
[1] #4 0x00007fffe954d59a in ?? () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
[1] #5 0x00007fffe939f419 in hipDeviceSynchronize () from /opt/rocm-5.1.0/lib/libamdhip64.so.5
[1] #6 0x000000000046f7ce in Kokkos::Experimental::HIP::impl_static_fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
[1] #7 0x000000000049cfbd in Kokkos::fence(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) ()
[1] #8 0x000000000040cd61 in System::timestep (this=0x7fffffff6740) at upcxx_heat_conduction.cpp:417
[1] #9 0x000000000040a47b in main (argc=1, argv=0x7fffffff6de8) at upcxx_heat_conduction.cpp:705
Currently, the plan is to resolve this by having all surface computation kernels happen in the same execution space (and hence the same stream) when UPC++ is configured with HIP enabled. All seven execution spaces still exist in the application for the purpose of overlapping computation of the interior domain’s kernel with communication of the six surface cell data buffers.
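The planned mapping of kernels to execution spaces can be sketched in plain C++ as below. This is only an illustration under stated assumptions: the names kNumSurfaces, kInteriorSlot, surface_slot and surface_slot_table are hypothetical and do not come from upcxx_heat_conduction.cpp. In the real application each slot would be a Kokkos execution space instance, and the share_surface_stream flag would presumably be derived from whether UPC++ is configured with HIP enabled.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical constants for illustration only.
constexpr std::size_t kNumSurfaces = 6;   // six faces of the 3D domain
constexpr std::size_t kInteriorSlot = 0;  // slot used by the interior kernel

// Returns the slot (out of seven: 0 = interior, 1..6 = surfaces) on which the
// kernel for surface `s` is launched.
// share_surface_stream == false: each surface gets its own slot (s + 1).
// share_surface_stream == true:  all surface kernels share slot 1, so they
//                                run in a single execution space and stream.
inline std::size_t surface_slot(std::size_t s, bool share_surface_stream) {
    assert(s < kNumSurfaces);
    return share_surface_stream ? 1 : s + 1;
}

// Builds the surface-to-slot table for all six surfaces.
inline std::array<std::size_t, kNumSurfaces>
surface_slot_table(bool share_surface_stream) {
    std::array<std::size_t, kNumSurfaces> table{};
    for (std::size_t s = 0; s < kNumSurfaces; ++s) {
        table[s] = surface_slot(s, share_surface_stream);
    }
    return table;
}
```

With sharing enabled, every surface kernel maps to slot 1 while the interior kernel keeps slot 0, so the interior computation can still overlap with communication of the surface buffers.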
Comments (1)
Fixed in extras PR 43