- changed status to resolved
Error running HIP/ROCm examples: "hipErrorNoBinaryForGpu: Unable to find code object for all current devices!"
Over at least the past two days, we have seen the following error when running hip_vecadd (in upcxx:example/gpu_vecadd) and the Kokkos examples (in upcxx-extras:examples/kokkos_{3dhalo,montecarlo}) on a system with AMD GPUs:
hipErrorNoBinaryForGpu: Unable to find code object for all current devices!
This is an indication that hipcc did not generate GPU kernels for the GPU architecture(s) detected at run time.
Under "ideal conditions", hipcc either honors the environment variable HCC_AMDGPU_TARGET or uses a helper program to detect the GPUs in the compilation host. Either way, the proper code generation is performed entirely transparently. However, the helper program fails consistently when there is no GPU (or the wrong GPU) in the compilation host. What we saw recently was a transient condition in which a user monopolizing the GPU on a login node seems to have prevented the helper from probing the GPU. This paragraph is for background, and represents information I don't want our end-users to be burdened with.
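As a rough illustration of what bypassing the probe could look like, the sketch below turns a list of architecture names into explicit compiler flags. It assumes a hipcc that accepts clang's `--offload-arch=` option; the helper name `hip_arch_flags` and the `gfx908`/`gfx90a` values are hypothetical and only serve as an example.

```shell
# Hypothetical helper (not part of upcxx or ROCm): convert a comma- or
# space-separated list of AMD GPU architectures into hipcc flags.
# Assumes hipcc accepts clang's --offload-arch=<arch> option.
hip_arch_flags() (
  flags=""
  # Normalize commas to spaces, then rely on word splitting.
  for arch in $(printf '%s' "$1" | tr ',' ' '); do
    flags="${flags:+$flags }--offload-arch=$arch"
  done
  printf '%s\n' "$flags"
)

# Example with illustrative architecture names:
hip_arch_flags "gfx908,gfx90a"
# prints: --offload-arch=gfx908 --offload-arch=gfx90a
```

With something like this, a build line could take the form `hipcc $(hip_arch_flags gfx90a) ...`, so that code generation no longer depends on which GPU (if any) is visible on the compilation host.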
This issue is a three-part task to:
- Determine "best practices" for configuring an explicit AMD GPU architecture when building (at least) hip_vecadd and the two Kokkos examples.
- Update the Makefiles as necessary to implement these best practices.
- Update documentation to share these best practices with end-users.
My thoughts:
While there is still an issue to be worked out, I am hoping that the advice for the Kokkos examples will be the same for Nvidia and AMD GPUs: set KOKKOS_ARCH. There is a Slack thread right now in which a resulting undesired linker interaction is being discussed.
In the Makefile for gpu_vecadd, the Nvidia case uses the NVCCARCH and NVCCARCH_FLAGS environment variables. I imagine we will deploy something similar, such as HIPCCARCH, but this needs some discussion (in particular, do we need two variables?).
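To make the two-variable question concrete, here is a minimal sketch of one possible scheme, written as a shell function rather than Makefile logic. Both `HIPCCARCH_FLAGS` and the precedence rule (a full flags override wins over the short architecture name) are assumptions for discussion; this is not a description of how the existing NVCCARCH handling works, and it again assumes hipcc accepts `--offload-arch=`.

```shell
# Hypothetical resolution order for the proposed variables:
#   1. HIPCCARCH_FLAGS, if set, is used verbatim (full override).
#   2. Otherwise HIPCCARCH, if set, becomes --offload-arch=<value>.
#   3. Otherwise no flag is passed and hipcc falls back to its own detection.
resolve_hipcc_flags() (
  if [ -n "${HIPCCARCH_FLAGS:-}" ]; then
    printf '%s\n' "$HIPCCARCH_FLAGS"
  elif [ -n "${HIPCCARCH:-}" ]; then
    printf '%s\n' "--offload-arch=$HIPCCARCH"
  else
    printf '\n'
  fi
)
```

If the override in step 1 is never needed in practice, a single variable would be enough, which is the simpler thing to document.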
I want to note that the environment approach for gpu_vecadd with nvcc is not documented in the corresponding README.md, but is documented with the cannon and jac3d examples in upcxx-extras. I think that documentation should be cloned to example/gpu_vecadd/README.md as part of this issue as well (unless there is a reason not to that I am missing).
Note that this task spans both the upcxx and upcxx-extras repos, but without any dependency between the two.
Comments (1)
Addressed in PR 441 and related extras PR 41