Explore chipStar for HIP-over-Level0 on Intel GPUs

Issue #529 open
Dan Bonachea created an issue

With the forthcoming hip_device feature, UPC++ will have native support for AMD ROCm/HIP devices.

We already have a (currently experimental) mode capable of running our the UPC++ and GASNet HIP support on NVIDIA GPU hardware using the HIP-over-CUDA shim library.

There is also a HIP-over-Level0 library called HIPLZ.

This task is to experiment with that shim library to see if it's sufficient to allow our hip_device support in UPC++ and GASNet to function on Intel GPU hardware.

Comments (10)

  1. Paul Hargrove
    • changed status to open

    I was able to make some progress on my first attempt at this.

    TL;DR:

    • With four source modifications (one in configure) I can compile and link
    • I cannot yet run, but for a reason I believe I can resolve when I am next able to work on this

    Full version:

    My first hurdle was the fact that the HIPLZ installs I have access lack a hip_runtime_api.h header (having only hip_runtime.h). My understanding is that hip_runtime_api.h from a ROCm install is a wrapper which includes either hcc_detail/hip_runtime_api.h or nvcc_detail/hip_runtime_api.h. There are no *_detail directories in the HIPLZ install. So, as I've stated verbally already, the project's focus appears to be on applications and not on low-level runtimes such as ours (and I can't fault them for that prioritization).

    ✅ Substituting hip_runtime.h for the three instances of hip_runtime_api.h in UPC++ (one of those in configure) lets me configure and compile the runtime.

    Next problem was a link failure on every test because hipMemcpyPeerAsync() is missing. I've confirmed that name is not present in the provided headers, and this is probably not implemented yet.

    ✅ Flipping a #if 1 in src/hip.cc to avoid use of hipMemcpyPeerAsync() lets me link tests.

    Now, as I run out of time for this trial, I get a run time failure to locate libOpenCL.so.1. That is almost certainly a matter of an LD_LIBRARY_PATH to resolve a dependency of one of the explicitly linked libs. Note that in my experience the executable's RPATH is often (always?) ignored when resolving those. But I will try an RPATH before I attempt kludging application launch to propagate a (yet to be determined) LD_LIBRARY_PATH.

  2. Paul Hargrove

    @Dan Bonachea noted that disabling the call to hipMemcpyPeerAsync() in favor of hipMemcpyDtoDAsync() is not actually equivalent for the case of multiple devices. I am currently limiting my testing to nodes with a single Intel GPU device to avoid the potential issue for now.

  3. Paul Hargrove

    Next installment (final for today)

    TL;DR:

    • I can pass make check for udp and smp conduits, including test-memory_kinds (nodes I am using lack InfiniBand).
    • This result required two more source modifications (now at a total of six)
    • GPU-enabled tests in make dev-check are NOT all passing, but I lack time to diagnose the failures just now

    Full Version:

    Now, as I run out of time for this trial, I get a run time failure to locate libOpenCL.so.1

    ✅ As was noted in my testing of PR#473, the oneAPI install on the JSLE systems (where I am testing HipLZ) already required a kludge to propagate LD_LIBRARY_PATH to get Level Zero working. So, resolving the missing libOpenCL (from the oneAPI install) was automatic with resolving the Level Zero issue. So, this appears to be a platform issue, not a HipLZ one.

    Next issue is that HipStreamDestroy() in hip_device::destroy() leads to an assertion failure within HipLZ. It is worth noting that the default for module load hiplz on the system in use adds paths which contain -debug in the version portion of the directory names.

    ✅ So, I've commented out that call to HipStreamDestroy() (source mod #5, for those keeping track).

    Next up: the intentionally over-large allocation call in memory_kinds.cpp is seeing a return of hipErrorInvalidValue where we expect hipErrorOutOfMemory from hipMalloc(). My understanding of the Hip documentation is that this is probably a conformance error in HipLZ.

    ✅ Regardless of conformance, I've added a case for hipErrorInvalidValue to the error-handling switch statement in make_segment() in src/hip.cpp. (source mod #6, for those keeping track). This lets me make forward progress, but could mask real errors in the call to hipMalloc().

    With the above, I can pass make check for NETWORK=smp or udp. The output of test-memory_kinds confirms an Intel-branded GPU listed as a HIP device.

    There is no hipcc provided by HipLZ, and I've not attempted to follow their insrtuctions form use of their clang as a replacement. So, I am ignoring hip_vecadd for now.

    Even ignoring hip_vecadd, most GPU-enabled tests in make dev-check are not passing, but I lack time to investigate the failures right now. Most (maybe all) of the faiures appear to be timeouts, though non-GPU tests run just fine.

  4. Paul Hargrove

    I had hoped to continue this work on one of the systems at UOregon with Intel GPUs. However, I have been unable to build HipLZ's prerequisites.

    Note for whoever may continue this:
    The HipLZ and HipCL projects have merged to become CHIP-SPV

  5. Log in to comment