Explore chipStar for HIP-over-Level0 on Intel GPUs

Issue #529 open

Dan Bonachea created an issue 2022-02-16

With the forthcoming hip_device feature, UPC++ will have native support for AMD ROCm/HIP devices.

We already have a (currently experimental) mode capable of running our the UPC++ and GASNet HIP support on NVIDIA GPU hardware using the HIP-over-CUDA shim library.

There is also a HIP-over-Level0 library called HIPLZ.

This task is to experiment with that shim library to see if it's sufficient to allow our hip_device support in UPC++ and GASNet to function on Intel GPU hardware.

Comments (10)

Dan Bonachea reporter
- changed component to Memory Kinds
- 2022-02-16T22:43:03+00:00
Dan Bonachea reporter
- changed milestone to 2023.3.0 release
- 2022-10-21T03:49:56+00:00
Dan Bonachea reporter
- assigned issue to
  
  Paul Hargrove
- 2023-03-14T21:50:47+00:00
Paul Hargrove
- changed status to open
I was able to make some progress on my first attempt at this.

TL;DR:
- With four source modifications (one in configure) I can compile and link
- I cannot yet run, but for a reason I believe I can resolve when I am next able to work on this
Full version:

My first hurdle was the fact that the HIPLZ installs I have access lack a hip_runtime_api.h header (having only hip_runtime.h). My understanding is that hip_runtime_api.h from a ROCm install is a wrapper which includes either hcc_detail/hip_runtime_api.h or nvcc_detail/hip_runtime_api.h. There are no *_detail directories in the HIPLZ install. So, as I've stated verbally already, the project's focus appears to be on applications and not on low-level runtimes such as ours (and I can't fault them for that prioritization).

✅ Substituting hip_runtime.h for the three instances of hip_runtime_api.h in UPC++ (one of those in configure) lets me configure and compile the runtime.

Next problem was a link failure on every test because hipMemcpyPeerAsync() is missing. I've confirmed that name is not present in the provided headers, and this is probably not implemented yet.

✅ Flipping a #if 1 in src/hip.cc to avoid use of hipMemcpyPeerAsync() lets me link tests.

Now, as I run out of time for this trial, I get a run time failure to locate libOpenCL.so.1. That is almost certainly a matter of an LD_LIBRARY_PATH to resolve a dependency of one of the explicitly linked libs. Note that in my experience the executable's RPATH is often (always?) ignored when resolving those. But I will try an RPATH before I attempt kludging application launch to propagate a (yet to be determined) LD_LIBRARY_PATH.
- 2023-03-15T18:53:07+00:00
Paul Hargrove
@Dan Bonachea noted that disabling the call to hipMemcpyPeerAsync() in favor of hipMemcpyDtoDAsync() is not actually equivalent for the case of multiple devices. I am currently limiting my testing to nodes with a single Intel GPU device to avoid the potential issue for now.
- 2023-03-15T21:44:05+00:00
Paul Hargrove
Next installment (final for today)

TL;DR:
- I can pass make check for udp and smp conduits, including test-memory_kinds (nodes I am using lack InfiniBand).
- This result required two more source modifications (now at a total of six)
- GPU-enabled tests in make dev-check are NOT all passing, but I lack time to diagnose the failures just now
Full Version:

Now, as I run out of time for this trial, I get a run time failure to locate libOpenCL.so.1

✅ As was noted in my testing of PR~~#473~~, the oneAPI install on the JSLE systems (where I am testing HipLZ) already required a kludge to propagate LD_LIBRARY_PATH to get Level Zero working. So, resolving the missing libOpenCL (from the oneAPI install) was automatic with resolving the Level Zero issue. So, this appears to be a platform issue, not a HipLZ one.

Next issue is that HipStreamDestroy() in hip_device::destroy() leads to an assertion failure within HipLZ. It is worth noting that the default for module load hiplz on the system in use adds paths which contain -debug in the version portion of the directory names.

✅ So, I've commented out that call to HipStreamDestroy() (source mod #5, for those keeping track).

Next up: the intentionally over-large allocation call in memory_kinds.cpp is seeing a return of hipErrorInvalidValue where we expect hipErrorOutOfMemory from hipMalloc(). My understanding of the Hip documentation is that this is probably a conformance error in HipLZ.

✅ Regardless of conformance, I've added a case for hipErrorInvalidValue to the error-handling switch statement in make_segment() in src/hip.cpp. (source mod #6, for those keeping track). This lets me make forward progress, but could mask real errors in the call to hipMalloc().

With the above, I can pass make check for NETWORK=smp or udp. The output of test-memory_kinds confirms an Intel-branded GPU listed as a HIP device.

There is no hipcc provided by HipLZ, and I've not attempted to follow their insrtuctions form use of their clang as a replacement. So, I am ignoring hip_vecadd for now.

Even ignoring hip_vecadd, most GPU-enabled tests in make dev-check are not passing, but I lack time to investigate the failures right now. Most (maybe all) of the faiures appear to be timeouts, though non-GPU tests run just fine.
- 2023-03-15T22:08:21+00:00
Paul Hargrove
I had hoped to continue this work on one of the systems at UOregon with Intel GPUs. However, I have been unable to build HipLZ's prerequisites.

Note for whoever may continue this:
The HipLZ and HipCL projects have merged to become CHIP-SPV
- 2023-03-17T19:48:44+00:00
Dan Bonachea reporter
- changed milestone to 2023.9.0 release
Mass roll-over of open issues to next release milestone
- 2023-04-05T01:23:20+00:00
Dan Bonachea reporter
- removed milestone
Clear past Milestone for open issues
- 2024-01-03T22:22:33+00:00
Dan Bonachea reporter
- changed title to Explore chipStar for HIP-over-Level0 on Intel GPUs
chipStar has replaced HIPLZ as the recommended way to run HIP over Intel GPUs
- 2024-01-19T19:28:06+00:00
Log in to comment

Assignee: Paul Hargrove

Type: task

Priority: minor

Status: open

Component: Memory Kinds

Milestone: –

Version: –

Votes: 0

Watchers: 1

Comments (10)

TL;DR:

Full version:

TL;DR:

Full Version: