magma_dsytrs_gpu returns wrong solution on AMD machine with HIP

Issue #57 resolved
Nai-Yuan Chiang created an issue

Magma version: master

I passed the same matrix and right hand side to function magma_dsytrf_gpu and magma_dsytrs_gpu on different platforms, one with HIP and the others with CUDA. However, the solution I received from `magma_dsytrs_gpu` with HIP is totally different from the others.

I saved the matrix and vectors in a matlab file (attached), in which I made different comparisons, e.g., solutions from matlab backslash and the solutions obtained from HIP and CUDA.

@Cosmin Petra @Stan Tomov

Comments (9)

  1. Stanimire Tomov

    Looks like the CUDA solution is the correct one - at least based on the values in your file - and the hip one is wrong. In theory CUDA and HIP versions must be the same, and If there is difference, it could be coming from the different BLAS that we use (cuBLAS vs hipBLAS). I will check the HIP version and follow up.

  2. Stanimire Tomov

    A small update here not to forger what was checked - as have to do something else now - the magma_dsytrs_gpu routine works correctly for CUDA but has a bug for HIP in real arithmetic and when using the lower triangular part, although both use the same code base. The HIP is correct if the matrix is stored in the upper triangular part, and also for the complex arithmetic versions (single and double).

  3. Nai-Yuan Chiang reporter

    @Stan Tomov Got it! I think that is the problem. The matrix is saved in the upper triangular part in C++, and we pass it directly to magma as a lower triangular matrix in Fortran, i.e., we use MagmaLower in magma_dsytrs_gpu.

  4. Stanimire Tomov

    That’s great if you can use the upper part to solve the problem for now!
    But we still have a problem in HIP if the matrix is stored in the lower part. It is very strange problem because we don’t see this in CUDA and the code is the same. It will take some time to find from which kernel is this coming. I saw that we pass internally MagmaConjTranspose to some hip BLAS functions even though in real arithmetic we just need MagmaTranspose. Usually this is fine but we had detected at least one hipBLAS routine where this is not the case (syr2k). I thought this is the problem here as well (as complex works fine) so changed it manually just to check, but this didn’t fix the problem.

  5. Nai-Yuan Chiang reporter

    Thanks for all the details. Indeed this is very strange. Hope this problem can be fixed soon! Cheers!

  6. Cosmin Petra

    Thank you Stan! We put in a temporary fix to give the other triangle to magma and it seems it is working. Please let us know when you have a resolution on your end.

  7. Nai-Yuan Chiang reporter

    update: after we use MagmaLower in magma_dsytrs_gpu, we also need to use it inmagmablas_dsiinertia.

  8. Ahmad Abdelfattah

    Hi,
    We are making a sweep over the lingering issues in MAGMA, and I came across this one. I could not reproduce the issue mentioned here, so I wanted to check if you are still experiencing problems using magma_dsytrs_gpu.

    I’m attaching a minimal reproducer in C++, where I used sol_cuda in the Matlab script as the reference solution. I ran the reproducer using CUDA 12.1 (A100 GPU), and ROCM-5.7.1 (MI210 GPU). Both lower and upper cases gave correct answers.

  9. Ahmad Abdelfattah

    Hi,

    I will mark this as resolved since I cannot reproduce the issue using the tester I attached before. Feel free to reopen it if you still run into errors for AMD GPUs.

  10. Log in to comment