magma_dgetrf_native illegal memory access, with zero pivot

Issue #30 resolved
Pieter Ghysels created an issue

I’m calling magma_dgetrf_native on a singular matrix, but I’m still interested in the result for use in a preconditioner.

However, the code crashes with

CUBLAS error: memory mapping error (11) in magma_dgetrf_gpu_expert at /home/pieterg/local/magma-2.5.4/src/dgetrf_gpu.cpp:157
CUBLAS error: memory mapping error (11) in magma_dgetrf_gpu_expert at /home/pieterg/local/magma-2.5.4/src/dgetrf_gpu.cpp:158
CUDA runtime error: an illegal memory access was encountered (700) in magma_dgetrf_gpu_expert at /home/pieterg/local/magma-2.5.4/src/dgetrf_gpu.cpp:333
CUDA runtime error: an illegal memory access was encountered (700) in magma_dgetrf_gpu_expert at /home/pieterg/local/magma-2.5.4/src/dgetrf_gpu.cpp:334
CUDA runtime error: an illegal memory access was encountered (700) in magma_dgetrf_gpu_expert at /home/pieterg/local/magma-2.5.4/src/dgetrf_gpu.cpp:336
CUDA runtime error: an illegal memory access was encountered (700) in magma_dgetrf_gpu_expert at /home/pieterg/local/magma-2.5.4/src/dgetrf_gpu.cpp:345
CUDA runtime error: an illegal memory access was encountered (700) in main at test.cpp:31

It works with magma_dgetrf_gpu.

I’ve attached an example.
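For context, LAPACK documents that dgetrf returns INFO = i > 0 when U(i,i) is exactly zero, and that the factorization has nevertheless been completed. The factors of a singular matrix therefore remain available, for example for a preconditioner. The C sketch below shows those conventions in an unblocked, single-threaded form: column-major storage, 1-based ipiv, info set at the first zero pivot, and no division by that pivot. The function lu_factor_unblocked is hypothetical. This is not MAGMA's or LAPACK's actual code.

```c
#include <math.h>

/* Hypothetical helper: unblocked LU with partial pivoting on an m-by-n
 * column-major matrix A (leading dimension lda), following the conventions
 * documented for LAPACK dgetrf: ipiv is 1-based, and a return value i > 0
 * means U(i,i) is exactly zero. The loop still runs to completion, so L
 * and U are fully formed even for a singular matrix. */
static int lu_factor_unblocked(int m, int n, double *A, int lda, int *ipiv)
{
    int info = 0;
    int kmax = m < n ? m : n;
    for (int j = 0; j < kmax; ++j) {
        /* Pivot search: largest |A(i,j)| for i >= j. */
        int p = j;
        for (int i = j + 1; i < m; ++i)
            if (fabs(A[i + j * lda]) > fabs(A[p + j * lda]))
                p = i;
        ipiv[j] = p + 1; /* 1-based, as in LAPACK */

        if (A[p + j * lda] != 0.0) {
            /* Swap rows j and p across all n columns. */
            if (p != j)
                for (int k = 0; k < n; ++k) {
                    double t = A[j + k * lda];
                    A[j + k * lda] = A[p + k * lda];
                    A[p + k * lda] = t;
                }
            /* Scale the entries below the diagonal to form L(:,j). */
            for (int i = j + 1; i < m; ++i)
                A[i + j * lda] /= A[j + j * lda];
        } else if (info == 0) {
            /* First exactly-zero pivot: record it, skip the division,
             * and continue factoring the remaining columns. */
            info = j + 1;
        }

        /* Rank-1 update of the trailing submatrix. With a zero pivot the
         * column below the diagonal is all zero, so this changes nothing. */
        for (int k = j + 1; k < n; ++k)
            for (int i = j + 1; i < m; ++i)
                A[i + k * lda] -= A[i + j * lda] * A[j + k * lda];
    }
    return info;
}
```

Take the 3-by-3 matrix with rows (1,2,3), (2,4,6), (1,1,1), whose second row is twice its first. For it, the sketch returns info = 3 and leaves U(3,3) = 0, while L and the rest of U are still filled in.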

Comments (6)

  1. Stan Tomov

    Hi Pieter,

    Thanks for reporting this issue. We managed to reproduce it and will follow up.

    (magma_dgetrf_gpu factors the panels on the CPU, so it gets the correct result from the CPU code (LAPACK). magma_dgetrf_native factors them on the GPU instead, transferring info and ipiv back and forth between CPU and GPU memory. I suspect we have a bug there; at least, the reported errors are associated with those GPU copies and with freeing their memory at the end.)

    Stan

  2. Pieter Ghysels reporter

    Hi Ahmad, Stan,

    I just checked with the latest git commit in master, and it works now. Thank you!

    Sorry for the delay.

    Pieter

  3. Pieter Ghysels reporter

    Sorry, I think it’s not completely resolved. The illegal memory access error is gone, but it looks like the results are not correct. I will investigate further.

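Whether the factors returned for a singular matrix are correct can be checked without comparing them entry by entry against magma_dgetrf_gpu. Pivoted LU factors may not be unique when pivots tie or vanish, so two valid results can differ. A more robust check is the residual between A, with the recorded row interchanges applied, and the product L*U. The C sketch below assumes a square matrix, LAPACK-style column-major storage, and 1-based ipiv. lu_residual is a hypothetical helper, not part of MAGMA.

```c
#include <math.h>
#include <stdlib.h>

/* Hypothetical helper: for a square n-by-n column-major A and its LU
 * factors as an LAPACK-style getrf returns them (unit lower triangular L
 * strictly below the diagonal of LU, U on and above it, 1-based ipiv),
 * returns max |(row-permuted A)(i,k) - (L*U)(i,k)|, or -1.0 if
 * allocation fails. */
static double lu_residual(int n, const double *A, int lda,
                          const double *LU, int ldlu, const int *ipiv)
{
    double *PA = malloc((size_t)n * (size_t)n * sizeof *PA);
    if (PA == NULL)
        return -1.0;
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i)
            PA[i + k * n] = A[i + k * lda];

    /* Apply the row interchanges in the order getrf recorded them. */
    for (int j = 0; j < n; ++j) {
        int p = ipiv[j] - 1;
        if (p != j)
            for (int k = 0; k < n; ++k) {
                double t = PA[j + k * n];
                PA[j + k * n] = PA[p + k * n];
                PA[p + k * n] = t;
            }
    }

    double maxdiff = 0.0;
    for (int k = 0; k < n; ++k)
        for (int i = 0; i < n; ++i) {
            /* (L*U)(i,k) = sum over t <= min(i,k) of L(i,t) * U(t,k),
             * with the implicit unit diagonal L(i,i) = 1. */
            int tmax = i < k ? i : k;
            double s = 0.0;
            for (int t = 0; t <= tmax; ++t) {
                double l = (t == i) ? 1.0 : LU[i + t * ldlu];
                s += l * LU[t + k * ldlu];
            }
            double d = fabs(PA[i + k * n] - s);
            if (d > maxdiff)
                maxdiff = d;
        }
    free(PA);
    return maxdiff;
}
```

In floating point, a small nonzero residual is expected. As a rough rule of thumb, values on the order of n times machine epsilon times max|A| are normal, while a much larger value points to a wrong factorization or a wrong ipiv.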