MAGMA CUDA segfault in magma_dgesv_rbt_batched

Issue #75 resolved
Žarko Asen created an issue

Hi guys, my first report here. I’m getting a segfault when I call magma_dgesv_rbt_batched with an array of pointers to device matrices, even with a batch size of 1 (a single matrix). I debugged it, and the problem seems to be in magmablas_dprbt_batched(), dgerbt_func_batched.cu:164 to :170, in how magmablas_delementary_multiplication_kernel_batched() is called. The NVIDIA debugger stops at:

[2024-05-08 13:21:25.494688] [0x00007ffe687fd000] [debug]      cuqrsolve.cu:387  [iter_magma_batch_solve] m 2400, n 1, batch size 1

CUDA Exception: Warp Illegal Address
The exception was triggered at PC 0x7ffb77a4edd0 (dgerbt_kernels.cu:90)

Thread 75 "OnlineSVR-test" received signal CUDA_EXCEPTION_14, Warp Illegal Address.
[Switching focus to CUDA kernel 7, grid 5, block (0,0,0), thread (0,2,0), device 1, sm 0, warp 2, lane 0]
magmablas_delementary_multiplication_kernel_batched<<<(19,150,1),(32,4,1)>>> ()
    at /mnt/slowstore/pub/magma/magmablas/dgerbt_kernels.cu:36 in _ZN48_INTERNAL_dfde5553_17_dgerbt_kernels_cu_e8fc078d44magmablas_delementary_multiplication_devfuncEiPdiS0_S0_ inlined from dgerbt_kernels.cu:90
36          dA += idx + idy * ldda;
(cuda-gdb) list
31  
32      idx = blockIdx.x * blockDim.x + threadIdx.x;
33      idy = blockIdx.y * blockDim.y + threadIdx.y;
34  
35      if ((idx < n/2) && (idy < n/2)) {
36          dA += idx + idy * ldda;
37  
38          double a00, a10, a01, a11, b1, b2, b3, b4;
39          __shared__ double u1[block_height], u2[block_height], v1[block_width], v2[block_width];
40  

Hope it's fixed soon. Thanks.

Comments (5)

  1. Ahmad Abdelfattah
    • changed status to open

    Hi,

    Thank you for posting this issue. We have been working on a fix, and will let you know when it is pushed to the main repo.

  2. Žarko Asen reporter

    Hi Ahmad, I’ve been working on a faster asynchronous version of DGESV RBT, magma_dgesv_rbt_async. Would you be able to review and test the code if I post it in a separate branch?

  3. Ahmad Abdelfattah

    Hi Žarko,

    If you would like your new development to be part of MAGMA, it has to support all four standard precisions. I think it is best to create a pull request from a personal fork when you are ready.

    A side note: MAGMA is about to move to GitHub. We are trying to close all PRs on Bitbucket, so you may want to create a PR on GitHub once the migration is complete.
