getrf_batched kernel produces NaNs on singular square inputs of size <=32

Issue #13 resolved
Mark Gates created an issue

[reposted from MAGMA forum https://icl.cs.utk.edu/magma/forum/viewtopic.php?f=2&t=4035]

The subject line summarizes the issue. Here is a reproducer based on PyTorch:

>>> import torch
>>> m, n = 3, 3
>>> torch.ones(1, m, n, device='cuda').lu()
(tensor([[[1., 1., 1.],
         [1., 0., 0.],
         [1., nan, nan]]], device='cuda:0'), tensor([[1, 2, 3]], device='cuda:0', dtype=torch.int32))

Note that the nan entries appear only when m == n and m <= 32; in all other cases, getrf_batched works correctly.

The source of this issue is likely in the kernel functions implemented in magmablas/zgetrf_batched_smallsq_shfl.cu and magmablas/zgetrf_batched_smallsq_noshfl.cu.
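
For comparison, here is a minimal pure-Python sketch of unblocked LU with partial pivoting that skips the scaling step when the pivot is exactly zero. It is not the MAGMA kernel; the function name lu_partial_pivot is made up for illustration. The assumption (not confirmed in this thread) is that the NaNs come from a 0/0 division on the zero pivot in the second column.

```python
def lu_partial_pivot(a):
    """Unblocked LU with partial pivoting on a list-of-lists matrix.

    Hypothetical reference helper, not MAGMA code. Returns (lu, pivots, info)
    with LAPACK-style conventions: pivots are 1-based, and info is the 1-based
    index of the first exactly-zero pivot (0 if there is none).
    """
    lu = [list(map(float, row)) for row in a]
    m, n = len(lu), len(lu[0])
    pivots = []
    info = 0
    for k in range(min(m, n)):
        # First row at or below k with the largest |entry| in column k.
        p = max(range(k, m), key=lambda i: abs(lu[i][k]))
        pivots.append(p + 1)
        if lu[p][k] == 0.0:
            # Singular column: record it and skip the scaling step, which
            # would otherwise be a 0/0 division (NaN in IEEE arithmetic).
            if info == 0:
                info = k + 1
            continue
        lu[k], lu[p] = lu[p], lu[k]
        for i in range(k + 1, m):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, pivots, info


lu, pivots, info = lu_partial_pivot([[1, 1, 1]] * 3)
print(lu, pivots, info)
```

For the 3x3 all-ones input, this sketch leaves the third row as 1, 0, 0 rather than 1, nan, nan, keeps the pivots at 1, 2, 3, and reports the singularity through info = 2.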

Best regards,
Pearu

Comments (7)

  1. Vishwak S

    I also think there is an issue with the pivots returned when the matrix is singular:

    >>> import torch
    >>> torch.ones(1, 33, 33, device='cuda').lu()[1]  # Returns the pivots
    tensor([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 33, 34,
             35, 36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]],
           device='cuda:0', dtype=torch.int32)
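
One way to see the problem in the output above: under the LAPACK getrf convention, pivots are 1-based and entry i must satisfy i <= ipiv[i] <= m. The sketch below applies that range check to the reported values. The helper name invalid_pivots is hypothetical, and passing the check is only a necessary condition, not proof that the pivots are correct.

```python
def invalid_pivots(pivots, m):
    """Return (position, value) pairs that violate i <= ipiv[i] <= m (1-based)."""
    return [(i, p) for i, p in enumerate(pivots, start=1) if not i <= p <= m]


# Pivots reported above for the 33x33 all-ones input: 1..16, then 33..49.
reported = list(range(1, 17)) + list(range(33, 50))
print(invalid_pivots(reported, 33))
```

Every reported value from position 18 onward exceeds the matrix dimension of 33, so those entries cannot be valid row indices.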
    

  2. Vishwak S

    Hi, one part of the issue, the NaNs, is resolved. However, the pivots still seem to be returned incorrectly.

    >>> import torch
    >>> torch.ones(1, 10, 10, device='cuda').lu()[1]  # Returns the pivots
    tensor([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10]], device='cuda:0',
           dtype=torch.int32)
    >>> torch.ones(1, 33, 33, device='cuda').lu()[1]  # Returns the pivots
    tensor([[ 1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11, 12, 13, 14, 15, 16, 17, 35,
             36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]],
           device='cuda:0', dtype=torch.int32)
    
