getrf_batched kernel produces NaNs on singular square inputs of size <=32
[reposted from MAGMA forum https://icl.cs.utk.edu/magma/forum/viewtopic.php?f=2&t=4035]
The subject line summarizes the issue. Here is a reproducer based on PyTorch:
>>> import torch
>>> m, n = 3, 3
>>> torch.ones(1, m, n, device='cuda').lu()
(tensor([[[1., 1., 1.],
[1., 0., 0.],
[1., nan, nan]]], device='cuda:0'), tensor([[1, 2, 3]], device='cuda:0', dtype=torch.int32))
Notice that the nan entries appear only when m == n and m <= 32; in all other cases getrf_batched works correctly.
The source of the issue is likely in the kernels implemented in magmablas/zgetrf_batched_smallsq_shfl.cu and magmablas/zgetrf_batched_smallsq_noshfl.cu.
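For comparison, here is a minimal pure-Python sketch of unblocked LU with partial pivoting, modeled on LAPACK's reference `getf2` routine. It assumes, without confirmation from MAGMA, that the small square batched kernels are meant to reproduce that behavior. The key detail is that an exactly-zero pivot is recorded in `info` and the column scaling is skipped, so a 3x3 matrix of ones yields zeros where the reproducer above shows `nan` (which is what 0/0 produces). The function name `lu_partial_pivot` is hypothetical.

```python
import math


def lu_partial_pivot(a):
    """Unblocked LU with partial pivoting, modeled on LAPACK's getf2.

    Assumption: this mirrors reference LAPACK semantics, not MAGMA's code.
    Returns (lu, ipiv, info): lu stores unit-lower L below the diagonal and
    U on/above it; ipiv is 1-based; info is 0, or the 1-based index of the
    first exactly-zero pivot (the matrix is singular).
    """
    lu = [list(map(float, row)) for row in a]
    m, n = len(lu), len(lu[0])
    ipiv, info = [], 0
    for j in range(min(m, n)):
        # max() returns the first maximal row on ties, like LAPACK's idamax.
        p = max(range(j, m), key=lambda i: abs(lu[i][j]))
        ipiv.append(p + 1)
        if lu[p][j] != 0.0:
            lu[j], lu[p] = lu[p], lu[j]
            # Multipliers: column j of L.
            for i in range(j + 1, m):
                lu[i][j] /= lu[j][j]
        elif info == 0:
            # Zero pivot: skip the division (0/0 would produce NaN).
            info = j + 1
        # Rank-1 update of the trailing submatrix.
        for i in range(j + 1, m):
            for k in range(j + 1, n):
                lu[i][k] -= lu[i][j] * lu[j][k]
    return lu, ipiv, info


lu, ipiv, info = lu_partial_pivot([[1.0] * 3 for _ in range(3)])
print(lu)    # [[1.0, 1.0, 1.0], [1.0, 0.0, 0.0], [1.0, 0.0, 0.0]]
print(ipiv)  # [1, 2, 3]
print(info)  # 2
print(any(math.isnan(x) for row in lu for x in row))  # False
```

As in LAPACK's `getrf`, a positive `info` reports singularity without aborting, and the factors are still returned.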
Best regards,
Pearu
Comments (7)
-
-
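I also think the pivots are returned incorrectly when the matrix is singular. For the 33x33 matrix of ones below, the pivots from position 18 onward are larger than 33, the number of rows: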
>>> import torch
>>> torch.ones(1, 33, 33, device='cuda').lu()[1]  # Returns the pivots
tensor([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 33, 34, 35,
         36, 37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49]],
       device='cuda:0', dtype=torch.int32)
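A quick way to spot bad pivots: in LAPACK-style `getrf` output, the 1-based pivot at position k means row k was swapped with row `ipiv[k]`, so it must satisfy k <= ipiv[k] <= m for an m-row matrix. The sketch below (the helper `invalid_pivots` is hypothetical) flags the entries that break this rule in the pivot vector reported above.

```python
def invalid_pivots(ipiv, m):
    """Return (k, ipiv[k]) pairs, with 1-based k, that break k <= ipiv[k] <= m."""
    return [(k, p) for k, p in enumerate(ipiv, start=1) if not k <= p <= m]


# Pivots reported above for torch.ones(1, 33, 33, device='cuda').lu().
reported = list(range(1, 17)) + list(range(33, 50))
bad = invalid_pivots(reported, 33)
print(len(bad), bad[0])  # 16 (18, 34)
print(invalid_pivots(list(range(1, 11)), 10))  # []
```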
-
Thanks for reporting the issue. Can you please pull the latest changes and recheck?
-
- changed status to resolved
-
Hi, one part of the issue, the NaNs, is resolved. However, the pivots still seem to be returned incorrectly:
>>> import torch
>>> torch.ones(1, 10, 10, device='cuda').lu()[1]  # Returns the pivots
tensor([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]], device='cuda:0', dtype=torch.int32)
>>> torch.ones(1, 33, 33, device='cuda').lu()[1]  # Returns the pivots
tensor([[ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 35, 36,
         37, 38, 39, 40, 41, 42, 43, 44, 45, 46, 47, 48, 49, 50, 51]],
       device='cuda:0', dtype=torch.int32)
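To see why out-of-range pivots matter, here is a sketch of how a LAPACK-style pivot vector is normally consumed: the swaps are replayed in order to build a row permutation `perm` such that `A[perm]` equals `L @ U`. PyTorch's `torch.lu_unpack` builds a permutation matrix from the pivots in a similar way; the helper `ipiv_to_perm` below is hypothetical and written in plain Python.

```python
def ipiv_to_perm(ipiv, m):
    """Turn a 1-based LAPACK-style pivot vector into a row permutation.

    Step k swapped row k with row ipiv[k], so the swaps are replayed in
    order. Row i of the permuted matrix is row perm[i] of the original.
    """
    perm = list(range(m))
    for k, p in enumerate(ipiv):
        # A valid pivot never points above the current row or past row m.
        if not k + 1 <= p <= m:
            raise ValueError(
                f"pivot {p} at position {k + 1} is out of range for {m} rows")
        perm[k], perm[p - 1] = perm[p - 1], perm[k]
    return perm


print(ipiv_to_perm([2, 2], 2))  # [1, 0]: rows 1 and 2 were swapped
print(ipiv_to_perm(list(range(1, 11)), 10) == list(range(10)))  # True

# Pivots reported above for the 33x33 input after the first fix.
try:
    ipiv_to_perm(list(range(1, 18)) + list(range(35, 52)), 33)
except ValueError as err:
    print(err)  # pivot 35 at position 18 is out of range for 33 rows
```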
-
Please recheck now and let us know.
-
Thank you for fixing this. I can confirm that the error is no longer present.