getri_outofplace_batched fails when batchCount is >= 65536

Hi,

I am using MAGMA's batched getri operation to compute batched inverses, but it seems to fail when batchCount is greater than or equal to 65536.

Below are the outputs from the tests:

Single Precision:

% MAGMA 2.3.0  compiled for CUDA capability >= 6.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 8000, driver 9010. OpenMP threads 40. 
% device 0: GeForce GTX 1080 Ti, 1582.0 MHz clock, 11178.5 MiB memory, capability 6.1
% device 1: GeForce GTX 1080 Ti, 1582.0 MHz clock, 11178.5 MiB memory, capability 6.1
% device 2: GeForce GTX 1080 Ti, 1582.0 MHz clock, 11178.5 MiB memory, capability 6.1
% device 3: GeForce GTX 1080 Ti, 1582.0 MHz clock, 11178.5 MiB memory, capability 6.1
% Sat Nov  3 11:13:45 2018
% Usage: ./testing/testing_sgetri_batched [options] [-h|--help]

% batchCount   N    CPU Gflop/s (ms)    GPU Gflop/s (ms)   ||I - A*A^{-1}||_1 / (N*cond(A))
%===============================================================================
     65535     2     ---   (  ---  )      0.03 (  41.40)   6.15e-08   ok
     65536     2     ---   (  ---  )      0.03 (  32.24)   1.68e+07   failed
     68523     2     ---   (  ---  )      0.03 (  43.34)   1.68e+07   failed

Double Precision:

% MAGMA 2.3.0  compiled for CUDA capability >= 6.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 8000, driver 9010. OpenMP threads 40. 
% device 0: GeForce GTX 1080 Ti, 1582.0 MHz clock, 11178.5 MiB memory, capability 6.1
% device 1: GeForce GTX 1080 Ti, 1582.0 MHz clock, 11178.5 MiB memory, capability 6.1
% device 2: GeForce GTX 1080 Ti, 1582.0 MHz clock, 11178.5 MiB memory, capability 6.1
% device 3: GeForce GTX 1080 Ti, 1582.0 MHz clock, 11178.5 MiB memory, capability 6.1
% Sat Nov  3 11:15:12 2018
% Usage: ./testing/testing_dgetri_batched [options] [-h|--help]

% batchCount   N    CPU Gflop/s (ms)    GPU Gflop/s (ms)   ||I - A*A^{-1}||_1 / (N*cond(A))
%===============================================================================
     65535     2     ---   (  ---  )      0.01 (  81.26)   1.14e-16   ok
     65536     2     ---   (  ---  )      0.02 (  58.56)   9.01e+15   failed
     68523     2     ---   (  ---  )      0.02 (  66.03)   9.01e+15   failed

I passed the option --matrix rand_dominant to ensure that the random matrices generated are not singular by chance.
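
As a stopgap, splitting the work into chunks of at most 65535 matrices should keep every call within the range that passes above. The threshold coincides with CUDA's limit of 65535 on the y and z grid dimensions, so this may be a kernel launch-size issue, but I have not confirmed that against the MAGMA source. Below is a minimal sketch of the chunking logic only. `for_each_chunk` and `kMaxChunk` are hypothetical names of my own, not part of MAGMA, and the callback stands in for the actual batched call with the device pointer arrays advanced by `offset`.

```cpp
#include <algorithm>
#include <functional>

// Largest batchCount that passed in the test runs above (assumed safe limit).
constexpr long kMaxChunk = 65535;

// Hypothetical helper, not part of MAGMA: calls fn(offset, count) for
// consecutive chunks covering [0, batchCount), each with count <= maxChunk.
// Returns the number of chunks issued (0 if maxChunk is not positive).
inline long for_each_chunk(long batchCount, long maxChunk,
                           const std::function<void(long, long)>& fn) {
    if (maxChunk <= 0) {
        return 0;  // avoid an infinite loop on a bad chunk size
    }
    long chunks = 0;
    for (long offset = 0; offset < batchCount; offset += maxChunk) {
        const long count = std::min(maxChunk, batchCount - offset);
        // A real caller would invoke the batched getri here on `count`
        // matrices, with its pointer arrays advanced by `offset`.
        fn(offset, count);
        ++chunks;
    }
    return chunks;
}
```

For the failing batchCount of 68523, this issues two calls: one with 65535 matrices at offset 0 and one with the remaining 2988 at offset 65535.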

It would be great if you could provide a solution for this issue or indicate if this is expected behavior. Thank you.
