dgesv_batched / dgetrs_batched fails for combination [batchCount, N, nrhs] = [1, >1025, >1025]

Issue #19 new
Vishwak S created an issue

Hi,

The existing implementations of dgesv_batched and dgetrs_batched fail for the stated combination of parameters. Output from testing/testing_dgesv_batched:

% BatchCount   N  NRHS   CPU Gflop/s (sec)   GPU Gflop/s (sec)   ||B - AX|| / N*||A||*||X||
%============================================================================================
         1  1025  1025     ---   (  ---  )     16.12 (   0.18)   1.26e-07   failed
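
For reference, the last column of the tester output is a scaled residual. A minimal sketch of how such a value can be computed and judged is below, in plain C. The infinity norm, the column-major layout, and the threshold of 30 times machine epsilon are assumptions made for illustration; the report does not state which norm or tolerance the tester uses. `norm_inf`, `scaled_residual`, and `residual_passes` are hypothetical names, not MAGMA functions.

```c
#include <assert.h>
#include <float.h>
#include <math.h>

/* Infinity norm (largest absolute row sum) of a rows x cols matrix
 * stored column-major with leading dimension ld. */
static double norm_inf(const double *M, int rows, int cols, int ld)
{
    double best = 0.0;
    for (int i = 0; i < rows; ++i) {
        double sum = 0.0;
        for (int j = 0; j < cols; ++j)
            sum += fabs(M[i + j * ld]);
        if (sum > best)
            best = sum;
    }
    return best;
}

/* Scaled residual ||B - A*X|| / (n * ||A|| * ||X||).
 * A is n x n, X and B are n x nrhs, all column-major with leading
 * dimension n. The choice of the infinity norm is an assumption. */
static double scaled_residual(const double *A, const double *X,
                              const double *B, int n, int nrhs)
{
    assert(n > 0 && nrhs > 0);
    double rnorm = 0.0;
    for (int i = 0; i < n; ++i) {
        double row_sum = 0.0;
        for (int j = 0; j < nrhs; ++j) {
            /* Entry (i, j) of B - A*X. */
            double r = B[i + j * n];
            for (int k = 0; k < n; ++k)
                r -= A[i + k * n] * X[k + j * n];
            row_sum += fabs(r);
        }
        if (row_sum > rnorm)
            rnorm = row_sum;
    }
    return rnorm / (n * norm_inf(A, n, n, n) * norm_inf(X, n, nrhs, n));
}

/* Pass/fail decision against an assumed threshold of 30 * DBL_EPSILON.
 * The factor 30 is illustrative, not taken from the report. */
static int residual_passes(const double *A, const double *X,
                           const double *B, int n, int nrhs)
{
    const double tol = 30.0 * DBL_EPSILON;
    return scaled_residual(A, X, B, n, nrhs) < tol;
}
```

With A = [2 1; 1 3] and X = [1 3; 2 4], setting B = A*X gives a residual of exactly 0. Perturbing one entry of X by 1e-6 gives roughly 4e-8, which is far above this assumed threshold, much as 1.26e-07 is for the run above.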

This is causing downstream problems in PyTorch as referenced here: https://github.com/pytorch/pytorch/issues/36921

Comments (3)

  1. Mark Gates

    Please include the complete input & output of the tester, and some context about what platform you are running this on (MAGMA version, CUDA version, BLAS/LAPACK library, Linux/macOS/Windows, etc.). This aids in reproducing problems.

  2. Vishwak S reporter

    The complete input and output is given here:

    $ ./testing_dgesv_batched -N 1025 --nrhs 1025 --batch 1
    
    % MAGMA 2.5.3 svn compiled for CUDA capability >= 5.0, 32-bit magma_int_t, 64-bit pointer.
    % CUDA runtime 10000, driver 10010. OpenMP threads 4. 
    % device 0: GeForce 940M, 1176.0 MHz clock, 2004.5 MiB memory, capability 5.0
    % Tue Apr 21 14:41:45 2020
    % Usage: ./testing_dgesv_batched [options] [-h|--help]
    
    % BatchCount   N  NRHS   CPU Gflop/s (sec)   GPU Gflop/s (sec)   ||B - AX|| / N*||A||*||X||
    %============================================================================================
             1  1025  1025     ---   (  ---  )     13.24 (   0.22)   1.26e-07   failed
    

    Running under cuda-memcheck reveals an "invalid configuration argument" error on a kernel launch inside magma_dgetrs_batched, which probably indicates that the error from that launch is never checked.

    $ cuda-memcheck ./testing_dgesv_batched -N 1025 --nrhs 1025 --batch 1
    
    ========= CUDA-MEMCHECK
    % MAGMA 2.5.3 svn compiled for CUDA capability >= 5.0, 32-bit magma_int_t, 64-bit pointer.
    % CUDA runtime 10000, driver 10010. OpenMP threads 4. 
    % device 0: GeForce 940M, 1176.0 MHz clock, 2004.5 MiB memory, capability 5.0
    % Tue Apr 21 14:41:53 2020
    % Usage: ./testing_dgesv_batched [options] [-h|--help]
    
    % BatchCount   N  NRHS   CPU Gflop/s (sec)   GPU Gflop/s (sec)   ||B - AX|| / N*||A||*||X||
    %============================================================================================
    ========= Program hit cudaErrorInvalidConfiguration (error 9) due to "invalid configuration argument" on CUDA API call to cudaLaunchKernel. 
    =========     Saved host backtrace up to driver entry point at error
    =========     Host Frame:/usr/lib/x86_64-linux-gnu/libcuda.so.1 [0x390513]
    =========     Host Frame:/usr/local/cuda-10.0/lib64/libcudart.so.10.0 (cudaLaunchKernel + 0x265) [0x4e405]
    =========     Host Frame:/media/vishwak/Official-1/magma-src/lib/libmagma.so (_Z59__device_stub__Z31dlaswp_rowserial_kernel_batchediPPdiiiPPiiPPdiiiPPi + 0x14a) [0x5ec70a]
    =========     Host Frame:/media/vishwak/Official-1/magma-src/lib/libmagma.so (magma_dlaswp_rowserial_batched + 0xad) [0x5ec7dd]
    =========     Host Frame:/media/vishwak/Official-1/magma-src/lib/libmagma.so (magma_dgetrs_batched + 0x3ca) [0x48befa]
    =========     Host Frame:./testing_dgesv_batched [0x2bbd]
    =========     Host Frame:/lib/x86_64-linux-gnu/libc.so.6 (__libc_start_main + 0xeb) [0x26b6b]
    =========     Host Frame:./testing_dgesv_batched [0x34fa]
    =========
             1  1025  1025     ---   (  ---  )      0.77 (   3.73)   1.26e-07   failed
    ========= ERROR SUMMARY: 1 error
    

    OS: Linux Ubuntu 19.04, BLAS used is OpenBLAS (OpenBLAS 0.2.20dev)
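
Some context on the error code in the cuda-memcheck log: cudaErrorInvalidConfiguration means the launch requested resources the device can never satisfy, for example too many threads per block or too many blocks. The log does not show which launch dimension of dlaswp_rowserial_kernel_batched is out of range. The failing sizes sit just past 1024, which is suggestive but not proof. The sketch below is a hypothetical host-side pre-check in plain C, assuming the usual compute capability 5.0 limits; `launch_limits`, `dim3u`, `CC50_LIMITS`, and `launch_config_ok` are illustrative names and not part of MAGMA or CUDA. On a real device the limits would come from cudaGetDeviceProperties rather than being hard-coded.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for CUDA's dim3, so the sketch builds without nvcc. */
typedef struct { unsigned x, y, z; } dim3u;

/* Hypothetical container for the per-device launch limits. */
typedef struct {
    unsigned max_threads_per_block;
    unsigned max_block_dim[3];
    unsigned max_grid_dim[3];
} launch_limits;

/* Limits assumed for a compute capability 5.0 device such as the one
 * in the log: 1024 threads per block, block dims 1024 x 1024 x 64,
 * grid dims (2^31 - 1) x 65535 x 65535. */
static const launch_limits CC50_LIMITS = {
    1024u,
    {1024u, 1024u, 64u},
    {2147483647u, 65535u, 65535u}
};

/* Returns true if a launch with this grid and block shape stays within
 * the given limits. Shared memory and register usage are not checked. */
static bool launch_config_ok(dim3u grid, dim3u block, const launch_limits *lim)
{
    assert(lim != NULL);
    const unsigned g[3] = {grid.x, grid.y, grid.z};
    const unsigned b[3] = {block.x, block.y, block.z};
    for (int i = 0; i < 3; ++i) {
        if (g[i] == 0 || g[i] > lim->max_grid_dim[i])
            return false;
        if (b[i] == 0 || b[i] > lim->max_block_dim[i])
            return false;
    }
    /* The product of the block dims must also fit in one block. */
    unsigned long long threads = (unsigned long long)b[0] * b[1] * b[2];
    return threads <= lim->max_threads_per_block;
}
```

Under these assumed limits a 1024-thread block passes and a 1025-thread block does not. A check like this, or a call to cudaGetLastError() right after the launch, would let the routine report the failure instead of continuing to a wrong result.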
