testing_dpotrf_mgpu of MAGMA v2.5.4 returned unknown error if ngpus >= 4

Issue #41 new
Mohammed Al Farhan created an issue

Note: this error occurs only with MAGMA v2.5.4. With MAGMA v2.5.3, testing_dpotrf_mgpu runs fine.

-bash-4.2$ ./testing_dpotrf_mgpu -n 10304 --ngpu 8
% MAGMA 2.5.4 svn 32-bit magma_int_t, 64-bit pointer.
Compiled with CUDA support for 7.0
% CUDA runtime 11020, driver 11020. OpenMP threads 80. MKL 2020.0.3, MKL threads 40. 
% device 0: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
% device 1: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
% device 2: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
% device 3: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
% device 4: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
% device 5: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
% device 6: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
% device 7: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
% Mon Mar 22 03:16:05 2021
% Usage: ./testing_dpotrf_mgpu [options] [-h|--help]

% ngpu = 8, uplo = Lower
%   N   CPU Gflop/s (sec)   GPU Gflop/s (sec)   ||R||_F / ||A||_F
%================================================================
magma_dpotrf_mgpu returned error 8193: function-specific error, see documentation.
10304     ---   (  ---  )   1991.76 (   0.18)     ---

% ngpu = 8, uplo = Lower
%   N   CPU Gflop/s (sec)   GPU Gflop/s (sec)   ||R||_F / ||A||_F
%================================================================
magma_dpotrf_mgpu returned error 7169: function-specific error, see documentation.
30464     ---   (  ---  )   21404.17 (   0.44)     ---
magma_dpotrf_mgpu returned error 8193: function-specific error, see documentation.
30464     ---   (  ---  )   21272.33 (   0.44)     ---
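
A note for reading the codes above: under the LAPACK potrf convention, which the MAGMA documentation for this routine also follows, a positive info value i means the leading minor of order i is not positive definite. The sketch below decodes the two values from the log. The helper name describe_potrf_info is made up for illustration and is not part of MAGMA. The common divisor it prints only shows that both failures fall on the first column after a multiple of that value; it does not establish the block size MAGMA actually used.

```python
from math import gcd


def describe_potrf_info(info: int) -> str:
    """Interpret an info value using the LAPACK potrf convention."""
    if info == 0:
        return "success"
    if info < 0:
        return f"argument {-info} had an illegal value"
    return f"leading minor of order {info} is not positive definite"


# Values copied from the log above.
codes = [8193, 7169]
for code in codes:
    print(code, "->", describe_potrf_info(code))

# Number of columns factored before each failure (info - 1),
# and the largest value dividing both counts.
completed = [code - 1 for code in codes]
common = gcd(completed[0], completed[1])
print("columns before failure:", completed, "common divisor:", common)
```

Both failures landing just past a shared power-of-two boundary is what one would expect if a freshly started block were being overwritten, though the log alone cannot confirm that.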

Comments (2)

  1. Stanimire Tomov

    Hi Mohammed,

    I reproduced this bug. I ran the test many times; sometimes it passes and sometimes it does not, just as you observed. I also looked back at the changes since 2.5.3 but couldn’t find any. It looks like we have not touched this routine for a very long time, since well before 2.5.3. It is possible that 2.5.3 only sometimes worked for you, just as 2.5.4 sometimes works for me.

    This may be due to some race condition when we do the look-ahead and use the same memory on the CPU to do the panels. One fix is to make h = 2 on line 143 in file dpotrf_mgpu.cpp.

    We will investigate further why this happens. We now use the CPU only for the diagonal blocks, so the workspace should ideally be just nb^2. With h = 2, we now use 2*nb*n. I think at some point we had versions that did the entire panel on the CPU (not just the diagonal block), and something may have been left over from those versions. I will close the issue when we fix this workspace size.

    Stan

  2. Mohammed Al Farhan reporter

    Hi Stan,

    Sure, setting h = 2 in zpotrf_mgpu.cpp:143 works. Thanks.

    ./testing_dpotrf_mgpu -n 30464 -N 30464 --ngpu 8
    % MAGMA 2.5.4 svn 32-bit magma_int_t, 64-bit pointer.
    Compiled with CUDA support for 7.0
    % CUDA runtime 11020, driver 11000. OpenMP threads 80. MKL 2020.0.3, MKL threads 40. 
    % device 0: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
    % device 1: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
    % device 2: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
    % device 3: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
    % device 4: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
    % device 5: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
    % device 6: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
    % device 7: Tesla V100-SXM2-32GB, 1530.0 MHz clock, 32510.5 MiB memory, capability 7.0
    % Sun Mar 28 02:08:53 2021
    % Usage: ./testing_dpotrf_mgpu [options] [-h|--help]
    
    % ngpu = 8, uplo = Lower
    %   N   CPU Gflop/s (sec)   GPU Gflop/s (sec)   ||R||_F / ||A||_F
    %================================================================
    30464     ---   (  ---  )   7212.83 (   1.31)     ---
    30464     ---   (  ---  )   7238.73 (   1.30)     ---
    

    Thank you once again, Stan.

    Mohammed
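
To put the workspace sizes from comment 1 in perspective, here is a rough calculation. It assumes the CPU workspace holds h*nb*n double-precision elements, which is inferred from the "2*nb*n" figure quoted for h = 2, and it uses nb = 512 purely as a placeholder, since the thread does not state the block size. The function names are hypothetical and are not MAGMA APIs.

```python
BYTES_PER_DOUBLE = 8


def panel_workspace_bytes(n: int, nb: int, h: int) -> int:
    """Bytes for a workspace of h*nb*n doubles (assumed layout)."""
    return h * nb * n * BYTES_PER_DOUBLE


def diag_block_bytes(nb: int) -> int:
    """Bytes for a single nb-by-nb diagonal block of doubles."""
    return nb * nb * BYTES_PER_DOUBLE


n = 30464   # matrix size from the runs in this thread
nb = 512    # placeholder block size, not taken from the thread

for h in (1, 2):
    mib = panel_workspace_bytes(n, nb, h) / 2**20
    print(f"h = {h}: {mib:.1f} MiB")

print(f"one diagonal block: {diag_block_bytes(nb) / 2**20:.1f} MiB")
```

Under these assumptions the h = 2 workspace is a couple of hundred MiB of host memory, while a workspace limited to one diagonal block would be a few MiB, which is the gap the planned workspace fix would close.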
