magmablas_cher2k() might lead to failure of cheevd_gpu
Hi,
When testing cheevd_gpu on ROCm 4.2, we found that magmablas_cher2k() in magma_cher2k() can lead to inaccurate results, whereas using hipblasCher2k() makes the tests PASS. We suggest adopting hipblasCher2k() for magma_cher2k().
On the master branch at commit aed4e285084763113ce5757393d4008e27b5194b, we see the following failures:
root@01821ad0c545:~/magma# ./testing/testing_cheevd_gpu --version 1 -L -JV -c -n 3000 -n 4000
% MAGMA 2.6.0 svn 32-bit magma_int_t, 64-bit pointer.
% HIP runtime 3275, driver 327500. OpenMP threads 72.
% device 0: Device 738c, 1502.0 MHz clock, 32752.0 MiB memory, capability 9.0
% Fri Jul 2 21:08:30 2021
% Usage: ./testing/testing_cheevd_gpu [options] [-h|--help]
% jobz = Vectors needed, uplo = Lower, version = 1 (cheevd_gpu)
% N CPU Time (sec) GPU Time (sec) |S-S_magma| |A-USU^H| |I-U^H U|
%============================================================================
3000 --- 10.4894 --- 2.90e-06 6.43e-08 failed
4000 --- 1.7848 --- 1.97e-06 6.85e-08 failed
root@01821ad0c545:~/magma# ./testing/testing_cheevd_gpu --version 1 -U -JV -c -n 4000
% MAGMA 2.6.0 svn 32-bit magma_int_t, 64-bit pointer.
% HIP runtime 3275, driver 327500. OpenMP threads 72.
% device 0: Device 738c, 1502.0 MHz clock, 32752.0 MiB memory, capability 9.0
% Fri Jul 2 21:08:33 2021
% Usage: ./testing/testing_cheevd_gpu [options] [-h|--help]
% jobz = Vectors needed, uplo = Upper, version = 1 (cheevd_gpu)
% N CPU Time (sec) GPU Time (sec) |S-S_magma| |A-USU^H| |I-U^H U|
%============================================================================
4000 --- 13.6010 --- 1.93e-06 6.61e-08 failed
Enabling hipblasCher2k() in magma_cher2k() makes these cases PASS:
(base) root@01821ad0c545:~/magma# ./testing/testing_cheevd_gpu --version 1 -L -JV -c -n 3000 -n 4000
% MAGMA 2.6.0 svn 32-bit magma_int_t, 64-bit pointer.
% HIP runtime 402, driver 330500. OpenMP threads 72.
% device 0: Device 738c, 1502.0 MHz clock, 32752.0 MiB memory, capability 9.0
% Fri Jul 2 20:47:29 2021
% Usage: ./testing/testing_cheevd_gpu [options] [-h|--help]
% jobz = Vectors needed, uplo = Lower, version = 1 (cheevd_gpu)
% N CPU Time (sec) GPU Time (sec) |S-S_magma| |A-USU^H| |I-U^H U|
%============================================================================
3000 --- 13.4700 --- 1.78e-06 6.12e-08 ok
4000 --- 1.6325 --- 1.17e-06 6.66e-08 ok
(base) root@01821ad0c545:~/magma# ./testing/testing_cheevd_gpu --version 1 -U -JV -c -n 4000
% MAGMA 2.6.0 svn 32-bit magma_int_t, 64-bit pointer.
% HIP runtime 402, driver 330500. OpenMP threads 72.
% device 0: Device 738c, 1502.0 MHz clock, 32752.0 MiB memory, capability 9.0
% Fri Jul 2 20:49:09 2021
% Usage: ./testing/testing_cheevd_gpu [options] [-h|--help]
% jobz = Vectors needed, uplo = Upper, version = 1 (cheevd_gpu)
% N CPU Time (sec) GPU Time (sec) |S-S_magma| |A-USU^H| |I-U^H U|
%============================================================================
4000 --- 15.1728 --- 1.07e-06 6.69e-08 ok
Comments (6)
-
reporter -
Thanks for this report!
I just made the update, which also includes a few other BLAS functions that are now in hipBLAS.
Regarding testing cheevd_gpu, we have an error tolerance set that sometimes gives innocuous failures. This looks like one of those cases, as the reported errors are almost the same. It must be a difference due to roundoff errors.
-
reporter Thanks for your feedback.
I found that hipblasZtrmm is still disabled at commit 749d0411f90e8d7f446fae3cd9059bf81735b46e.
We are driving the enablement of hipBLAS in MAGMA, so I would like to know the criteria for enabling this function.
Please let me know if you could share the test set that must PASS before hipblasZtrmm is enabled.
Thanks
-
Yes, this is the last one left, and I was going to ask whether you know of any pending optimizations in hipBLAS. The only reason for now not to define it as magma_ztrmm is that its performance is low. In general, we define magma_[blas] to be whatever is fastest in the majority of cases; later, in specific files for specific uses (block sizes, etc.), if there is a faster version, we redefine it locally in that file.
This is what I get currently on one of our systems in terms of performance.
-bash-4.2$ ./testing_dtrmm
% MAGMA 2.6.0 svn 32-bit magma_int_t, 64-bit pointer.
% HIP runtime 3212, driver 321200. OpenMP threads 32.
% device 0: Vega 20 [Radeon VII], 1801.0 MHz clock, 16368.0 MiB memory, capability 9.0
% device 1: Vega 20 [Radeon VII], 1801.0 MHz clock, 16368.0 MiB memory, capability 9.0
% device 2: Vega 20 [Radeon VII], 1801.0 MHz clock, 16368.0 MiB memory, capability 9.0
% device 3: Vega 20 [Radeon VII], 1801.0 MHz clock, 16368.0 MiB memory, capability 9.0
% Tue Jul 6 11:06:02 2021
% Usage: ./testing_dtrmm [options] [-h|--help]
% If running lapack (option --lapack), MAGMA and HIP errors are both computed
% relative to CPU BLAS result. Else, MAGMA error is computed relative to HIP result.
% side = Left, uplo = Lower, transA = No transpose, diag = Non-unit
%    M     N   MAGMA Gflop/s (ms)   HIP Gflop/s (ms)   CPU Gflop/s (ms)   MAGMA error   HIP error
%===================================================================================================
  1088  1088      0.32 (4036.23)      4.61 ( 279.21)     ---   (  ---  )    9.57e-20      ---    ok
  2112  2112   1378.49 (   6.83)      9.62 ( 979.71)     ---   (  ---  )    6.08e-20      ---    ok
  3136  3136   1798.19 (  17.15)     14.43 (2137.59)     ---   (  ---  )    4.83e-20      ---    ok
  4160  4160   2312.08 (  31.14)     19.22 (3744.74)     ---   (  ---  )    3.05e-20      ---    ok
  5184  5184   2724.87 (  51.13)     24.12 (5775.80)     ---   (  ---  )    2.79e-20      ---    ok
  6208  6208   2484.00 (  96.32)     28.94 (8268.41)     ---   (  ---  )    2.49e-20      ---    ok
Thanks,
Stan
-
reporter Currently, the Xtrmm performance of the hipBLAS path looks competitive in most of the major cases, as shown in the following summary on MI100.
Could you tell us the performance targets for enabling the hipBLAS path for Xtrmm?
-
- changed status to resolved
I see this issue has been inactive for quite some time. The TRMM routines from MAGMA and hipBLAS now perform similarly, and both are available in MAGMA. I will mark this as resolved.