When matrix is small (CPU only), use zheevx to correctly handle ranged cases.
Use magma malloc and free
Also change fallbacks to LAPACK zhe/dsyevx for multi-GPU routines zhe/dsyevdx_m
Use magma_int_t instead of int
Also apply the changes to zheevdx_2stage_m
Add LAPACK test routines [zc]he/[ds]syt22.f
Change the testers to check |U'AU-S| when not all eigenvectors were computed.
Allocate rwork and iwork for zheevx.
Apply the changes to testing_zheevd_gpu as well.
Allocate iwork for dsyevdx fallback calls.
Allocate bigger rwork and iwork for LAPACK zheevx tests.

This was brought up by issue ~~#28~~.

There are two main issues:

For the D&C eigensolvers, if n<128, the fallback call to LAPACK used zheevd, which always computes all eigenvalues even when only a value range or an index range of eigenvalues is requested. This pull request changes the fallback to always use zheevx so that ranged cases are handled correctly. This affects zheevdx[_2stage][_m] and the other precisions. zheevx always needs 7*N rwork and 5*N iwork workspaces. They are allocated right before the call, and because the matrix is small the extra cost is negligible.
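
The sizing logic described above can be sketched in plain C. This is only an illustration, not the code in this pull request: the names `use_lapack_fallback`, `heevx_workspace_size`, `irange_count`, and `SMALL_N_THRESHOLD` are hypothetical, and the sketch makes no MAGMA or LAPACK calls. It encodes the n<128 cutoff, the 7*N and 5*N workspace lengths, and the number of eigenvalues an index range selects.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; this is a sketch, not MAGMA's code. */
enum { SMALL_N_THRESHOLD = 128 };  /* the n < 128 cutoff from the description */

typedef struct {
    size_t rwork_len;  /* number of real workspace entries: 7*N */
    size_t iwork_len;  /* number of integer workspace entries: 5*N */
} heevx_workspace;

/* Returns 1 when the CPU-only LAPACK fallback would be taken. */
int use_lapack_fallback(size_t n) {
    return n < SMALL_N_THRESHOLD;
}

/* Workspace lengths that zheevx needs regardless of the requested range. */
heevx_workspace heevx_workspace_size(size_t n) {
    heevx_workspace w;
    w.rwork_len = 7 * n;
    w.iwork_len = 5 * n;
    return w;
}

/* Number of eigenvalues selected by a 1-based index range il..iu. */
size_t irange_count(size_t il, size_t iu) {
    assert(il >= 1 && il <= iu);
    return iu - il + 1;
}
```

For example, `heevx_workspace_size(100)` gives 700 real and 500 integer entries, and `irange_count(100, 200)` gives 101, which is how many eigenvalues a ranged solver should return for `--irange 100,200`, whereas a full-spectrum solver returns all n.
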
The testers were also not checking the ranged cases correctly: they always checked the whole spectrum even if only part of it was requested. This is fixed by adding the LAPACK routine zhet22. If Nfound<N, the tester checks | U^H A U - S | / ( |A| m ) instead of | A - U S U^H | / ( |A| n ). The eigenvalue check against LAPACK now also uses zheevx and compares the first Nfound eigenvalues. The checks in testing_zheevdx_2stage are also changed to use either zhet21 or zhet22.
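
To see why the second residual is the right one when Nfound<N, here is a minimal real-valued sketch of both checks in C. It is not the LAPACK zhet21/zhet22 code: the function names are hypothetical, the matrices are real symmetric and stored row-major, the 1-norm is an assumed choice of norm, and the formulas are taken exactly as written above (the actual LAPACK routines may apply additional scaling).

```c
#include <assert.h>
#include <stdlib.h>

/* Absolute value without pulling in libm. */
static double abs_d(double x) { return x < 0.0 ? -x : x; }

/* 1-norm (maximum absolute column sum) of a row-major rows x cols matrix. */
double norm1(const double *M, int rows, int cols) {
    double best = 0.0;
    for (int j = 0; j < cols; ++j) {
        double sum = 0.0;
        for (int i = 0; i < rows; ++i) sum += abs_d(M[i * cols + j]);
        if (sum > best) best = sum;
    }
    return best;
}

/* | U^T A U - S | / ( |A| m ): meaningful for any 0 < m <= n.
 * A is n x n, U is n x m (columns are eigenvectors), s has m eigenvalues. */
double residual_partial(int n, int m, const double *A, const double *U,
                        const double *s) {
    assert(0 < m && m <= n);
    double *AU = malloc(sizeof(double) * (size_t)n * (size_t)m);
    double *R = malloc(sizeof(double) * (size_t)m * (size_t)m);
    assert(AU != NULL && R != NULL);
    for (int i = 0; i < n; ++i)          /* AU = A * U */
        for (int j = 0; j < m; ++j) {
            double acc = 0.0;
            for (int k = 0; k < n; ++k) acc += A[i * n + k] * U[k * m + j];
            AU[i * m + j] = acc;
        }
    for (int i = 0; i < m; ++i)          /* R = U^T * AU - diag(s) */
        for (int j = 0; j < m; ++j) {
            double acc = 0.0;
            for (int k = 0; k < n; ++k) acc += U[k * m + i] * AU[k * m + j];
            R[i * m + j] = acc - (i == j ? s[i] : 0.0);
        }
    double r = norm1(R, m, m) / (norm1(A, n, n) * m);
    free(AU);
    free(R);
    return r;
}

/* | A - U S U^T | / ( |A| n ): only a valid test when m == n. */
double residual_full(int n, int m, const double *A, const double *U,
                     const double *s) {
    assert(0 < m && m <= n);
    double *R = malloc(sizeof(double) * (size_t)n * (size_t)n);
    assert(R != NULL);
    for (int i = 0; i < n; ++i)          /* R = A - U * diag(s) * U^T */
        for (int j = 0; j < n; ++j) {
            double acc = 0.0;
            for (int k = 0; k < m; ++k) acc += U[i * m + k] * s[k] * U[j * m + k];
            R[i * n + j] = A[i * n + j] - acc;
        }
    double r = norm1(R, n, n) / (norm1(A, n, n) * n);
    free(R);
    return r;
}
```

Take A = [[2, 1], [1, 2]] with the single exact eigenpair s = 3, U = (1, 1)/sqrt(2), so n = 2 and m = 1. `residual_partial` is at rounding level, while `residual_full` is about 0.17 even though the computed eigenpair is exact, because U S U^T can only reproduce the part of A spanned by the computed eigenvectors.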

Before:

srun -w a04 ./testing_zheevd --version 2 -n 200 -c --lapack -JV --irange 100,200
% MAGMA 2.5.3 svn compiled for CUDA capability >= 6.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 11000, driver 11000. OpenMP threads 40. MKL 2020.0.1, MKL threads 20.
% device 0: Tesla V100-PCIE-16GB, 1380.0 MHz clock, 16160.5 MiB memory, capability 7.0
% Fri Sep 18 12:59:00 2020
% Usage: /home/ytsai2/magma/ref/magma/testing/./testing_zheevd [options] [-h|--help]

% jobz = Vectors needed, uplo = Lower, ngpu = 1
%   N   CPU Time (sec)   GPU Time (sec)   |S-S_magma|   |A-USU^H|   |I-U^H U|
%============================================================================
irange (100, 200)
  200      0.0079           0.0190         5.00e-03      5.00e-01    5.00e-01   failed

After:

srun -w a04 ./testing_zheevd --version 2 -n 200 -c --lapack -JV --irange 100,200
% MAGMA 2.5.3 svn compiled for CUDA capability >= 6.0, 32-bit magma_int_t, 64-bit pointer.
% CUDA runtime 11000, driver 11000. OpenMP threads 40. MKL 2020.0.1, MKL threads 20.
% device 0: Tesla V100-PCIE-16GB, 1380.0 MHz clock, 16160.5 MiB memory, capability 7.0
% Fri Sep 18 13:01:38 2020
% Usage: /home/ytsai2/magma/magma/testing/./testing_zheevd [options] [-h|--help]

% jobz = Vectors needed, uplo = Lower, ngpu = 1
%   N   CPU Time (sec)   GPU Time (sec)   |S-S_magma|   |A-USU^H|   |I-U^H U|
%============================================================================
irange (100, 200)
  200      0.0325           0.0211         8.41e-18      3.23e-17    6.03e-17   ok

Right now the table header is still showing |A-USU^H|. Should we keep it or make it something like this?

%   N   CPU Time (sec)   GPU Time (sec)   |S-S_magma|     |A-USU^H|   |I-U^H U|
%                                                      or |U^HAU-S|
%============================================================================

This touches much more code than I anticipated. I would greatly appreciate it if someone could take a closer look to make sure I didn’t miss anything.