2-stage EVP and SVD

#50 Open
Source: repository plasma_SEVP, branch default
Destination: repository plasma, branch default

Bitbucket cannot automatically merge this request due to conflicts.

Review the conflicts on the Overview tab. You can then either decline the request or merge it manually on your local system using the following commands:

hg update default
hg pull -r default https://bitbucket.org/psrikara/plasma_sevp
hg merge e1edd4740281
hg commit -m 'Merged in psrikara/plasma_sevp (pull request #50)'
Author
  1. Zounon Mawussi
Reviewers
Description

  • remove additional synchronizations
  • add timing to solve performance issues
  • eigenvalue code fully debugged, and static version of bulge chasing added
  • add svd
  • solve bug in zheevd
  • add detailed timing
  • Fortran generation script has been updated
  • Header files cleaned up
  • Variable 'jobz' renamed to 'eigt'
  • added fortran tester for symmetric eigenvalues and increased workspace size for symmetric eigenvalues in c
  • added fortran tester for matrix addition
  • merge with Maxim's modifications
  • sgesdd fully working
  • cleaning up gesdd
  • clean up bulge chasing codes
  • add static scheduling version of bulge chasing band to bidiagonal
  • Floating point exception bug corrected
  • zgesdd: merge reduction to band and copy
  • revising 2-stage SVD and EVP and cleaning up the code
  • finish cleaning zgbtype[1,2,3]cb routines
  • rm pzheb2trd
  • change testing routines of zgesdd to reduce time to solution when <--test=m>

Comments (7)

  1. Mark Gates

    I went through and marked some stylistic things to fix.

    There are a bunch more style inconsistencies that I can easily fix with astyle or perl, so don’t worry about those. E.g.

    if( cond ){
        ...
    } else {
        ...
    }
    

    vs.

    if (cond) {
        ...
    }
    else {
        ...
    }
    

    The second is preferred PLASMA style.

    Eventually it would be nice for the SVD to support doing an initial QR factorization for m >> n, or LQ for m << n. But that code never made it into PLASMA 2.8 either.
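As background for the QR/LQ preprocessing mentioned here, the standard identity behind it can be written out. This is textbook linear algebra, not something taken from the pull request: for m ≥ n, Q is m-by-n with orthonormal columns and R is n-by-n, so the SVD only has to be computed on the small square factor. The symbols U_R and V_L are introduced here for illustration.

```latex
% Tall case (m >> n): factor first, then take the SVD of the small n-by-n R.
A = QR, \qquad R = U_R \Sigma V^T
\;\Longrightarrow\; A = (Q U_R)\, \Sigma V^T

% Wide case (m << n): the same idea with an LQ factorization.
A = LQ, \qquad L = U \Sigma V_L^T
\;\Longrightarrow\; A = U \Sigma\, (V_L^T Q)
```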

    Also eventually it would be good for the SVD to compute "some" vectors, i.e., the reduced or economy-size SVD with U being m-by-min( m, n ) instead of m-by-m, and V being n-by-min( m, n ) instead of n-by-n. In Matlab, that’s [U, S, V] = svd( A, 0 ); instead of [U, S, V] = svd( A );
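The difference between "all" and "some" vectors comes down to the shapes of U and V. The following is a minimal sketch of those shapes; `svd_shape_t` and `svd_shape` are hypothetical names used for illustration and are not part of PLASMA.

```c
#include <stdbool.h>

/* Hypothetical helper, not a PLASMA routine: shapes of U and V for the
 * full ("all vectors") and economy ("some vectors") SVD of an m-by-n A. */
typedef struct {
    int u_rows, u_cols;  /* U is u_rows-by-u_cols */
    int v_rows, v_cols;  /* V is v_rows-by-v_cols */
} svd_shape_t;

static int imin(int a, int b)
{
    return a < b ? a : b;
}

svd_shape_t svd_shape(int m, int n, bool economy)
{
    int k = imin(m, n);
    svd_shape_t s;

    s.u_rows = m;
    s.u_cols = economy ? k : m;  /* m-by-min(m,n) instead of m-by-m */
    s.v_rows = n;
    s.v_cols = economy ? k : n;  /* n-by-min(m,n) instead of n-by-n */
    return s;
}
```

For a 400-by-200 matrix, the economy form keeps U at 400-by-200 rather than 400-by-400, while V stays 200-by-200.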

    I can probably also handle the merge and renaming some routines, since Piotr renamed the core_blas stuff to fit into xSDK.

    1. Mark Gates

      I added the "some vectors" support, as it was easy. I also put in TODO comments where the initial QR/LQ would eventually go (future development).

  2. Mark Gates

    SVD crashes if n < nb.

    plasma_sevp/test> ./test dgesdd --dim=100:1000:100
    
      Status      Error       Time    Gflop/s  House. mode     job       m       n    nb    ib  padA
    
    PLASMA ERROR at 193 of plasma_dgesdd() in compute/dgesdd.c: nb < imin(m,n) not supported
      FAILED   9.90e-01     0.0000    81.0949            f       a     100     100   256    64     0
    PLASMA ERROR at 193 of plasma_dgesdd() in compute/dgesdd.c: nb < imin(m,n) not supported
    test(53859,0x7fffa128b3c0) malloc: *** error for object 0x900000000: pointer being freed was not allocated
    *** set a breakpoint in malloc_error_break to debug
    Abort
    

    Is there a reasonable way to fix that? It seems we could just skip reduction to band, since the matrix is smaller than the requested band, and go right to bulge chasing.

    Also, it seems nb < max( m, n ) is not supported, instead of nb < min( m, n ) as claimed in the error message.

    plasma_sevp/test> ./test dgesdd --dim=400x200
    
      Status      Error       Time    Gflop/s  House. mode     job       m       n    nb    ib  padA
    
    PLASMA ERROR at 193 of plasma_dgesdd() in compute/dgesdd.c: nb < imin(m,n) not supported
      FAILED   9.95e-01     0.0000  1444.3289            f       a     400     200   256    64     0
    
    plasma_sevp/test> ./test dgesdd --dim=200x400
    
      Status      Error       Time    Gflop/s  House. mode     job       m       n    nb    ib  padA
    
    PLASMA ERROR at 193 of plasma_dgesdd() in compute/dgesdd.c: nb < imin(m,n) not supported
      FAILED   9.95e-01     0.0000  1565.5313            f       a     200     400   256    64     0
    

    Note that syevd seems to work fine for n < nb.

    plasma_sevp/test> ./test dsyevd --dim=100:1000:100
    
      Status      Error       Time    Gflop/s    uplo  House. mode    eigt       n    nb    ib  padA
    
        pass   2.06e-17     0.0033     0.4047       l            f       v     100   256    64     0
        pass   8.15e-17     0.0033     3.2926       l            f       v     200   256    64     0
        pass   1.27e-16     0.0084     4.3130       l            f       v     300   256    64     0
        pass   2.22e-16     0.0181     4.7439       l            f       v     400   256    64     0
        pass   3.31e-17     0.0326     5.1213       l            f       v     500   256    64     0
        pass   1.76e-16     0.0569     5.0784       l            f       v     600   256    64     0
        pass   1.72e-16     0.0819     5.5960       l            f       v     700   256    64     0
        pass   2.66e-16     0.1177     5.8126       l            f       v     800   256    64     0
        pass   5.17e-17     0.1515     6.4273       l            f       v     900   256    64     0
        pass   2.39e-16     0.1865     7.1617       l            f       v    1000   256    64     0
    
    1. Mark Gates

      On second thought, we can’t skip straight to bulge chasing, because it needs either lower or upper band, not general band. But this still seems like a serious usability issue to resolve.
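Every failing run in this comment has min(m, n) < nb, so one way to make the failure clean is to validate the sizes before anything is allocated and to word the message to match that condition. The sketch below is a hypothetical skeleton, not the code in compute/dgesdd.c: the function names, return codes, and workspace size are all assumptions, and whether max(m, n) also matters cannot be decided from these runs.

```c
#include <stdio.h>
#include <stdlib.h>

static int imin(int a, int b)
{
    return a < b ? a : b;
}

/* Hypothetical check: returns 1 when the band size nb fits inside the
 * matrix, i.e. min(m, n) >= nb, which is what the failing runs suggest. */
int svd_band_supported(int m, int n, int nb)
{
    return imin(m, n) >= nb;
}

/* Hypothetical driver skeleton.
 * Returns 0 on success, -1 on bad arguments, -2 on allocation failure. */
int svd_driver_sketch(int m, int n, int nb)
{
    double *work = NULL;
    int status = 0;

    if (m <= 0 || n <= 0 || nb <= 0) {
        fprintf(stderr, "m, n, and nb must be positive\n");
        return -1;
    }
    if (!svd_band_supported(m, n, nb)) {
        fprintf(stderr, "min(m,n) = %d < nb = %d not supported\n",
                imin(m, n), nb);
        return -1;  /* nothing allocated yet, so nothing to free */
    }

    /* Placeholder workspace size, chosen only for illustration. */
    work = malloc((size_t)nb * (size_t)imin(m, n) * sizeof *work);
    if (work == NULL) {
        status = -2;
    }

    /* ... reduction to band and bulge chasing would go here ... */

    free(work);  /* free(NULL) is a no-op, so this is safe on failure */
    return status;
}
```

Returning before the first allocation means the error path never calls `free` on a pointer it does not own. That would avoid the "pointer being freed was not allocated" abort in the log, assuming the abort comes from cleanup after the argument error.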

  3. Mark Gates

    Computing singular values only (job=n) is marked as FAILED, though the error looks fine to me.

    plasma_sevp/test> ./test dgesdd --dim=300:500:100 --job=n,a
    
      Status      Error       Time    Gflop/s  House. mode     job       m       n    nb    ib  padA
    
      FAILED   1.99e-17     0.0192     3.7676            f       n     300     300   256    64     0
      FAILED   2.47e-17     0.0352     4.8516            f       n     400     400   256    64     0
      FAILED   3.99e-17     0.0635     5.2614            f       n     500     500   256    64     0
        pass   1.65e-17     0.0373     1.9350            f       a     300     300   256    64     0
        pass   2.21e-17     0.0720     2.3754            f       a     400     400   256    64     0
        pass   3.95e-17     0.1275     2.6184            f       a     500     500   256    64     0
    

    The eigenvalue routine seems fine computing values only or values & vectors.

    plasma_sevp/test> ./test dsyevd --dim=100:500:100 --eigt=v,w --uplo=l,u
    
      Status      Error       Time    Gflop/s    uplo  House. mode    eigt       n    nb    ib  padA
    
        pass   2.06e-17     0.0016     0.8645       l            f       v     100   256    64     0
        pass   8.15e-17     0.0028     3.8466       l            f       v     200   256    64     0
        pass   1.27e-16     0.0084     4.2930       l            f       v     300   256    64     0
        pass   2.22e-16     0.0180     4.7623       l            f       v     400   256    64     0
        pass   3.31e-17     0.0329     5.0856       l            f       v     500   256    64     0
        pass   2.03e-17     0.0022     0.6071       l            f       w     100   256    64     0
        pass   1.61e-17     0.0057     1.8855       l            f       w     200   256    64     0
        pass   3.35e-17     0.0169     2.1380       l            f       w     300   256    64     0
        pass   1.57e-17     0.0320     2.6805       l            f       w     400   256    64     0
        pass   1.73e-17     0.0561     2.9795       l            f       w     500   256    64     0
        pass   2.06e-17     0.0012     1.1462       u            f       v     100   256    64     0
        pass   8.15e-17     0.0030     3.5979       u            f       v     200   256    64     0
        pass   1.14e-16     0.0078     4.6428       u            f       v     300   256    64     0
        pass   1.49e-16     0.0179     4.7892       u            f       v     400   256    64     0
        pass   4.50e-17     0.0330     5.0616       u            f       v     500   256    64     0
        pass   2.03e-17     0.0017     0.8132       u            f       w     100   256    64     0
        pass   1.61e-17     0.0060     1.8045       u            f       w     200   256    64     0
        pass   2.43e-17     0.0153     2.3673       u            f       w     300   256    64     0
        pass   1.88e-17     0.0309     2.7683       u            f       w     400   256    64     0
        pass   3.08e-17     0.0545     3.0677       u            f       w     500   256    64     0
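For the job=n runs reported as FAILED above despite errors near 1e-17, one thing to check is whether the tester's verdict depends only on the measured error. The sketch below shows a verdict of that kind. It is an illustration under assumptions, not PLASMA's tester: the function names, the error measure, and the tolerance are all hypothetical.

```c
#include <stdbool.h>

static double abs_d(double x)
{
    return x < 0.0 ? -x : x;
}

/* Hypothetical error measure: largest absolute difference between the
 * computed and reference singular values, scaled by the largest
 * reference value when that value is nonzero. */
double singular_value_error(const double *s, const double *s_ref, int k)
{
    double max_diff = 0.0;
    double max_ref = 0.0;

    for (int i = 0; i < k; ++i) {
        double diff = abs_d(s[i] - s_ref[i]);
        double ref = abs_d(s_ref[i]);

        if (diff > max_diff) {
            max_diff = diff;
        }
        if (ref > max_ref) {
            max_ref = ref;
        }
    }
    return max_ref > 0.0 ? max_diff / max_ref : max_diff;
}

/* The verdict depends only on the error, not on which job was requested.
 * A NaN error compares false and is therefore reported as a failure. */
bool test_passes(double error, double tol)
{
    return error < tol;
}
```

With a hypothetical tolerance of 1e-14, an error of 1.99e-17 passes and an error of 9.90e-01 fails, whichever job option was requested.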
    
  4. Mark Gates

    The functions plasma_pzlarft_blgtrd and plasma_pzunmqr_blgtrd are used for both SVD and EV. The name trd implies tridiagonal reduction. Should they be renamed _bulge instead of _blgtrd?