PLASMA
2.8.0
PLASMA - Parallel Linear Algebra for Scalable Multi-core Architectures
void CORE_dbrdalg1(PLASMA_enum uplo,
                   int n,
                   int nb,
                   double *A,
                   int lda,
                   double *VQ,
                   double *TAUQ,
                   double *VP,
                   double *TAUP,
                   int Vblksiz,
                   int wantz,
                   int i,
                   int sweepid,
                   int m,
                   int grsiz,
                   double *work)
CORE_dbrdalg1 is part of the bidiagonal reduction algorithm (bulge chasing). It is a local driver for the kernels that should be executed on a single core.
[in] | uplo | PlasmaUpper or PlasmaLower. Selects whether A holds an upper or a lower band matrix (see A). |
[in] | n | The order of the matrix A. n >= 0. |
[in] | nb | The bandwidth of the matrix A, which corresponds to the tile size. nb >= 0. |
[in,out] | A | double array, dimension (lda,n). On entry, the (2nb+1)-by-n lower or upper band general matrix to be reduced to bidiagonal form. On exit, if uplo = PlasmaUpper, the diagonal and first superdiagonal of A are overwritten by the corresponding elements of the bidiagonal matrix B; if uplo = PlasmaLower, the diagonal and first subdiagonal of A are overwritten by the corresponding elements of B. |
[in] | lda | The leading dimension of the array A. lda >= max(1,nb+1). |
[out] | VQ | double array, dimension (n) if wantz=0, or ldv*Vblksiz*blkcnt if wantz>0. The left elementary reflectors are written to this array. |
[out] | TAUQ | double array, dimension (n) if wantz=0, or Vblksiz*Vblksiz*blkcnt if wantz>0. The scalar factors of the left elementary reflectors are written to this array. |
[out] | VP | double array, dimension (n) if wantz=0, or ldv*Vblksiz*blkcnt if wantz>0. The right elementary reflectors are written to this array. |
[out] | TAUP | double array, dimension (n) if wantz=0, or Vblksiz*Vblksiz*blkcnt if wantz>0. The scalar factors of the right elementary reflectors are written to this array. |
[in] | Vblksiz | Local parameter to PLASMA. It corresponds to the local blocking used by applyQ2 to apply the orthogonal matrix Q2. |
[in] | wantz | Integer, either 0 or 1. If wantz=0, V and TAU are not stored; they are only kept for the next step and then overwritten. |
[in] | i | Integer that refers to the current sweep (outer loop). |
[in] | sweepid | Integer that refers to the sweep to chase (inner loop). |
[in] | m | Integer that refers to a sweep step, used to ensure order dependencies. |
[in] | grsiz | Integer that refers to the size of a group. A group is the number of kernels that should be executed sequentially on the same core. The group size is a trade-off between locality (cache reuse) and parallelism: a small group size increases parallelism, while a large group size increases cache reuse. |
[in] | work | Workspace of size nb. Used by the core_dgbtype[123]cb kernels. |
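The array dimensions and the group-size trade-off listed above can be turned into a few lines of arithmetic. The following is only a sketch under stated assumptions: the helper names reflector_len, tau_len and group_count are hypothetical and not part of PLASMA, and ldv and blkcnt are not defined in this section, so they are treated as values the caller already knows.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a PLASMA function): number of doubles that
 * VQ or VP must hold, following the dimensions given in the table.
 * ldv and blkcnt are assumed to be supplied by the caller. */
size_t reflector_len(int n, int wantz, int ldv, int Vblksiz, int blkcnt)
{
    assert(n >= 0 && wantz >= 0);
    if (wantz == 0)
        return (size_t)n;                       /* reflectors are overwritten at each step */
    return (size_t)ldv * (size_t)Vblksiz * (size_t)blkcnt;
}

/* Hypothetical helper: number of doubles that TAUQ or TAUP must hold. */
size_t tau_len(int n, int wantz, int Vblksiz, int blkcnt)
{
    assert(n >= 0 && wantz >= 0);
    if (wantz == 0)
        return (size_t)n;
    return (size_t)Vblksiz * (size_t)Vblksiz * (size_t)blkcnt;
}

/* Hypothetical helper: how many groups result when nkernels kernels are
 * split into groups of at most grsiz kernels each. Each group runs
 * sequentially on one core, so more groups means more units that can be
 * scheduled independently. */
int group_count(int nkernels, int grsiz)
{
    assert(nkernels >= 0 && grsiz > 0);
    return (nkernels + grsiz - 1) / grsiz;      /* ceiling division */
}
```

For instance, with 10 kernels, group_count gives 10 groups for grsiz = 1 and 3 groups for grsiz = 4: the smaller group size exposes more independent units, while the larger one keeps more consecutive kernels on the same core.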