PLASMA
2.8.0
PLASMA - Parallel Linear Algebra for Scalable Multi-core Architectures
void CORE_ctrdalg1(int n, int nb,
                   PLASMA_Complex32_t *A, int lda,
                   PLASMA_Complex32_t *V, PLASMA_Complex32_t *TAU,
                   int Vblksiz, int wantz,
                   int i, int sweepid, int m, int grsiz,
                   PLASMA_Complex32_t *work)
CORE_ctrdalg1 is part of the tridiagonal reduction algorithm (bulge chasing). It is a local driver for the kernels that should be executed on a single core.
[in] | n | The order of the matrix A. n >= 0. |
[in] | nb | The bandwidth of the matrix A, which corresponds to the tile size. nb >= 0. |
[in,out] | A | PLASMA_Complex32_t array, dimension (lda,n). On entry, the (nb+1)-by-n lower band Hermitian matrix to be reduced to tridiagonal form. On exit, the diagonal and first subdiagonal of A are overwritten by the corresponding elements of the tridiagonal matrix. |
[in] | lda | The leading dimension of the array A. lda >= max(1,nb+1). |
[out] | V | PLASMA_Complex32_t array, dimension (n) if wantz=0 or ldv*Vblksiz*blkcnt if wantz>0. The elementary reflectors are written in this array. |
[out] | TAU | PLASMA_Complex32_t array, dimension (n) if wantz=0 or Vblksiz*Vblksiz*blkcnt if wantz>0. The scalar factors of the elementary reflectors are written in this array. |
[in] | Vblksiz | Local parameter to PLASMA. It corresponds to the local blocking of applyQ2, which is used to apply the orthogonal matrix Q2. |
[in] | wantz | Integer, either 0 or 1. If wantz=0, V and TAU are not stored; they are kept only for the next step and then overwritten. |
[in] | i | Integer that refers to the current sweep (outer loop). |
[in] | sweepid | Integer that refers to the sweep to chase (inner loop). |
[in] | m | Integer that refers to a sweep step, used to enforce ordering dependencies. |
[in] | grsiz | Integer that refers to the size of a group. A group is the number of kernels that should be executed sequentially on the same core. The group size is a trade-off between locality (cache reuse) and parallelism: a small group size increases parallelism, while a large group size increases cache reuse. |
[in] | work | Workspace of size nb. Used by the core_chbtype[123]cb kernels. |
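The layout of A described above (an (nb+1)-by-n lower band array with lda >= max(1,nb+1)) can be illustrated with a small sketch. It assumes the LAPACK lower band storage convention, in which element A(row,col) with col <= row <= col+nb is stored at offset (row-col) + col*lda in column-major order; this page does not state that convention explicitly. The names cplx32, band_index and make_band are hypothetical helpers, not part of PLASMA, and cplx32 is only a stand-in for PLASMA_Complex32_t.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in for PLASMA_Complex32_t (assumption: single-precision complex). */
typedef float complex cplx32;

/* Offset of A(row,col) in lower band storage, 0-based and column-major.
   Assumes the LAPACK convention: only col <= row <= col+nb is stored. */
static size_t band_index(int row, int col, int lda)
{
    assert(row >= col && row - col < lda);
    return (size_t)(row - col) + (size_t)col * (size_t)lda;
}

/* Allocate an lda-by-n band array and fill the lower band of a Hermitian
   matrix with sample values. Returns NULL on invalid sizes or if the
   allocation fails. The caller releases the array with free(). */
static cplx32 *make_band(int n, int nb, int lda)
{
    if (n < 0 || nb < 0 || lda < nb + 1)
        return NULL;

    cplx32 *ab = calloc((size_t)lda * (size_t)(n > 0 ? n : 1), sizeof *ab);
    if (ab == NULL)
        return NULL;

    for (int col = 0; col < n; col++) {
        /* The diagonal of a Hermitian matrix is real. */
        ab[band_index(col, col, lda)] = (float)(col + 1);
        /* Subdiagonals inside the band: at most nb entries below the diagonal. */
        for (int row = col + 1; row < n && row <= col + nb; row++)
            ab[band_index(row, col, lda)] = (float)row + (float)col * I;
    }
    return ab;
}
```

An array built this way, together with its lda, has the shape documented for the A and lda arguments; the upper triangle is implied by Hermitian symmetry and is not stored.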