Explained_variance calculation

Issue #78 resolved
Arnaud created an issue

Hello,

Could you please explain how explained_variance is calculated?

In particular, with two X blocks and one Y, using the block.pls function in regression mode with ncomp = 2, if I sum the four explained_variance values (2 X blocks * 2 components), I get a value greater than 1, which puzzles me.

Does that mean explained_variance is calculated for each X block as if the other X blocks didn't exist, rather than at the global level?

Thanks in advance, Arnaud

PS: I read the help file for explained_variance and the DIABLO article but couldn't find the explanation.

Comments (8)

  1. Florian Rohart

    Hello Arnaud,

    The explained_variance is indeed calculated for each block independently.

    The explained_variance of block Z is the proportion of the variance of Z that is explained by the Z-components only. It indicates how well the Z-components summarize the information in the original Z, since the Z-components (variates$Z) are artificial components onto which Z (and only Z) is projected. Because each block is measured against its own total variance, the values summed across blocks can exceed 1.
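To make the per-block definition concrete, here is a minimal sketch in plain Python. It is not the mixOmics source: the function names `center` and `explained_variance`, the toy blocks, and the choice of each block's first centered column as its component are all assumptions made for illustration. It assumes each block is column-centered and that a component is a score vector with one value per sample.

```python
def center(block):
    """Column-center a block given as a list of rows (samples)."""
    n = len(block)
    means = [sum(col) / n for col in zip(*block)]
    return [[v - m for v, m in zip(row, means)] for row in block]


def explained_variance(block, component):
    """Share of one block's total sum of squares captured by one component.

    Assumption: `block` is column-centered and `component` is a score
    vector built from this block only (hypothetical helper, for illustration).
    """
    total = sum(v * v for row in block for v in row)
    # Cross-product of the component with every column of the block (t'X).
    proj = [sum(t * row[j] for t, row in zip(component, block))
            for j in range(len(block[0]))]
    tt = sum(t * t for t in component)
    return sum(p * p for p in proj) / tt / total


# Two toy blocks measured on the same four samples.
X1 = center([[1, 2], [2, 4], [3, 6], [4, 8]])
X2 = center([[1, 0], [0, 1], [-1, 0], [0, -1]])

# Illustrative components: the first centered column of each block.
t1 = [row[0] for row in X1]
t2 = [row[0] for row in X2]

ev1 = explained_variance(X1, t1)  # relative to the variance of X1 only
ev2 = explained_variance(X2, t2)  # relative to the variance of X2 only
print(ev1, ev2, ev1 + ev2)
```

Each value is bounded by 1 within its own block, but nothing bounds the sum over blocks, which is why adding the values for several blocks can give more than 1.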

    Hope that helps

    --

    Florian

  2. Arnaud reporter

    Thanks Florian.

    So if one wants the percentage of Y explained by a given X/Z-component (which is what I was really interested in, rather than the percentage of a given X/Z block's variance explained by its own components), should one just compute the square of the correlation between this X/Z-component (for a given block) and Y?

    Thanks in advance, Arnaud
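For a single-column Y, the quantity asked about above can be sketched as follows. This is an illustration only, not a mixOmics function: the names `squared_correlation` and `r_squared_from_fit` and the toy data are hypothetical. It relies on the standard result that, for a least-squares fit of one response on one predictor with an intercept, the proportion of variance explained equals the squared Pearson correlation. A multi-column Y would need this computed per column, or some aggregate, which is not covered here.

```python
def squared_correlation(component, y):
    """Squared Pearson correlation between a component and a single Y column."""
    n = len(y)
    mt = sum(component) / n
    my = sum(y) / n
    sty = sum((t - mt) * (v - my) for t, v in zip(component, y))
    stt = sum((t - mt) ** 2 for t in component)
    syy = sum((v - my) ** 2 for v in y)
    return sty * sty / (stt * syy)


def r_squared_from_fit(component, y):
    """Proportion of Y variance explained by a least-squares fit on the component."""
    n = len(y)
    mt = sum(component) / n
    my = sum(y) / n
    stt = sum((t - mt) ** 2 for t in component)
    slope = sum((t - mt) * (v - my) for t, v in zip(component, y)) / stt
    # Residual sum of squares around the fitted line.
    ssr = sum((v - my - slope * (t - mt)) ** 2 for t, v in zip(component, y))
    syy = sum((v - my) ** 2 for v in y)
    return 1 - ssr / syy


# Toy data: one component (scores for four samples) and one Y column.
t = [-1.5, -0.5, 0.5, 1.5]
y = [1, 2, 2, 3]

r2_corr = squared_correlation(t, y)
r2_fit = r_squared_from_fit(t, y)
print(r2_corr, r2_fit)
```

The two values agree, so under these assumptions the squared correlation is the share of Y's variance explained by that one component.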
