pca.xyz
Wouldn't it be better to have rm.gaps=TRUE by default in pca.xyz? Currently it's FALSE:
> attach(transducin)
> pca.xyz(pdbs$xyz)
Error in pca.xyz(pdbs$xyz) : Infinite or missing values in 'xyz' input.
Likely solution is to remove gap positions (cols)
or gap containing structures (rows) from input.
Comments (9)
-
-
I have it set to FALSE to remind folks that there could be a decision required on their part before proceeding. Namely, be aware of the subset of positions (or alternatively the subset of structures) that will be analyzed.
Certainly setting it to TRUE will allow one to run through analysis more easily and likely remove positions that folks may not be aware of. It will also likely lead to future confusion with result interpretation. Having it as FALSE also serves as an important 'catch' to highlight potentially unexpected pre-processing leading to NAs in traj data etc.
I think FALSE makes more sense in this case and suggest that we just highlight the option in the corresponding line of the error message: "Likely solution is to remove gap positions (cols) with 'rm.gaps=TRUE'"
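To make the decision mentioned above concrete, here is a minimal base R sketch (not bio3d code) of the two choices: dropping gap positions (columns) versus dropping gap-containing structures (rows). The toy matrix and the variable names are made up for illustration, and prcomp() stands in for pca.xyz().

```r
# Toy coordinate matrix: 4 structures (rows) x 6 coordinates (cols).
# Values are invented; NA marks a gap, as in an aligned 'xyz' matrix.
xyz <- matrix(c(1.0, 2.0, 3.0, 4.0, 5.0, 6.0,
                1.1, 2.1,  NA,  NA, 5.2, 6.1,
                0.9, 1.8, 3.2, 4.1, 4.9, 5.8,
                1.2, 2.2, 3.1, 3.9, 5.1, 6.3),
              nrow = 4, byrow = TRUE)

# Inspect first, so the subset being analysed is an explicit choice
gap_cols <- which(colSums(is.na(xyz)) > 0)  # positions with any gap
gap_rows <- which(rowSums(is.na(xyz)) > 0)  # structures with any gap

keep_cols <- colSums(is.na(xyz)) == 0
keep_rows <- rowSums(is.na(xyz)) == 0

# Option A: drop gap positions (cols), keep every structure
pc_cols <- prcomp(xyz[, keep_cols, drop = FALSE])

# Option B: drop gap-containing structures (rows), keep every position
pc_rows <- prcomp(xyz[keep_rows, , drop = FALSE])
```

Option A analyses all four structures over four coordinates; option B analyses three structures over all six. Which one is appropriate depends on the question being asked, which is the point of not doing it silently.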
-
reporter Good point, but since there is no other option than rm.gaps=TRUE, just issuing a warning could be an alternative. Apropos, are you familiar with ways to deal with missing values in PCA? Not sure it would make any sense here, but there are options for that. In principle, we could have a na.action argument equivalent to that of function prcomp.
-
Issuing a warning is an alternative, but then if you explicitly want 'rm.gaps=TRUE', why should you have to see a warning about it? That's just annoying, no?
There are multiple ways we could impute this type of missing data prior to PCA. In general, if the missing coords are few and randomly distributed then it is quite straightforward. Systematic blocks of consistently missing data are more problematic and would require more careful consideration. Perhaps we could read up on and adapt some of the methods used here: http://www.bioconductor.org/packages/release/bioc/html/pcaMethods.html and http://menugget.blogspot.de/2012/10/dineof-data-interpolating-empirical.html
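As a rough illustration of the "few, randomly distributed" case, the sketch below fills missing entries by iterating a truncated SVD reconstruction, in the general spirit of the iterative approaches linked above. It is base R only, not the pcaMethods or DINEOF implementation; the function name impute_pca, its parameters, and the toy data are all assumptions made for this example.

```r
# Toy low-rank data with a few scattered missing entries (invented values)
set.seed(1)
scores   <- matrix(rnorm(20 * 2), 20, 2)
loadings <- matrix(rnorm(2 * 6), 2, 6)
x <- scores %*% loadings
x[cbind(c(3, 7, 12, 18), c(1, 4, 2, 6))] <- NA

# Hypothetical helper: iterative low-rank imputation
impute_pca <- function(x, ncomp = 2, maxit = 500, tol = 1e-8) {
  na <- is.na(x)
  filled <- x
  # Initial guess: column means of the observed values
  filled[na] <- colMeans(x, na.rm = TRUE)[col(x)[na]]
  for (i in seq_len(maxit)) {
    mu <- colMeans(filled)
    s <- svd(sweep(filled, 2, mu), nu = ncomp, nv = ncomp)
    # Rank-'ncomp' reconstruction, shifted back by the column means
    recon <- s$u %*% diag(s$d[seq_len(ncomp)], ncomp) %*% t(s$v)
    recon <- sweep(recon, 2, mu, "+")
    delta <- sqrt(mean((recon[na] - filled[na])^2))
    filled[na] <- recon[na]  # only missing cells are ever overwritten
    if (delta < tol) break
  }
  filled
}

x_imp <- impute_pca(x, ncomp = 2)
pc <- prcomp(x_imp)  # PCA on the completed matrix
```

Observed values are never modified, only the NA cells. For systematic blocks of missing coordinates, this kind of scheme has little information to work from, so it should not be expected to behave well there.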
-
Also check out the nice features of the PCA() function in the FactoMineR package. The missing data example uses package missMDA.
-
reporter Great. Let's keep this issue as a todo. I think this could make a nice enhancement to the PCA function.
-
reporter - changed version to v2.3 [future]
- marked as task
-
- changed version to v2.3 [devel]
-
- changed version to v2.4/3.0 [future]
Yes, I agree. Thanks!