pca.xyz

Issue #197 new
Lars Skjærven created an issue

Wouldn't it be better to have rm.gaps=TRUE by default in pca.xyz? Currently it's FALSE:

> attach(transducin)             
> pca.xyz(pdbs$xyz)
Error in pca.xyz(pdbs$xyz) :   Infinite or missing values in 'xyz' input.
         Likely solution is to remove gap positions (cols)
         or gap containing structures (rows) from input.

Comments (9)

  1. Barry Grant

    I have it set to FALSE to remind folks that a decision may be required on their part before proceeding: namely, which subset of positions (or, alternatively, which subset of structures) will be analyzed.

    Certainly setting it to TRUE will allow one to run through analysis more easily and likely remove positions that folks may not be aware of. It will also likely lead to future confusion with result interpretation. Having it as FALSE also serves as an important 'catch' to highlight potentially unexpected pre-processing leading to NAs in traj data etc.

    I think FALSE makes more sense in this case and suggest that we just highlight the option in the corresponding line of the error message: "Likely solution is to remove gap positions (cols) with 'rm.gaps=TRUE'"
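For readers following along, here is a minimal base-R sketch of the pre-processing step under discussion: dropping gap-containing columns before PCA and reporting how many were dropped. It is an illustration only, not the bio3d implementation. The toy matrix `xyz` and the variable names `keep` and `xyz.nogap` are made up for this example, and `prcomp` stands in for `pca.xyz`.

```r
# Toy coordinate matrix: 4 structures (rows) x 6 coordinates (cols).
# Assumption: gaps are encoded as NA, as in the error message above.
set.seed(1)
xyz <- matrix(rnorm(24), nrow = 4)
xyz[2, 4:6] <- NA  # structure 2 is missing its second atom (x, y, z)

# Keep only columns with no missing values in any structure.
keep <- colSums(is.na(xyz)) == 0
message(sum(!keep), " gap column(s) removed, ", sum(keep), " retained")

xyz.nogap <- xyz[, keep, drop = FALSE]

# PCA on the gap-free subset (prcomp used here as a stand-in for pca.xyz).
pc <- prcomp(xyz.nogap)
```

Printing the number of removed columns addresses the concern raised above: the user still sees which subset of positions ends up in the analysis.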

  2. Lars Skjærven reporter

    Good point, but since there is no option other than rm.gaps=TRUE, just issuing a warning could be an alternative.

    Apropos, are you familiar with ways to deal with missing values in PCA? I'm not sure it would make sense here, but there are options for that. In principle, we could have an na.action argument equivalent to the one in prcomp.
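As a point of reference, this is roughly how `na.action` behaves in the formula interface of `stats::prcomp`. The data frame `df` is invented for the illustration. Note that `na.omit` drops whole rows, which here would correspond to removing gap-containing structures rather than gap positions.

```r
# Small made-up data set with one missing value in row 2.
df <- data.frame(
  x1 = c(1.0, 2.1, 2.9, 4.2, 5.1),
  x2 = c(0.5, NA,  1.4, 2.1, 2.4),
  x3 = c(3.0, 2.2, 1.9, 1.1, 0.2)
)

# The formula method of prcomp accepts na.action; na.omit removes
# every row (observation) that contains at least one NA.
pc <- prcomp(~ ., data = df, na.action = na.omit)

# Number of observations that actually entered the PCA.
n.used <- nrow(pc$x)
```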

  3. Barry Grant

    Issuing a warning is an alternative - but then if you explicitly want 'rm.gaps=TRUE', why should you have to see a warning about it? That's just annoying, no?

    There are multiple ways we could impute this type of missing data prior to PCA. In general, if the missing coords are few and randomly distributed, then it is quite straightforward. Systematic blocks of consistently missing data are more problematic and would require more careful consideration. Perhaps we could read up on and adapt some of the methods used here: http://www.bioconductor.org/packages/release/bioc/html/pcaMethods.html and http://menugget.blogspot.de/2012/10/dineof-data-interpolating-empirical.html
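To make the imputation idea concrete, here is a rough base-R sketch of one common approach: fill the missing entries with column means, then repeatedly replace them with their low-rank PCA reconstruction until the filled values stop changing. This is a generic iterative scheme in the spirit of the methods linked above, not code from pcaMethods, DINEOF, or bio3d. The function name `impute_pca` and its arguments are hypothetical, and it assumes every column has at least one observed value.

```r
# Hypothetical helper: iterative low-rank (PCA-based) imputation.
# x     : numeric matrix with NA for missing entries
# ncomp : number of principal components used for reconstruction
impute_pca <- function(x, ncomp = 2, maxit = 100, tol = 1e-6) {
  miss <- is.na(x)
  if (!any(miss)) return(x)

  # Initial guess: fill each missing entry with its column mean.
  col.mu <- colMeans(x, na.rm = TRUE)
  filled <- x
  filled[miss] <- col.mu[col(x)[miss]]

  for (i in seq_len(maxit)) {
    # Center, take a truncated SVD, and reconstruct at rank ncomp.
    mu <- colMeans(filled)
    centered <- sweep(filled, 2, mu)
    s <- svd(centered, nu = ncomp, nv = ncomp)
    recon <- s$u %*% diag(s$d[seq_len(ncomp)], ncomp) %*% t(s$v)
    recon <- sweep(recon, 2, mu, "+")

    # Update only the missing entries; observed values stay untouched.
    delta <- sqrt(mean((recon[miss] - filled[miss])^2))
    filled[miss] <- recon[miss]
    if (delta < tol) break
  }
  filled
}

# Usage on made-up low-rank data with a few randomly missing entries.
set.seed(42)
full <- matrix(rnorm(20), 10, 2) %*% matrix(rnorm(12), 2, 6)
x <- full
x[cbind(c(2, 5, 9), c(1, 4, 6))] <- NA

filled <- impute_pca(x, ncomp = 2)
pc <- prcomp(filled)
```

A scheme like this suits the "few and randomly distributed" case. For systematic blocks of missing coordinates the reconstruction has little information to work from, which is the caveat noted above.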

  4. Barry Grant

    Also check out the nice features of the PCA() function in the FactoMineR package. The missing data example uses package missMDA.

  5. Lars Skjærven reporter

    Great. Let's keep this issue as a todo. I think this could make a nice enhancement to the PCA function.
