My matrix of DEGs is singular
Dear all,
I ran into the same problem as reported by Nikolay Oskolkov (issue #105) when using the plsda function, and got the same error:
    Error in solve.default(Sr) :
      system is computationally singular: reciprocal condition number = 1.17735e-18
    In addition: Warning message:
    In internal_wrapper.mint(X = X, Y = Y.mat, ncomp = ncomp, scale = scale, :
      At least one study has less than 5 samples, mean centering might not do as expected
If I understand correctly, this means that my genes have highly correlated expression. X contains 657 DEGs and Y is defined as:

    Y <- as.factor(c(rep("CTRL", 4), rep("TESTO", 4)))

What should I do with my X? I tried folds = 4, 3 and 2 with no success. The call I used:

    perf.plsda <- perf(res.plsda, validation = "Mfold", folds = 3,
                       progressBar = TRUE, auc = TRUE, nrepeat = 2)

Thanks for your help.
Regards,
Carine
Comments (4)
-
reporter
Hello Florian,
Thank you for this solution, it works. I had forgotten about this validation method. Continuing the analysis, my perf.plsda$error.rate is equal to 0. Does that mean no classification errors are made at all? Also, my ncomp is NULL, so to continue the analysis I set it to 2. Is that valid?
Thanks again for your help. My code is below.

    head(perf.plsda$error.rate)
    $overall
           max.dist centroids.dist mahalanobis.dist
    comp 1        0              0                0
    comp 2        0              0                0
    comp 3        0              0                0

    $BER
           max.dist centroids.dist mahalanobis.dist
    comp 1        0              0                0
    comp 2        0              0                0
    comp 3        0              0                0
    plot(perf.plsda, overlay = 'measure', sd = TRUE)
    plot(perf.plsda, col = color.mixo(5:7), sd = TRUE, legend.position = "horizontal")

    list.keepX <- c(1:10, seq(20, 50, 10))
    tune.splsda <- tune.splsda(X, Z, ncomp = 3, validation = 'Mfold', folds = 4,
                               progressBar = TRUE, dist = 'max.dist', measure = "BER",
                               test.keepX = list.keepX, nrepeat = 10, cpus = 2)

    As code is running in parallel, the progressBar will only show 100% upon completion of each component.
    comp 1 |=============================================================================| 100%
    comp 2 |=============================================================================| 100%
    comp 3 |=============================================================================| 100%

    error <- tune.splsda$error.rate
    ncomp <- tune.splsda$choice.ncomp$ncomp  # optimal number of components based on t-tests
    ncomp  # gives the optimal number; here NULL
-
Dear Carine,
At the moment, the tune.splsda$choice.ncomp$ncomp output is only calculated for Mfold, nrepeat>3 and ncomp>2.
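When those conditions are not met, `choice.ncomp$ncomp` comes back as NULL and the number of components has to be chosen by hand. A minimal sketch of that fallback follows; `pick_ncomp` is a hypothetical helper, not a mixOmics function, and the two lists only mimic the shape of a `tune.splsda()` result.

```r
# Hypothetical helper (not part of mixOmics): use the tuned number of
# components when it is available, otherwise a user-supplied fallback.
pick_ncomp <- function(tune_res, fallback = 2) {
  chosen <- tune_res$choice.ncomp$ncomp
  if (is.null(chosen)) fallback else chosen
}

# Stand-ins that only mimic the structure of a tune.splsda() result.
no_choice   <- list(choice.ncomp = NULL)
with_choice <- list(choice.ncomp = list(ncomp = 3))

pick_ncomp(no_choice)    # falls back to 2
pick_ncomp(with_choice)  # uses the tuned value, 3
```

The fallback value is still a judgment call; with 8 samples, a small value such as 2 keeps the model simple.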
Having errors of 0 for all components is a bit strange. How many variables are kept on each component (from tune())? What does the plotIndiv look like?
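One way to check both points is sketched below. It assumes simulated noise in place of the real 8 x 657 matrix (so the numbers will not match the real data), and the choices of `ncomp = 2`, leave-one-out validation and `test.keepX = c(5, 10, 20)` are illustrative only.

```r
library(mixOmics)

set.seed(1)
# Simulated stand-in for the real expression matrix (assumption).
X <- matrix(rnorm(8 * 657), nrow = 8,
            dimnames = list(paste0("S", 1:8), paste0("gene", 1:657)))
Y <- factor(rep(c("CTRL", "TESTO"), each = 4))

# Tune the number of variables kept on each component.
tuned <- tune.splsda(X, Y, ncomp = 2, validation = "loo",
                     dist = "max.dist", measure = "BER",
                     test.keepX = c(5, 10, 20), progressBar = FALSE)

# Number of variables selected on each component.
tuned$choice.keepX

# Refit with the tuned keepX and look at the sample plot.
fit <- splsda(X, Y, ncomp = 2, keepX = tuned$choice.keepX)
plotIndiv(fit, ind.names = FALSE, legend = TRUE,
          title = "sPLS-DA, components 1-2")
```

If the two groups are cleanly separated on the first component of the real data, an error rate of 0 is less surprising.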
- changed status to resolved
closed
Dear Carine,
If you only have 8 samples, it is probably better to do a leave-one-out cross validation, and also to keep the number of components small (2-3). Maybe something like:
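A minimal sketch of such a leave-one-out run is given here. The data are simulated noise standing in for the real 8 x 657 matrix, and `ncomp = 2` and `dist = "max.dist"` are illustrative choices, not necessarily those of the original suggestion.

```r
library(mixOmics)

set.seed(42)
# Simulated stand-in for the real expression matrix (assumption).
X <- matrix(rnorm(8 * 657), nrow = 8,
            dimnames = list(paste0("S", 1:8), paste0("gene", 1:657)))
Y <- factor(rep(c("CTRL", "TESTO"), each = 4))

# Keep the number of components small with only 8 samples.
res.plsda <- plsda(X, Y, ncomp = 2)

# Leave-one-out cross-validation: each sample is predicted once from a
# model trained on the remaining seven.
perf.plsda <- perf(res.plsda, validation = "loo",
                   dist = "max.dist", progressBar = FALSE)

# Overall classification error rate per component.
perf.plsda$error.rate$overall
```

With leave-one-out there are no random folds to draw, so repeating the procedure (nrepeat) is not needed.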