My matrix of DEGs is singular

Issue #109 resolved
GENET Carine created an issue

Dear all,

I encountered the same problem as reported by Nikolay Oskolkov (issue #105) through the plsda function and got the same error:

Error in solve.default(Sr) :
  system is computationally singular: reciprocal condition number = 1.17735e-18
In addition: Warning message:
In internal_wrapper.mint(X = X, Y = Y.mat, ncomp = ncomp, scale = scale, :
  At least one study has less than 5 samples, mean centering might not do as expected

If I understand correctly, this means that my genes have highly correlated expression. X contains 657 DEGs and Y is Y <- as.factor(c(rep("CTRL",4),rep("TESTO",4))). What should I do with my X? I tried folds = 4, 3 or 2 with no success.

Thanks for your help.

Regards,
Carine

perf.plsda <- perf(res.plsda, validation = "Mfold", folds = 3,
                   progressBar = TRUE, auc = TRUE, nrepeat = 2)
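
For readers hitting the same message, here is a minimal base R sketch of how this kind of error can arise whenever there are fewer samples than variables, even when the variables are not strongly correlated. The data are simulated and the dimensions (8 samples, 20 variables) are assumptions chosen for illustration. The matrix Sr named in the error is internal to mixOmics and is not reproduced here.

```r
# Simulated stand-in data (assumption): 8 samples x 20 variables of random noise.
set.seed(1)
X <- matrix(rnorm(8 * 20), nrow = 8, ncol = 20)

# The covariance matrix of the variables is 20 x 20, but with 8 samples
# its rank cannot exceed n - 1 = 7, so it is not invertible.
S <- cov(X)
rank_S <- qr(S)$rank
print(rank_S)

# The reciprocal condition number is close to zero for such a matrix.
print(rcond(S))

# solve() stops with an error on this matrix; the message is captured
# instead of aborting the script.
res <- tryCatch(solve(S), error = function(e) conditionMessage(e))
print(res)
```

The rank bound depends only on the number of samples, which suggests that very small sample sizes alone can produce a singular matrix.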

Comments (4)

  1. Florian Rohart

    Dear Carine,

    If you only have 8 samples, it is probably better to do a leave-one-out cross validation, and also to keep the number of components small (2-3). Maybe something like:

    res.plsda <- plsda(X,Y,ncomp=2)
    res.perf <- perf(res.plsda, validation = "loo", progressBar = TRUE) # no need for nrepeat as the folds are not random with loo
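
    The two lines above assume X and Y already exist. A self-contained sketch of the same leave-one-out workflow follows. The data are simulated (8 samples, 50 variables, two groups of 4), so the values are assumptions and the resulting error rates have no biological meaning.

    ```r
    library(mixOmics)

    # Simulated stand-in data (assumption): 8 samples, 50 variables.
    set.seed(42)
    X <- matrix(rnorm(8 * 50), nrow = 8,
                dimnames = list(paste0("s", 1:8), paste0("gene", 1:50)))
    Y <- as.factor(c(rep("CTRL", 4), rep("TESTO", 4)))

    # Keep the number of components small, as suggested above.
    res.plsda <- plsda(X, Y, ncomp = 2)

    # Leave-one-out: each of the 8 samples is held out exactly once,
    # so the folds are fixed and nrepeat is not needed.
    res.perf <- perf(res.plsda, validation = "loo", progressBar = FALSE)

    # Overall error rate per component and prediction distance.
    print(res.perf$error.rate$overall)
    ```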
    
  2. GENET Carine reporter

    Hello Florian,

    Thank you for this solution, it works. I had forgotten about this validation method. Continuing the analysis, my perf.plsda$error.rate is equal to 0. Does that mean I cannot make any errors? Also, my ncomp is NULL, so I set it to 2 to continue the analysis. Is that valid? Thanks again for your help. My code is below.

    head(perf.plsda$error.rate)
    $overall
           max.dist centroids.dist mahalanobis.dist
    comp 1        0              0                0
    comp 2        0              0                0
    comp 3        0              0                0

    $BER
           max.dist centroids.dist mahalanobis.dist
    comp 1        0              0                0
    comp 2        0              0                0
    comp 3        0              0                0

    plot(perf.plsda, overlay = 'measure', sd = TRUE)
    plot(perf.plsda, col = color.mixo(5:7), sd = TRUE, legend.position = "horizontal")
    list.keepX <- c(1:10, seq(20, 50, 10))
    tune.splsda <- tune.splsda(X, Z, ncomp = 3, validation = 'Mfold', folds = 4,
                               progressBar = TRUE, dist = 'max.dist', measure = "BER",
                               test.keepX = list.keepX, nrepeat = 10, cpus = 2)
    As code is running in parallel, the progressBar will only show 100% upon completion of each component.

    comp 1 |=============================================================================| 100%
    comp 2 |=============================================================================| 100%
    comp 3 |=============================================================================| 100%

    error <- tune.splsda$error.rate
    ncomp <- tune.splsda$choice.ncomp$ncomp # optimal number of components based on t-tests
    ncomp # returns the optimal number; here it is NULL

  3. Florian Rohart

    Dear Carine,

    At the moment, the tune.splsda$choice.ncomp$ncomp output is only calculated for Mfold, nrepeat>3 and ncomp>2.

    Having errors of 0 for all components is a bit strange. How many variables are kept on all the components (from tune())? What does the plotIndiv output look like?
