Tuning MINT fails

Issue #121 resolved
Nikolay Oskolkov created an issue

Dear Kim-Anh and Florian,

I am using MINT to integrate two scRNAseq data sets with different cells (overlapping cell types) but the same features (genes). Every time I try to tune parameters I get the following errors:

1)

    mint.plsda.res.perf = mint.plsda(X = X, Y = Y, study = study, ncomp = 5, near.zero.var = TRUE)
    perf.mint.plsda.cell <- perf(mint.plsda.res.perf, validation = "Mfold", folds = 5, progressBar = TRUE, auc = FALSE, nrepeat = 1, cpus = 4)
    Error in if (max(sapply(1:J, function(x) { : missing value where TRUE/FALSE needed

2)

    tune.mint = tune(X = X, Y = Y, study = study, ncomp = 2, test.keepX = seq(1, 100, 1), method = 'mint.splsda', dist = "max.dist", progressBar = TRUE)
    Calling 'tune.mint.splsda' with Leave-One-Group-Out Cross Validation (nrepeat = 1)
    comp 1 |================================================== | 50%
    Error in if (max(sapply(1:J, function(x) { : missing value where TRUE/FALSE needed

I am sure that there are no missing values in my data sets:

    isTRUE(any(is.na(X)))
    [1] FALSE
    isTRUE(any(is.na(Y)))
    [1] FALSE
    isTRUE(any(is.na(study)))
    [1] FALSE

I tried different filtering options (selecting highly expressed genes) and used the nearZeroVar function to filter away genes with low variance across cells. The error is still there. I would appreciate your help. Thanks!

Best wishes, Nikolay

Comments (6)

  1. Florian Rohart

    Hi Nikolay,

    Is that the same problem as your message on issue #101?

    If so, I'm not sure what's happening here. Usually we get this error when NAs are present, which can also happen after scaling even if there are no NAs in the data at the start. But we fixed that problem in 6.3.1, so maybe it is coming from somewhere else this time.

    Would you be OK with sending me the data by email, only for debugging purposes?
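    A minimal base R sketch of how scaling can introduce missing values into data that had none to begin with. The toy matrix `X_toy` and its column names are made up for illustration, and this is not the package's internal code; it only assumes that each column is centered and divided by its standard deviation, as `scale()` does.

    ```r
    # Hypothetical toy matrix: no NA anywhere, but gene2 is constant.
    X_toy <- cbind(gene1 = c(1, 2, 3, 4),
                   gene2 = c(0, 0, 0, 0))
    anyNA(X_toy)      # the raw data has no missing values

    # Scaling divides each centered column by its standard deviation.
    # A constant column has sd = 0, so every entry becomes 0 / 0 = NaN.
    X_scaled <- scale(X_toy, center = TRUE, scale = TRUE)
    anyNA(X_scaled)   # NaN is treated as missing by anyNA() and is.na()

    # Columns with zero variance can be located before scaling.
    zero_var <- which(apply(X_toy, 2, sd) == 0)
    zero_var
    ```

    The same thing could happen to a gene that varies overall but is constant within a subset of samples (for example a single study or a cross-validation fold), if scaling is applied to that subset only.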

    Thanks!

  2. Nikolay Oskolkov reporter

    Hi Florian,

    thanks a lot for your reply! Yes, it is the same issue. Sorry for double-posting, I was not sure whether "resolved" issues can be re-activated.

    Did you try to run MINT (or any other mixOmics algorithm) on scRNAseq data sets? I am sure that those data sets do not have explicit NAs because their NAs are all set to zero. As a result, scRNAseq data has lots of "stochastic zeros", where we do not know whether the gene is not expressed, the capture efficiency was poor, or it is a transcription bursting event (a gene might be highly expressed overall but not expressed at the moment of the scRNAseq experiment).

    I will send you a 1000 x 1000 subset of the X matrix, plus the Y and study variables, with scrambled sample and gene IDs. Is this your correct email: f.rohart@imb.uq.edu.au?

    Thanks! Nikolay

  3. Florian Rohart

    Yes it is! Thanks.

    I never tried it on this type of data, but in theory it should still work.

  4. Florian Rohart

    Hi Nikolay,

    Thanks for sending the data, it helped to pinpoint the problem very quickly. The problem was not coming from the type of data or from the package, but rather from the breakdown of your outcome per study. MINT cannot work when a study contains only one level of the outcome. In your case you have a study that only contains Fibroblasts, so we cannot "estimate" the study effect, and there is nothing to discriminate the Fibroblasts from in that study.

    MINT works best when the contingency table of outcome by study, table(Y, study), does not contain any zero, and it can still work when there are a few zeros here and there.

    I've added stop/warning messages, which should be available in the next update, to prevent other users from running into the same problem. Sorry for the inconvenience.
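    A minimal base R sketch of the contingency check described above, which can be run before fitting. The toy vectors `Y_toy` and `study_toy`, and the "Neuron" label, are hypothetical; with real data they would be replaced by `Y` and `study`. This is only an illustration, not the check implemented in the package.

    ```r
    # Hypothetical toy labels: study "B" contains only Fibroblasts.
    Y_toy     <- factor(c("Fibroblast", "Neuron", "Fibroblast", "Neuron",
                          "Fibroblast", "Fibroblast", "Fibroblast"))
    study_toy <- factor(c("A", "A", "A", "A", "B", "B", "B"))

    # Contingency table of outcome levels (rows) by study (columns).
    tab <- table(Y_toy, study_toy)
    print(tab)

    # Number of outcome levels actually present in each study.
    levels_per_study <- colSums(tab > 0)

    # Studies with fewer than two outcome levels are the problematic ones.
    bad_studies <- names(levels_per_study)[levels_per_study < 2]
    if (length(bad_studies) > 0) {
      message("Only one outcome level in study: ",
              paste(bad_studies, collapse = ", "))
    }
    ```

    Zeros elsewhere in the table are visible in the printed output, and `sum(tab == 0)` counts them.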
