DIABLO perf multiple issues: Error in max(temp[, 2]) : invalid 'type' (list) of argument

Issue #122 resolved
Jen Modliszewski created an issue

Hi,

I am trying to run DIABLO on an RNA-Seq and methyl-array dataset, and I'm running into a few issues when running perf. Does anyone have any insight into what is going on?

  1. When running perf, I get this error: Error in max(temp[, 2]) : invalid 'type' (list) of argument

  2. I also get this warning multiple times. Based on some previously posted issues, it is possibly due to low or zero variability in some variables: 1: In cor(Ak, variates.Ak) : the standard deviation is zero

  3. Finally, I sometimes also see this message, which also appears when using rCCA: The SGCCA algorithm did not converge
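For context on warning 2: base R's cor() emits this exact message whenever one of the two vectors it correlates is constant. The correlation is then undefined, so cor() returns NA. The toy vectors below are made up for illustration and are not from the thread's data.

```r
# Toy data (hypothetical): one varying feature, one constant feature
x_varying  <- c(1.2, 3.4, 2.2, 5.1)
x_constant <- c(7, 7, 7, 7)

# Capture the warning message instead of printing it
captured <- NULL
r <- withCallingHandlers(
  cor(x_varying, x_constant),
  warning = function(w) {
    captured <<- conditionMessage(w)
    invokeRestart("muffleWarning")
  }
)
print(r)         # NA: correlation is undefined when one input has zero variance
print(captured)  # the warning text raised by cor()
```

Any downstream quantity that uses such an NA correlation can itself become NA, which is why constant variables are worth removing before fitting.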

Some details on the data sets/design:

    lapply(data, dim)
    $rna
    [1] 23 15846

    $methyl
    [1] 23 242714

    summary(Y)
      Aged_Lean  Aged_Obese  Young_Lean Young_Obese
              6           5           5           7

Design matrix:

           rna methyl
    rna    0.0    0.1
    methyl 0.1    0.0

    diablo.integrated.sgccda.res = block.splsda(X = data, Y = Y, ncomp = 5, design = design)
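A design matrix like the one passed to block.splsda() above can be built in base R as follows. This is only a sketch: the thread does not show how the original matrix was created, and the block names and the 0.1 off-diagonal weight are simply copied from the printed output.

```r
# Block names as printed in the thread
block_names <- c("rna", "methyl")

# Off-diagonal 0.1 = weak connection between the two blocks; diagonal 0
design <- matrix(0.1,
                 nrow = length(block_names), ncol = length(block_names),
                 dimnames = list(block_names, block_names))
diag(design) <- 0
print(design)
```

Building the matrix from the block names keeps its row and column order aligned with the list of data blocks.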

Thank you for any help/comments. Jen

Comments (9)

  1. Florian Rohart

    Hi Jen,

    1. Could you run traceback() right after the error occurs in perf?
    2. This warning comes from low or zero variability in some variables, as you mentioned. You can try to remove those variables by adding near.zero.var = TRUE when calling diablo/block.splsda.
    3. Sometimes the algorithm does not converge within the default number of iterations (100). You could try increasing this number (max.iter); whether that helps depends on your data.
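Here is a minimal base-R sketch of removing zero-variance variables before fitting. It assumes each block is a samples x features numeric matrix. The helper name drop_low_var and the tol threshold are hypothetical. This is a simpler rule than whatever near.zero.var = TRUE applies internally, so it is not a drop-in replacement for that option.

```r
# Hypothetical helper: drop columns whose standard deviation is (near) zero.
# 'tol' is an assumed threshold, not a mixOmics default.
drop_low_var <- function(X, tol = 1e-8) {
  sds <- apply(X, 2, sd, na.rm = TRUE)   # per-feature standard deviation
  keep <- !is.na(sds) & sds > tol        # also drops all-NA features
  X[, keep, drop = FALSE]
}

# Toy example: 5 samples, 3 features; feature "f2" is constant
set.seed(1)
X <- cbind(f1 = rnorm(5), f2 = rep(3, 5), f3 = rnorm(5))
X_filtered <- drop_low_var(X)
colnames(X_filtered)
```

Applied to a list of blocks, something like lapply(data, drop_low_var) would filter each block independently.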

    Thanks!

  2. Jen Modliszewski reporter

    Hi Florian,

    Thanks so much for your response!

    Unfortunately, it looks like setting near.zero.var = TRUE and increasing max.iter (to 10000) did not help in my case.

    The output from traceback() is below.

    13: which(temp[, 2] == max(temp[, 2]))
    12: FUN(newX[, i], ...)
    11: apply(x, c(1, 2), function(z) {
            temp = aggregate(object$weights, list(z), sum)
            ind = which(temp[, 2] == max(temp[, 2]))
            if (length(ind) == 1) {
                res = temp[ind, 1]
            }
            else {
                res = NA
            }
            res
        })
    10: FUN(X[[i]], ...)
    9: lapply(temp.all, function(x) {
           apply(x, c(1, 2), function(z) {
               temp = aggregate(object$weights, list(z), sum)
               ind = which(temp[, 2] == max(temp[, 2]))
               if (length(ind) == 1) {
                   res = temp[ind, 1]
               }
               else {
                   res = NA
               }
               res
           })
       })
    8: unlist(lapply(temp.all, function(x) {
           apply(x, c(1, 2), function(z) {
               temp = aggregate(object$weights, list(z), sum)
               ind = which(temp[, 2] == max(temp[, 2]))
               if (length(ind) == 1) {
                   res = temp[ind, 1]
               }
               else {
                   res = NA
               }
               res
           })
       }))
    7: array(unlist(lapply(temp.all, function(x) {
           apply(x, c(1, 2), function(z) {
               temp = aggregate(object$weights, list(z), sum)
               ind = which(temp[, 2] == max(temp[, 2]))
               if (length(ind) == 1) {
                   res = temp[ind, 1]
               }
               else {
                   res = NA
               }
               res
           })
       })), dim(Y.hat[[1]]), dimnames = list(rownames(newdata[[1]]), 
           colnames(Y), paste("dim", c(1:min(ncomp[-object$indY])), 
               sep = " ")))
    6: predict.block.spls(model[[x]], X.test[[x]], dist = "all")
    5: predict(model[[x]], X.test[[x]], dist = "all")
    4: FUN(X[[i]], ...)
    3: lapply(1:M, function(x) {
           predict(model[[x]], X.test[[x]], dist = "all")
       })
    2: perf.sgccda(diablo.integrated.sgccda.res3, validation = "Mfold", 
           folds = 3, nrepeat = 10, progressBar = TRUE)
    1: perf(diablo.integrated.sgccda.res3, validation = "Mfold", folds = 3, 
           nrepeat = 10, progressBar = TRUE)
    

    Thanks again for your help! Jen
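The traceback shows where the error is raised. For each sample and component, predict.block.spls combines the per-block class predictions z with a weighted vote, aggregate(object$weights, list(z), sum), and then calls max(temp[, 2]). One plausible trigger is that every entry of z is NA, for instance because zero-variance variates produced NA predictions. This is an assumption and is not confirmed in this thread. In that case aggregate() has no non-NA groups to sum over, and temp[, 2] can come back as an empty list rather than a numeric vector, which max() rejects with this message. The hypothetical helper below is not the mixOmics source. It only sketches the same weighted vote, with an explicit guard for that case.

```r
# Hypothetical helper (not the mixOmics source): weighted majority vote over
# per-block class predictions. Returns NA instead of erroring when no block
# produced a prediction, and NA on a tie (as the traceback code does).
weighted_vote <- function(z, weights) {
  keep <- !is.na(z)
  if (!any(keep)) {
    return(NA_character_)  # every block predicted NA: nothing to vote on
  }
  # Sum the block weights for each predicted class
  totals <- tapply(weights[keep], as.character(z[keep]), sum)
  winners <- names(totals)[totals == max(totals)]
  if (length(winners) == 1L) winners else NA_character_
}

# Usage: two blocks with hypothetical weights
w <- c(rna = 0.7, methyl = 0.4)
weighted_vote(c("Aged_Lean", "Young_Obese"), w)  # "Aged_Lean"
weighted_vote(c(NA, NA), w)                      # NA rather than an error
```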

  3. Florian Rohart

    Hi Jen,

    Sorry for the delay. I can't replicate this problem, so it's probably something specific to your data. Could you send me (part of) your data, for debugging purposes only? That would make it easier for me to debug, and I can send you a fix asap. f.rohart at uq.edu.au

    thanks!

  4. Christina Adler

    Hi Florian,

    Apologies for hijacking this thread, but I was wondering if the issue was resolved?

    I am running into the same problem: the same error with perf.diablo. I tried increasing max.iter and setting near.zero.var = TRUE, with no improvement.

    I'm looking for help working out whether it is just my data/low variability (it is a small pilot dataset, n = 17, with 16S and ITS data) or whether it can be resolved.

    Thanks in advance for any assistance

    Christina

  5. Florian Rohart

    Hi Christina,

    The issue wasn't fixed, as I couldn't identify or replicate it on my end. I can't fix something without knowing where the problem lies :) I'd be very grateful if you could send me your data; it will only be used for debugging purposes.

    Thanks!

  6. Christina Adler

    Hi Florian,

    Thanks heaps for the reply! It may just be my dodgy data!

    I will send it through to you, thanks again for any assistance, your time is greatly appreciated!

    Christina

  7. Jen Modliszewski reporter

    Hi there, sorry for abandoning this thread! I just wanted to add that I do think the problem was one of the datasets not having enough variability, despite my attempts to filter for that. I have since run the same analysis on several other datasets with no issue. The dataset causing the issue was a bisulfite-methyl seq dataset. I never got to the bottom of it, since I was only using that dataset to learn the tool.
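Filtering the full dataset may not be enough, for one plausible reason. This is an assumption, not something established in the thread. With validation = "Mfold", each model in perf is fit on a subset of the samples. A sparse feature, such as one that is non-zero in a single sample, can therefore vary across all samples yet be constant within one training fold. The hypothetical helper below checks for that, given a samples x features matrix and a fold label per sample. perf() generates its own random folds, so the fold assignment in the example is made up.

```r
# Hypothetical check (not part of mixOmics): list features that are constant
# (sd <= tol) within at least one training set of an M-fold split.
constant_in_training_folds <- function(X, folds, tol = 1e-8) {
  # X: samples x features matrix; folds: fold label per sample
  bad <- character(0)
  for (k in sort(unique(folds))) {
    train <- X[folds != k, , drop = FALSE]   # samples used to fit fold k's model
    sds <- apply(train, 2, sd, na.rm = TRUE)
    bad <- union(bad, colnames(X)[is.na(sds) | sds <= tol])
  }
  bad
}

# Toy data: feature "f2" is non-zero in a single sample only
X <- cbind(f1 = c(1, 2, 3, 4, 5, 6),
           f2 = c(0, 0, 0, 0, 0, 9),
           f3 = c(2, 1, 4, 3, 6, 5))
apply(X, 2, sd) > 0                     # every feature varies on the full data
folds <- c(1, 1, 2, 2, 3, 3)            # hypothetical 3-fold assignment
constant_in_training_folds(X, folds)    # "f2": constant when sample 6 is held out
```

With repeated random folds (nrepeat), a check like this would need to cover every fold assignment actually drawn. Very sparse features are the most likely to trip it.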
