Error In DIABLO tune.block.splsda

Issue #124 resolved
tjgross created an issue

I have the following data that I am attempting to conduct a DIABLO analysis on:

Y <- final_frame$DiseaseState
summary(Y)
        WT-Drug Transgenic-Drug              WT      Transgenic
             26              43              50              33

lapply(data_by_level, dim)
$Protein
[1]  152 1114

$Metabolites
[1] 152 145

$mRNA
[1]  152 1100

When I attempt to tune the number of variables to include on each component, the analysis crashes with the following error when 2 CPUs are used:

tune.keepX<- tune.block.splsda(X = data_by_level, Y = Y, ncomp = 3, test.keepX = test.keepX, design = design, validation = 'Mfold', folds = 10, nrepeat = 1, cpus = 2, dist = "centroids.dist")

comp 1
| | 0%
Error in checkForRemoteErrors(val) :
  2 nodes produced errors; first error: missing value where TRUE/FALSE needed

When omitting the "cpus" argument, an error is still thrown:

tune.keepX<- tune.block.splsda(X = data_by_level, Y = Y, ncomp = 3, test.keepX = test.keepX, design = design, validation = 'Mfold', folds = 10, nrepeat = 1, dist = "centroids.dist")

comp 1
| | 0%
Error in if (max(sapply(1:J, function(x) { :
  missing value where TRUE/FALSE needed

I'm unclear regarding what this means and would appreciate any relevant clarification/help.
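
For readers wondering what the message itself means: in R, `if ()` requires a condition that is `TRUE` or `FALSE`, and an `NA` condition raises exactly this error. The sketch below is a toy reproduction in base R, not the mixOmics source code; the `scores` vector and the 0.5 threshold are made up for illustration.

```r
# Toy reproduction of the error message; this is not the mixOmics source code.
scores <- c(0.4, NA, 0.7)   # hypothetical values, one of them missing

# max() propagates NA, so the comparison below evaluates to NA
cond <- max(scores) > 0.5
print(cond)

# if () needs TRUE or FALSE; an NA condition raises the error quoted above
msg <- tryCatch(
  {
    if (cond) "above threshold" else "below threshold"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the error suggests that some quantity computed inside the tuning step became `NA`, which is consistent with missing values or constant variables in the data, although the thread does not confirm the exact cause.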

Comments (5)

  1. Florian Rohart

    Hi there, are you using the latest version of mixOmics (6.3.1)? If so, could you try with near.zero.var = TRUE, and with a test.keepX higher than 2 for each data type? (I'm just trying to pinpoint the problem.)

    In the previous version of mixOmics, this error

    Error in if (max(sapply(1:J, function(x) { : missing value where TRUE/FALSE needed
    

    was usually due to missing data or constant variables that appear during the cross-validation (CV) process of the tune function. But I thought we had dealt with that problem.

    If you're happy to send the data over for debugging purposes only, I can fix things faster and make it work for you.
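
A minimal sketch of how one might look for the two causes mentioned above, missing values and variables that become constant inside a cross-validation fold. It uses base R only and a toy list standing in for `data_by_level`; the helper `check_block` and the toy data are hypothetical and are not part of mixOmics.

```r
# Toy stand-in for data_by_level; the real data are not available here.
set.seed(1)
toy_blocks <- list(
  Protein     = matrix(rnorm(40), nrow = 8),
  Metabolites = matrix(rnorm(40), nrow = 8)
)
toy_blocks$Metabolites[4, 3] <- NA          # one missing value
toy_blocks$Protein[, 5] <- c(3, rep(0, 7))  # non-zero in a single sample only

# Hypothetical helper: count NAs and constant columns in one block
check_block <- function(block) {
  col_sd <- apply(block, 2, sd, na.rm = TRUE)
  c(n_missing  = sum(is.na(block)),
    n_constant = sum(is.na(col_sd) | col_sd == 0))
}

# Check on the full data: no column is constant yet
full <- sapply(toy_blocks, check_block)
print(full)

# Same check on the training part of one fold (samples 1 and 2 held out):
# the nearly constant Protein column is now constant
train_rows <- setdiff(seq_len(8), c(1, 2))
fold <- sapply(toy_blocks, function(b) check_block(b[train_rows, , drop = FALSE]))
print(fold)
```

With 10 folds and a smallest class of 26 samples, a variable that is non-zero in only a few samples could plausibly become constant in some training sets, which is the situation `near.zero.var = TRUE` is meant to help with.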

  2. tjgross reporter

    Thanks Florian. After updating mixOmics to 6.3.1 and re-running the analysis, I now get the following error:

    #### Tune KeepX
    test.keepX <- list(Protein = seq(2,20,2), Metabolites = seq(2,20,2), RNA = seq(2,20,2))
    set.seed(122)
    tune.keepX <- tune.block.splsda(X = data_by_level, Y = Y, ncomp = 3, test.keepX = test.keepX, design = design, validation = 'Mfold', folds = 10, nrepeat = 1, dist = "centroids.dist", measure = "BER", cpus = 2, near.zero.var = TRUE)

    You have provided a sequence of keepX of length: 10 for block Protein and 10 for block Metabolites and 10 for block RNA.
    This results in 1000 models being fitted for each component and each nrepeat, this may take some time to run, be patient!
    As code is running in parallel, the progressBar will only show 100% upon completion of each nrepeat/ component.
    Error in apply(is.na.A, 1, sum) : dim(X) must have a positive length

    I know for a fact that one of my data matrices (Metabolites) has NA values, but it was my understanding that these were generally permitted in mixOmics functions. I can look into sharing my data, but it would require speaking to my collaborator first.
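
One detail worth checking in the call above: the original post lists the blocks as `Protein`, `Metabolites` and `mRNA`, while `test.keepX` names its third element `RNA`. The sketch below compares the two sets of names as they are printed in this thread. Whether the data list was renamed between posts, and whether this mismatch plays any part in the error, is not established here; it is only a quick consistency check.

```r
# Block names as printed by lapply(data_by_level, dim) in the original post
block_names <- c("Protein", "Metabolites", "mRNA")

# test.keepX exactly as defined in the comment above
test.keepX <- list(Protein     = seq(2, 20, 2),
                   Metabolites = seq(2, 20, 2),
                   RNA         = seq(2, 20, 2))

# Names that appear on one side only
only_in_data  <- setdiff(block_names, names(test.keepX))
only_in_keepX <- setdiff(names(test.keepX), block_names)
print(only_in_data)
print(only_in_keepX)

# Number of keepX combinations per component, as reported in the message
n_models <- prod(lengths(test.keepX))
print(n_models)
```

With real data, `names(data_by_level)` would replace the hard-coded `block_names` vector.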

  3. Florian Rohart

    NAs are definitely permitted in mixOmics, but we don't have a lot of in-house data with NAs, so testing always takes more time for this type of data (and it seems that adding random NAs to the data doesn't help much).

    Could you give me your email address? I will send you the current version of mixOmics (still in development). I may have corrected the problem already, as the "apply(is.na.A, 1, sum) : dim(X) must have a positive length" error has been reported before.
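
As background on the message itself: `apply()` raises "dim(X) must have a positive length" when its first argument has no dimensions, for example a plain vector. A common way to end up there in R is subsetting a matrix down to a single row or column, which drops the dimensions by default. The sketch below shows this with a toy matrix in base R; it is an assumption about the general mechanism, not a diagnosis of the mixOmics internals.

```r
# Toy matrix with one missing value; not the reporter's data
m <- matrix(c(1, NA, 3, 4, 5, 6), nrow = 2)
is_na <- is.na(m)

# Row-wise NA counts work while the object is still a matrix
row_counts <- apply(is_na, 1, sum)
print(row_counts)

# Taking a single row drops the dimensions and leaves a plain vector
one_row <- is_na[1, ]
print(is.null(dim(one_row)))

# apply() on the dimensionless vector fails with the message from the thread
msg <- tryCatch(apply(one_row, 1, sum),
                error = function(e) conditionMessage(e))
print(msg)

# drop = FALSE keeps a 1-row matrix, so apply() works again
kept <- apply(is_na[1, , drop = FALSE], 1, sum)
print(kept)
```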

  4. tjgross reporter

    Florian, I've just emailed you at your f.rohart@imb.uq.edu.au address. Let me know if there is another that you would prefer.
