bug in tuning function using DIABLO

Issue #101 resolved
Federica created an issue

Dear MixOmics Team,

I am currently working on an RNA-seq-derived data set and a microbiota data set, but I am having problems in the tuning step of DIABLO. I have already read a similar reported issue, but the problem there was missing data.

lapply(my_data, dim)
$RNAseq
[1]  16 239

$MB
[1]  16 779

Y <- integrative_dataInfo$Sample_Group
summary(Y)
healthy patient 
     10       6 

ncomp <- 2
design <- matrix(1, ncol = length(my_data), nrow = length(my_data),
                 dimnames = list(names(my_data), names(my_data)))

diag(design) <- 0
design
       RNAseq  MB
RNAseq    0.0 0.1
MB        0.1 0.0

test.keepX = list('RNAseq' = c(5:9, seq(10, 18, 2), seq(20,30,5)),
                   "MB" = c(5:9, seq(10, 18, 2), seq(20,30,5)))
test.keepX
$RNAseq
 [1]  5  6  7  8  9 10 12 14 16 18 20 25 30

$MB
 [1]  5  6  7  8  9 10 12 14 16 18 20 25 30

tune.TCGA = tune.block.splsda(X = my_data, Y = Y, test.keepX = test.keepX, ncomp = ncomp, design = design, dist = "max.dist")
You have provided a sequence of keepX of length: 13 for block RNAseq and 13 for block MB.
This results in 169 models being fitted for each component and each nrepeat, this may take some time to run, be patient!
You can look into the 'cpus' argument to speed up computation time.

comp 1 
  |                                                                                                                                     |   0%
Error in if (max(sapply(1:J, function(x) { : 
  missing value where TRUE/FALSE needed

I double-checked for missing values/NAs, and the two matrices are both numeric. The only notable characteristic is that the microbiota matrix is sparse. Does that affect the tuning procedure? If so, how can I overcome the problem?

Thank you very much in advance

Comments (8)

  1. Kim-Anh Le Cao repo owner

    Hello Federica,

    We may have resolved the bug in the patch version of the package, which will be submitted to CRAN this week. The patch can be downloaded and installed from here: http://mixomics.org/wp-content/uploads/2017/06/mixOmics_6.1.3b.tar.gz

    Could you also check that you follow the procedure for your microbiota data, as indicated in:

    Prefiltering step: http://mixomics.org/mixmc-mixomics-for-16s-microbial-communities/pre_filtering-normalisation/

    Normalisation on prefiltered data + offset 1: http://mixomics.org/mixmc-mixomics-for-16s-microbial-communities/pre_filtering-normalisation/tss-normalisation/

    Could you let us know how you get on, so that we can see how the issue can be resolved? Thank you.
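
    As a rough sketch of the normalisation step mentioned above, the following adds an offset of 1 to a raw count matrix and then applies total sum scaling (each sample's counts divided by that sample's total). The matrix `counts` is made up, and `tss_normalise` is a hypothetical helper, not a mixOmics function; the linked pages remain the reference for the recommended procedure.

    ```r
    # Hypothetical raw count matrix: 4 samples (rows) x 3 OTUs (columns).
    counts <- matrix(c(0, 10, 5,
                       3,  0, 7,
                       0,  0, 9,
                       8,  2, 0),
                     nrow = 4, byrow = TRUE,
                     dimnames = list(paste0("S", 1:4), paste0("OTU", 1:3)))

    # Hypothetical helper: add an offset, then divide each row by its total.
    tss_normalise <- function(x, offset = 1) {
      x <- x + offset
      sweep(x, 1, rowSums(x), "/")
    }

    counts_tss <- tss_normalise(counts)
    rowSums(counts_tss)  # each sample now sums to 1
    ```

    The offset keeps every entry strictly positive, which matters if a log-ratio transformation is applied afterwards.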

  2. Florian Rohart

    Hi Federica,

    Did that solve your problem? My guess is that some variables in your MB data have zero variance during the tuning process (when a training set of 9/10th of the data is created). This was indeed solved in a recent version of mixOmics.

    Let us know if you still encounter this problem!

    Thanks
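
    To illustrate how a sparse block can end up with zero-variance variables inside a cross-validation training set, here is a minimal base-R sketch. The data, the fold assignment and the helper `zero_var_in_fold` are all made up for illustration; this is not how mixOmics builds its folds internally.

    ```r
    set.seed(1)
    # Hypothetical sparse block: 16 samples x 5 variables.
    X <- matrix(rnorm(16 * 5), nrow = 16,
                dimnames = list(NULL, paste0("V", 1:5)))
    X[, "V5"] <- 0
    X[3, "V5"] <- 4   # V5 is non-zero in a single sample only

    # Assign each sample to one of 4 folds (illustrative, not stratified).
    folds <- rep(1:4, length.out = nrow(X))

    # Hypothetical helper: variables that are constant once fold k is held out.
    zero_var_in_fold <- function(X, folds, k) {
      train <- X[folds != k, , drop = FALSE]
      colnames(X)[apply(train, 2, var) == 0]
    }

    res <- lapply(1:4, function(k) zero_var_in_fold(X, folds, k))
    res
    ```

    Here `V5` varies over the full data but becomes constant as soon as the fold containing sample 3 is held out. Scaling such a column divides by a standard deviation of zero and produces `NaN`, which can then propagate into later comparisons.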

  3. CarolineBirer

    Hi mixOmics team, I have the same problem as Federica when tuning DIABLO. I have the latest version of mixOmics, 6.2.0. I applied prefiltering + offset 1 + CSS normalization. Thank you very much for your help.

    library(mixOmics)
    #######two blocks, X (microbiota) and Y (metabolites)
    X <- data.CSS_diablo
    class(X)#matrix
    Y <- ions_g
    class(Y)#matrix
    #define diablo input
    data = list(otus = X, ions = Y)
    lapply(data, dim)
    # $otus
    # [1]  91 920
    # 
    # $ions
    # [1]   91 1590
    
    ### Z is constraining factor for Diablo, two species of ants Cam and Cre
    infos_g_remove$label_g = factor(infos_g_remove$label_g, levels=c("Cam", "Cre"))
    Z = infos_g_remove$label_g
    summary(Z)
    # Cam Cre 
    # 46  45 
    
    ncomp = 2 #ncomp nb of axes
    keep=c(5:9, seq(10, 18, 2), seq(20,30,5)) #keep integer or vector of integers to define the tuning
    nreptune=10  
    
    #analysis design
    design = matrix(1, ncol = length(data), nrow = length(data), 
                        dimnames = list(names(data), names(data)))
    diag(design) = 0
    design   
    #       otus ions
    # otus    0    1
    # ions    1    0
    
    #tuning setting
    test.keepX = list("otus" = keep,
                      "ions" = keep)
    
    #tuning
    tune.TCGA = tune.block.splsda(X = data, Y = Z, ncomp = ncomp,
                                  test.keepX = test.keepX, design = design,
                                  validation = "Mfold", dist = "max.dist",
                                  folds = 10, nrepeat = nreptune)
    
    
    
    # You have provided a sequence of keepX of length: 13 for block otus and 13 for block ions.
    # This results in 169 models being fitted for each component and each nrepeat, this may take some time to run, be patient!
    #   You can look into the 'cpus' argument to speed up computation time.
    # 
    # comp 1 
    # |                                                                                                                   |   0%
    # Error in if (max(sapply(1:J, function(x) { : 
    #     missing value where TRUE/FALSE needed
    
  4. Federica reporter

    Dear mixOmics Team, thank you for your comments. I solved the problem by filtering out the bacterial species present in fewer than a specific number of samples (depending on the sample size). I am going to try the latest version you published on all the data, and I will let you know if it works. Thank you again for your help!
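
    A prevalence filter of the kind described above could look roughly like this. The count matrix, the threshold of 3 samples and the helper `filter_by_prevalence` are assumptions made for illustration, not the exact filter used in this thread.

    ```r
    # Hypothetical count matrix: 6 samples (rows) x 4 taxa (columns).
    counts <- matrix(c(5, 0, 0, 2,
                       3, 0, 1, 0,
                       0, 0, 4, 0,
                       7, 1, 0, 0,
                       2, 0, 3, 0,
                       9, 0, 6, 0),
                     nrow = 6, byrow = TRUE,
                     dimnames = list(paste0("S", 1:6), paste0("taxon", 1:4)))

    # Hypothetical helper: keep taxa detected (count > 0) in at least
    # min_samples samples.
    filter_by_prevalence <- function(x, min_samples) {
      prevalence <- colSums(x > 0)
      x[, prevalence >= min_samples, drop = FALSE]
    }

    filtered <- filter_by_prevalence(counts, min_samples = 3)
    colnames(filtered)
    ```

    Taxa seen in only one or two samples are the ones most likely to become constant in a training fold, so the threshold is worth choosing with the fold size in mind.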

  5. CarolineBirer

    Dear mixOmics team, thank you for your help. I managed to find the problem in my case. In the end, it was because I had NA values that were introduced into my ion matrix during the step where I converted my data frame to a numeric matrix. Thank you again for your help.
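
    For anyone hitting the same thing, here is a small sketch of how NAs can slip in during that conversion and how to locate them. The data frame `ions_df` and its contents are invented for the example.

    ```r
    # Hypothetical data frame where one numeric column was read in as text.
    ions_df <- data.frame(ion1 = c(1.2, 3.4, 5.6),
                          ion2 = c("0.5", "n.d.", "2.1"),
                          stringsAsFactors = FALSE)

    # Converting column by column to numeric turns unparseable entries into NA
    # (R emits a coercion warning, suppressed here to keep the output short).
    ions_mat <- suppressWarnings(sapply(ions_df, as.numeric))

    anyNA(ions_mat)                          # TRUE
    which(is.na(ions_mat), arr.ind = TRUE)   # row and column of each NA
    ```

    Running the check on the final numeric matrix, rather than on the original data frame, is what catches NAs created by the conversion itself.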

  6. Nikolay Oskolkov

    Hi Kim-Anh and Florian,

    I also have the same problem running a MINT analysis on two scRNA-seq data sets. I removed all genes with a mean count across samples below 1, and I am pretty sure there are no missing values, as I ran

    isTRUE(any(is.na(X)))
    [1] FALSE
    isTRUE(any(is.na(Y)))
    [1] FALSE

    I used mixOmics_6.3.1 and also the patch that Kim-Anh recommended here, but I still get the same error. Could you please advise? Thanks! Nikolay

  7. Nikolay Oskolkov

    And when I skip the tuning step and just run the PLS-DA with 2 components, I get

    plotIndiv(mint.plsda.res, legend = TRUE, title = 'MINT PLS-DA', ellipse = TRUE)
    Error in FUN(Xi, ...) : object 'Col11' not found

    Could you please comment? Thanks!
