bug in tuning function using DIABLO

Issue #101 resolved

Federica created an issue 2017-07-21

Dear MixOmics Team,

I am currently working on an RNA-seq-derived data set and a microbiota data set, but I am having problems in the tuning step of DIABLO. I have already read a similar reported issue, but the problem there was missing data.

lapply(my_data, dim)
$RNAseq
[1]  16 239

$MB
[1]  16 779

Y        <- integrative_dataInfo$Sample_Group
 summary(Y)
healthy patient 
     10       6 

ncomp <- 2
design <- matrix(1, ncol = length(my_data), nrow = length(my_data),
                 dimnames = list(names(my_data), names(my_data)))

diag(design) <- 0
design
       RNAseq  MB
RNAseq    0.0 0.1
MB        0.1 0.0

test.keepX = list('RNAseq' = c(5:9, seq(10, 18, 2), seq(20,30,5)),
                   "MB" = c(5:9, seq(10, 18, 2), seq(20,30,5)))
test.keepX
$RNAseq
 [1]  5  6  7  8  9 10 12 14 16 18 20 25 30

$MB
 [1]  5  6  7  8  9 10 12 14 16 18 20 25 30

tune.TCGA = tune.block.splsda(X = my_data, Y = Y, test.keepX = test.keepX, ncomp = ncomp, design = design, dist = "max.dist")
You have provided a sequence of keepX of length: 13 for block RNAseq and 13 for block MB.
This results in 169 models being fitted for each component and each nrepeat, this may take some time to run, be patient!
You can look into the 'cpus' argument to speed up computation time.

comp 1 
  |                                                                                                                                     |   0%
Error in if (max(sapply(1:J, function(x) { : 
  missing value where TRUE/FALSE needed

I double-checked for missing values/NAs, and both matrices are numeric. The only notable characteristic is that the microbiota matrix is sparse. Does this affect the tuning procedure? If so, how can I overcome the problem?
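For readers hitting the same error, here is a minimal sketch of the checks worth running on each block before tuning. It uses base R only. `check_block` is a hypothetical helper, not a mixOmics function, and the toy matrix stands in for a real block such as `my_data$MB`. Note that a matrix being numeric does not rule out NA, NaN or Inf entries.

```r
# Hypothetical helper (not part of mixOmics): summarise common problems in one block.
check_block <- function(x) {
  stopifnot(is.matrix(x), is.numeric(x))
  col_sd <- apply(x, 2, sd, na.rm = TRUE)
  c(n_missing  = sum(is.na(x)),                       # counts NA and NaN
    n_infinite = sum(is.infinite(x)),                 # Inf/-Inf are not caught by is.na()
    n_constant = sum(is.na(col_sd) | col_sd == 0),    # zero-variance columns
    prop_zero  = mean(x == 0, na.rm = TRUE))          # sparsity of the block
}

# Toy block: 4 samples x 3 features, with one constant column and one NA.
toy <- matrix(c(0, 0, 0, 0,
                1, 2, NA, 4,
                0, 5, 0, 7), nrow = 4,
              dimnames = list(NULL, c("otu1", "otu2", "otu3")))
res <- check_block(toy)
print(res)
```

On a list of blocks, the same helper could be applied with `lapply(my_data, check_block)`.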

Thank you very much in advance

Comments (8)

Kim-Anh Le Cao repo owner
Hello Federica,

We may have resolved the bug in the patch version of the package, which will be submitted to CRAN this week. The patch can be downloaded and installed from here: http://mixomics.org/wp-content/uploads/2017/06/mixOmics_6.1.3b.tar.gz

Could you also check that you follow the procedure for your microbiota data, as indicated in:

Prefiltering step: http://mixomics.org/mixmc-mixomics-for-16s-microbial-communities/pre_filtering-normalisation/
Normalisation on prefiltered data + offset 1: http://mixomics.org/mixmc-mixomics-for-16s-microbial-communities/pre_filtering-normalisation/tss-normalisation/
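As a rough sketch of what these two steps amount to, assuming a raw count matrix with samples in rows and OTUs in columns: the function names, the ordering (offset, then prefilter, then TSS) and the cutoff below are illustrative assumptions and may differ from the code on the linked pages.

```r
# Illustrative helpers; names and cutoff are assumptions, not mixOmics API.
prefilter_low_counts <- function(counts, percent = 0.01) {
  otu_share <- colSums(counts) * 100 / sum(counts)   # each OTU's % of all counts
  counts[, otu_share > percent, drop = FALSE]
}
tss_normalise <- function(counts) {
  counts / rowSums(counts)                           # each sample (row) sums to 1
}

set.seed(1)
raw <- matrix(rpois(6 * 50, lambda = 20), nrow = 6)  # toy data: 6 samples x 50 OTUs
raw[, 1] <- 0                                        # an OTU that is never observed
offset   <- raw + 1                                  # offset of 1 removes zeros
# The toy data has few OTUs, so a larger cutoff than the default is used here.
filtered <- prefilter_low_counts(offset, percent = 1)
tss      <- tss_normalise(filtered)
dim(tss)
```

The never-observed OTU is dropped by the prefilter, and every remaining row of `tss` holds strictly positive proportions.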

Could you let us know how you go, so that we can see how the issue can be resolved? Thank you.
- 2017-08-01T07:00:11+00:00
Florian Rohart
Hi Federica,

Did that solve your problem? My guess is that some variables in your MB data have zero variance during the tuning process (when a training set of 9/10 of the data is created); this was indeed fixed in a recent version of mixOmics.
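To illustrate the idea, a minimal base-R sketch that counts, for each variable, how many training folds leave it with zero variance. `zero_var_in_folds` is a hypothetical helper, not a mixOmics function, and its random fold assignment only approximates whatever the tuning function does internally.

```r
# Hypothetical helper: number of training folds in which each feature is constant.
zero_var_in_folds <- function(x, folds = 10) {
  fold_id <- sample(rep_len(seq_len(folds), nrow(x)))   # random fold per sample
  hits <- integer(ncol(x))
  for (k in seq_len(folds)) {
    train <- x[fold_id != k, , drop = FALSE]            # hold out fold k
    hits <- hits + (apply(train, 2, var) == 0)
  }
  setNames(hits, colnames(x))
}

# Toy block: 16 samples; "rare" is non-zero in a single sample only.
toy <- cbind(common = as.numeric(1:16), rare = c(3, rep(0, 15)))
set.seed(1)
res <- zero_var_in_folds(toy, folds = 10)
print(res)
```

A sparse feature such as `rare` becomes constant as soon as the one sample carrying it is held out, which is the situation described above.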

Let us know if you still encounter this problem!

Thanks
- 2017-09-05T10:36:19+00:00

CarolineBirer

Hi mixOmics team, I have the same problem as Federica with tuning DIABLO. I have the latest version of mixOmics (6.2.0). I applied prefiltering + offset 1 + CSS normalization. Thank you very much for your help.

library(mixOmics)
#######two blocks, X (microbiota) and Y (metabolites)
X <- data.CSS_diablo
class(X)#matrix
Y <- ions_g
class(Y)#matrix
#define diablo input
data = list(otus = X, ions = Y)
lapply(data, dim)
# $otus
# [1]  91 920
# 
# $ions
# [1]   91 1590

### Z is constraining factor for Diablo, two species of ants Cam and Cre
infos_g_remove$label_g = factor(infos_g_remove$label_g, levels=c("Cam", "Cre"))
Z = infos_g_remove$label_g
summary(Z)
# Cam Cre 
# 46  45 

ncomp = 2 #ncomp nb of axes
keep=c(5:9, seq(10, 18, 2), seq(20,30,5)) #keep integer or vector of integers to define the tuning
nreptune=10  

#analysis design
design = matrix(1, ncol = length(data), nrow = length(data), 
                    dimnames = list(names(data), names(data)))
diag(design) = 0
design   
#       otus ions
# otus    0    1
# ions    1    0

#tuning setting
test.keepX = list("otus" = keep,
                  "ions" = keep)

#tuning
tune.TCGA = tune.block.splsda(X = data, Y = Z, ncomp = ncomp,
                              test.keepX = test.keepX, design = design,validation ="Mfold", 
                              dist="max.dist", folds=10,
                              nrepeat = nreptune)



# You have provided a sequence of keepX of length: 13 for block otus and 13 for block ions.
# This results in 169 models being fitted for each component and each nrepeat, this may take some time to run, be patient!
#   You can look into the 'cpus' argument to speed up computation time.
# 
# comp 1 
# |                                                                                                                   |   0%
# Error in if (max(sapply(1:J, function(x) { : 
#     missing value where TRUE/FALSE needed

- 2017-09-08T18:51:38+00:00

Federica reporter
Dear mixOmics Team, thank you for your comments. I solved the problem by filtering out the bacterial species present in fewer than a specific number of samples (depending on the sample size). I am going to try the latest version you published on all the data, and I will let you know if it works. Thank you again for your help!
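A minimal sketch of this kind of prevalence filter, assuming samples in rows and taxa in columns. `filter_by_prevalence` is a hypothetical helper, and the threshold of 3 samples is only an example; the actual cutoff used is not given here.

```r
# Keep taxa detected (count > 0) in at least `min_samples` samples.
# The helper name and threshold are illustrative assumptions.
filter_by_prevalence <- function(counts, min_samples) {
  prevalence <- colSums(counts > 0)                  # samples in which each taxon occurs
  counts[, prevalence >= min_samples, drop = FALSE]
}

toy <- cbind(taxA = c(5, 0, 2, 9, 1, 0),
             taxB = c(0, 0, 0, 4, 0, 0),             # present in one sample only
             taxC = c(1, 1, 0, 0, 3, 2))
kept <- filter_by_prevalence(toy, min_samples = 3)
colnames(kept)
```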
- 2017-09-11T07:52:28+00:00
CarolineBirer
Dear mixOmics team, thank you for this help. I managed to find the problem in my case: NA values had been introduced into my ion matrix during the step where I convert my data frame to a numeric matrix. Thank you again for your help.
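One common way this can happen, shown as a base-R sketch: a non-numeric entry in a column silently becomes NA on coercion. The data frame and the "n.d." placeholder are made up for illustration; the actual cause in this case is not stated in the thread.

```r
# A data frame with one non-numeric entry in an otherwise numeric column.
ions_df <- data.frame(ion1 = c("1.2", "3.4", "n.d."),
                      ion2 = c("0.5", "0.7", "0.9"),
                      stringsAsFactors = FALSE)

# Coercing each column to numeric turns "n.d." into NA (R emits a warning,
# suppressed here so the example runs quietly).
ions_mat <- suppressWarnings(sapply(ions_df, as.numeric))

anyNA(ions_mat)                           # TRUE
which(is.na(ions_mat), arr.ind = TRUE)    # row 3, column 1
```

Running `anyNA()` on every block right after the conversion step catches this before tuning.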
- 2017-09-15T20:08:21+00:00
Kim-Anh Le Cao repo owner
- changed status to resolved
Bugs resolved with the new 6.2 version, and removing NA from the data matrices.
- 2017-09-16T23:13:08+00:00
Nikolay Oskolkov
Hi Kim-Anh and Florian,

I also have the same problem running MINT analysis on two scRNASeq data sets. I removed all genes with mean count across samples below 1 and I am pretty sure there are no missing values as I ran

isTRUE(any(is.na(X)))
[1] FALSE
isTRUE(any(is.na(Y)))
[1] FALSE

I used mixOmics_6.3.1 and also the patch that Kim-Anh recommended here, but I still get the same error. Could you please advise? Thanks! Nikolay
- 2018-01-17T13:04:33+00:00
Nikolay Oskolkov
And when I skip the tuning step and just run the plsda with 2 components I get

plotIndiv(mint.plsda.res, legend = TRUE, title = 'MINT PLS-DA', ellipse = TRUE)
Error in FUN(Xi, ...) : object 'Col11' not found

Could you please comment? Thanks!
- 2018-01-17T13:44:59+00:00

Assignee: –

Type: bug

Priority: major

Status: resolved
