bug in tuning function using DIABLO
Dear MixOmics Team,
I am currently working on an RNA-seq derived data set and on a microbiota data set, but I am having problems in the tuning step of DIABLO. I have already read a similar reported issue, but the problem there was missing data.
lapply(my_data, dim)
$RNAseq
[1] 16 239
$MB
[1] 16 779
Y <- integrative_dataInfo$Sample_Group
summary(Y)
healthy patient
10 6
ncomp <- 2
design <- matrix(1, ncol = length(my_data), nrow = length(my_data),
dimnames = list(names(my_data), names(my_data)))
diag(design) <- 0
design
RNAseq MB
RNAseq 0.0 0.1
MB 0.1 0.0
test.keepX = list('RNAseq' = c(5:9, seq(10, 18, 2), seq(20,30,5)),
"MB" = c(5:9, seq(10, 18, 2), seq(20,30,5)))
test.keepX
$RNAseq
[1] 5 6 7 8 9 10 12 14 16 18 20 25 30
$MB
[1] 5 6 7 8 9 10 12 14 16 18 20 25 30
tune.TCGA = tune.block.splsda(X = my_data, Y = Y, test.keepX = test.keepX, ncomp = ncomp, design = design, dist = "max.dist")
You have provided a sequence of keepX of length: 13 for block RNAseq and 13 for block MB.
This results in 169 models being fitted for each component and each nrepeat, this may take some time to run, be patient!
You can look into the 'cpus' argument to speed up computation time.
comp 1
| | 0%
Error in if (max(sapply(1:J, function(x) { :
missing value where TRUE/FALSE needed
I double-checked for missing values/NAs, and the two matrices are both numeric. The only notable characteristic is that the microbiota matrix is sparse. Does that affect the tuning procedure? If so, how can I overcome the problem?
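For anyone repeating the check described above, here is a minimal base R sketch that counts NA/NaN and infinite entries in every block of a list. The helper name `count_bad_values` and the toy data are assumptions for illustration only; they are not part of mixOmics or of the reporter's data.

```r
# Hypothetical helper (not a mixOmics function): one row of counts per block.
count_bad_values <- function(blocks) {
  t(sapply(blocks, function(x) {
    x <- as.matrix(x)
    c(
      is_numeric = is.numeric(x),        # 1 if the block is numeric, 0 otherwise
      n_na       = sum(is.na(x)),        # is.na() counts both NA and NaN
      n_infinite = sum(is.infinite(x))   # Inf / -Inf are not caught by is.na()
    )
  }))
}

# Toy blocks standing in for my_data (assumed values, 16 samples each)
set.seed(1)
toy <- list(
  RNAseq = matrix(rnorm(16 * 5), nrow = 16),
  MB     = matrix(rpois(16 * 4, lambda = 2), nrow = 16)
)
toy$MB[2, 3] <- NA   # plant one missing value

res <- count_bad_values(toy)
res
```

A block that passes this check can still fail during tuning if a variable becomes constant in a training subset, so it only rules out one possible cause.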
Thank you very much in advance
Comments (8)
-
repo owner -
Hi Federica,
Did that solve your problem? My guess is that some variables in your MB data have zero variance during the tuning process (when a training set of 9/10 of the data is created). This was indeed fixed in a recent version of mixOmics.
Let us know if you still encounter this problem!
Thanks
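To illustrate the zero-variance point, the sketch below counts, for each variable, how many training sets leave it constant. It uses plain random folds in base R, which is an assumption: it does not reproduce how mixOmics assigns folds internally. The helper `constant_in_training` and the toy block are hypothetical.

```r
# Hypothetical helper: in how many training sets is each variable constant?
constant_in_training <- function(x, folds = 10, seed = 1) {
  set.seed(seed)
  # Plain random fold assignment (assumption: not mixOmics' own scheme)
  fold_id <- sample(rep_len(seq_len(folds), nrow(x)))
  counts <- integer(ncol(x))
  for (f in seq_len(folds)) {
    train <- x[fold_id != f, , drop = FALSE]   # drop the held-out fold
    is_const <- apply(train, 2, function(v) length(unique(v)) == 1)
    counts <- counts + is_const
  }
  names(counts) <- colnames(x)
  counts
}

# Toy sparse block: 'rare' is non-zero in a single sample only
toy_mb <- cbind(
  common = c(3, 1, 4, 1, 5, 9, 2, 6, 5, 3, 5, 8, 9, 7, 9, 3),
  rare   = c(rep(0, 15), 7)
)

res <- constant_in_training(toy_mb, folds = 10)
res
```

Here `rare` becomes all-zero in exactly one training set (the one that holds out its only non-zero sample), which is the kind of variable a sparse microbiota matrix can contain.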
-
Hi mixOmics team, I have the same problem as Federica when tuning DIABLO. I have the latest version of mixOmics (6.2.0). I applied prefiltering, an offset of 1, and CSS normalization. Thank you very much for your help,
library(mixOmics)
####### two blocks, X (microbiota) and Y (metabolites)
X <- data.CSS_diablo
class(X) # matrix
Y <- ions_g
class(Y) # matrix
# define diablo input
data = list(otus = X, ions = Y)
lapply(data, dim)
# $otus
# [1] 91 920
#
# $ions
# [1] 91 1590
### Z is constraining factor for Diablo, two species of ants Cam and Cre
infos_g_remove$label_g = factor(infos_g_remove$label_g, levels=c("Cam", "Cre"))
Z = infos_g_remove$label_g
summary(Z)
# Cam Cre
# 46 45
ncomp = 2 # ncomp nb of axes
keep=c(5:9, seq(10, 18, 2), seq(20,30,5)) # keep integer or vector of integers to define the tuning
nreptune=10
# analysis design
design = matrix(1, ncol = length(data), nrow = length(data), dimnames = list(names(data), names(data)))
diag(design) = 0
design
# otus ions
# otus 0 1
# ions 1 0
# tuning setting
test.keepX = list("otus" = keep, "ions" = keep)
# tuning
tune.TCGA = tune.block.splsda(X = data, Y = Z, ncomp = ncomp, test.keepX = test.keepX, design = design, validation = "Mfold", dist = "max.dist", folds = 10, nrepeat = nreptune)
# You have provided a sequence of keepX of length: 13 for block otus and 13 for block ions.
# This results in 169 models being fitted for each component and each nrepeat, this may take some time to run, be patient!
# You can look into the 'cpus' argument to speed up computation time.
#
# comp 1
# | | 0%
# Error in if (max(sapply(1:J, function(x) { :
# missing value where TRUE/FALSE needed
-
reporter
Dear Omics Team, thank you for your comments. I solved the problem by filtering out the bacterial species present in fewer than a given number of samples (the threshold depends on the sample size). I am going to try the latest version you published on all the data and I will let you know if it works. Thank you again for your help!
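A filter of the kind described in this comment could look roughly like the base R sketch below. It assumes samples in rows and species in columns; the function name `filter_by_prevalence`, the threshold, and the toy counts are illustrative assumptions, not the reporter's actual code.

```r
# Hypothetical helper: keep species (columns) that are non-zero
# in at least `min_samples` samples (rows).
filter_by_prevalence <- function(x, min_samples) {
  prevalence <- colSums(x > 0, na.rm = TRUE)   # number of samples with a count > 0
  x[, prevalence >= min_samples, drop = FALSE]
}

# Toy count table: 6 samples x 3 species (assumed values)
toy_counts <- cbind(
  sp1 = c(5, 0, 2, 8, 1, 3),   # present in 5 samples
  sp2 = c(0, 0, 0, 4, 0, 0),   # present in 1 sample
  sp3 = c(0, 1, 0, 2, 0, 6)    # present in 3 samples
)

filtered <- filter_by_prevalence(toy_counts, min_samples = 3)
colnames(filtered)
```

With `min_samples = 3`, `sp2` is dropped because it appears in a single sample; the right threshold depends on the sample size, as noted above.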
-
Dear mixOmics team, thank you for this help. I managed to find the problem in my case. It turned out that NA values had been introduced into my ion matrix during the step where I convert my data frame to a numeric matrix... Thank you again for this help.
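The conversion pitfall described here can be reproduced with a small base R sketch. The data frame, its column names, and the "n.d." entry are made up for illustration; the point is only that `as.numeric()` turns unparseable strings into NA, usually with just a warning.

```r
# Toy data frame with numbers stored as text (assumed example data)
df <- data.frame(
  ion1 = c("1.2", "3.4", "n.d."),   # "n.d." cannot be parsed as a number
  ion2 = c("5", "6", "7"),
  stringsAsFactors = FALSE
)

# Column-wise conversion; the unparseable entry silently becomes NA
# (the coercion warning is suppressed here on purpose)
m <- suppressWarnings(sapply(df, as.numeric))

n_missing <- sum(is.na(m))            # how many NAs were introduced
where_missing <- which(is.na(m), arr.ind = TRUE)   # row/column positions
n_missing
where_missing
```

Running the NA check on the final matrix passed to the tuning function, rather than on the original data frame, catches this case.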
-
repo owner - changed status to resolved
Bug resolved with the new 6.2 version and by removing NAs from the data matrices.
-
Hi Kim-Anh and Florian,
I also have the same problem running MINT analysis on two scRNASeq data sets. I removed all genes with mean count across samples below 1 and I am pretty sure there are no missing values as I ran
isTRUE(any(is.na(X)))
[1] FALSE
isTRUE(any(is.na(Y)))
[1] FALSE
I used mixOmics_6.3.1 and also the patch that Kim-Anh recommended here, but I still get the same error. Could you please advise? Thanks! Nikolay
-
And when I skip the tuning step and just run the plsda with 2 components I get
plotIndiv(mint.plsda.res, legend = TRUE, title = 'MINT PLS-DA', ellipse = TRUE)
Error in FUN(Xi, ...) : object 'Col11' not found
Could you please comment? Thanks!
-
Hello Federica,
We may have resolved the bug in the patch version of the package, which will be submitted to CRAN this week. The patch can be downloaded and installed from here: http://mixomics.org/wp-content/uploads/2017/06/mixOmics_6.1.3b.tar.gz
Could you also check that you follow the procedure for your microbiota data, as indicated in:
Prefiltering step: http://mixomics.org/mixmc-mixomics-for-16s-microbial-communities/pre_filtering-normalisation/
Normalisation on prefiltered data + offset 1: http://mixomics.org/mixmc-mixomics-for-16s-microbial-communities/pre_filtering-normalisation/tss-normalisation/
Could you let us know how you go, so that we can see how the issue can be resolved? Thank you.
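As a rough illustration of the "offset 1 + normalisation" step mentioned above, the sketch below adds an offset of 1 to the counts and then rescales each sample to proportions (total sum scaling). It assumes samples in rows and taxa in columns, and `tss_with_offset` is a hypothetical name; the linked pages remain the reference for the exact procedure.

```r
# Hypothetical helper: add an offset, then divide each sample (row)
# by its total so that every row sums to 1.
tss_with_offset <- function(counts, offset = 1) {
  shifted <- counts + offset        # avoids zeros before any later log-type transform
  shifted / rowSums(shifted)        # each row is divided by its own total
}

# Toy prefiltered count table: 3 samples x 4 taxa (assumed values)
toy_counts <- matrix(
  c(10, 0, 5, 0,
     0, 3, 0, 7,
     2, 2, 2, 2),
  nrow = 3, byrow = TRUE,
  dimnames = list(paste0("sample", 1:3), paste0("otu", 1:4))
)

tss <- tss_with_offset(toy_counts)
rowSums(tss)
```

Every entry is strictly positive after the offset, and each sample's proportions sum to 1.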