Error in predict.spls returned when running perf function
Hi all,
I’m encountering some issues with using the perf function. Attempting to perform either leave-one-out or 10-fold cross-validation results in the following error:
Error in predict.spls(spls.res, X.test[, nzv]) : 'newdata' must include all the variables of 'object$X'
This is the code that I am attempting to run:
pls_class_18S = pls(table18Sclass, predmat18S, ncomp = 2, mode = "regression", near.zero.var = TRUE)
spls_class_18S = spls(table18Sclass, predmat18S, ncomp = 2, keepX = c(10, 10), mode = "regression", near.zero.var = TRUE)
tune.pls_class_18S = perf(pls_class_18S, validation = "loo", progressBar = FALSE, criterion = 'all')
tune.spls_class_18S = perf(spls_class_18S, validation = "loo", progressBar = FALSE, criterion = 'all')
or alternatively:
tune.pls_class_18S = perf(pls_class_18S, validation = "Mfold", folds = 10, progressBar = FALSE, criterion = 'all', nrepeat = 50)
tune.spls_class_18S = perf(spls_class_18S, validation = "Mfold", folds = 10, progressBar = FALSE, criterion = 'all', nrepeat = 50)
I am using mixOmics version 6.1.3. The dimensions of the data files that I am using are as follows:
dim(table18Sclass)
[1] 19 79
dim(predmat18S)
[1] 19 9
Interestingly, when attempting these same operations using my 16S rRNA amplicon sequencing data, I receive the same error for leave-one-out, but no error is returned for M-fold validation!
dim(table16Sclass)
[1] 23 75
dim(predmat16S)
[1] 23 9
I’d be happy to provide my data files for debugging if this would be helpful!
Comments (9)
-
Account Deactivated (reporter)
Hi Kim-Anh,
Thanks for the suggestion! I went ahead and prefiltered my data using the function given in the provided link (percent=0.01). This reduced the size of my data to 30 16S and 47 18S OTUs from the previous 75 and 79, respectively. I then went ahead and mean-centered and scaled my data as before prior to retrying my above code.
After implementing this prefiltering step, however, I receive the same error with identical behaviour as before: the error above is returned for both forms of cross-validation on my 18S dataset, and for leave-one-out (but not M-fold) on my 16S data.
-
repo owner
Thanks for the feedback Seaver. If you would not mind sharing your data with me via private email (confidentiality guaranteed), I can have a deeper look at where the problem might be.
Regards, Kim-Anh
-- Please note my new email address: kimanh.lecao@unimelb.edu.au
Dr. Kim-Anh Lê Cao | Senior Lecturer, Statistical Genomics | NHMRC Career Development Fellow
School of Mathematics and Statistics | Centre for Systems Genomics, Bld 184
The University of Melbourne | VIC 3010 | T: +61 (0)3834 43971
mixOmics: http://mixomics.org/
-
repo owner
Hi Seaver, Florian seems to have resolved the bug. I have attached an amended version; could you install it and check? (We have checked it on the data you provided.) Updated version
Kim-Anh
-
Account Deactivated (reporter)
I installed the updated package and checked: everything works well now, and I no longer get the error above!
-
repo owner
Many thanks, Seaver, for testing it out. I will mark the issue as resolved and will put a proper patch on our mixOmics website in the coming weeks.
-
repo owner - changed status to resolved
Predict function fixed. A small patch of the package is given at this link. We will put an update post on our website in the coming weeks.
-
Great!! I had the same error message, downloaded the tar patch, and now everything works.
-
Hi, I know this topic is quite old, but I had the same issue and I wanted to share an easy solution. I applied LOOCV to my data several times with several transformations, and when I applied a transformation by blocks I got this error. To solve it, I passed the transpose of a data frame of the newdata row:
for (i in 1:nrow(X.mr.bloque)) {
  # leave sample i out
  datos.entrenamiento <- X.mr.bloque[-i, ]
  Y.entrenamiento <- Y.mr.bloque[-i]
  datos.validacion <- X.mr.bloque[i, ]
  splsda.model.4 <- splsda(datos.entrenamiento, Y.entrenamiento,
                           ncomp = choice.ncomp, keepX = choice.keepX, scale = FALSE)
  # t(as.data.frame(...)) turns the dropped row back into a 1 x p matrix
  fib.splsda.conf.4[i] <- levels(Y)[as.numeric(predict(object = splsda.model.4,
      newdata = t(as.data.frame(datos.validacion)),
      method = "all")$class$centroids.dist[, 1]) + 1]
}
I hope this helps everyone.
I'd like to thank the whole mixOmics team for this fantastic package!
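For what it's worth, the transpose trick above is needed because indexing a single row of a matrix with `X[i, ]` drops the dimensions, so `newdata` would otherwise be passed as a plain vector. A minimal base-R sketch (the matrix `X` here is a hypothetical toy, not data from this thread) showing the behaviour and an equivalent, simpler alternative using `drop = FALSE`:

```r
# Hypothetical 3 x 4 predictor matrix, just to illustrate the indexing behaviour
X <- matrix(1:12, nrow = 3, dimnames = list(NULL, paste0("v", 1:4)))

v  <- X[1, ]                    # dimensions dropped: a named numeric vector
m1 <- t(as.data.frame(X[1, ]))  # the trick above: back to a 1 x 4 matrix
m2 <- X[1, , drop = FALSE]      # same 1 x 4 matrix, no transpose needed

dim(m1)  # 1 4
dim(m2)  # 1 4
```

So `newdata = X.mr.bloque[i, , drop = FALSE]` should work just as well in the loop above.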
-
repo owner
Hello,
What I think is happening is that during the cross-validation process, your prediction (fold) data set contains many zeros, which have to be removed for the method to run. So while your full data set is, say, 19 samples x 79 predictors, your training set (18 samples under LOOCV) may end up with only, say, 70 predictors. What I would suggest you do first is prefilter your data, i.e. remove the low counts:
http://mixomics.org/mixmc-mixomics-for-16s-microbial-communities/pre_filtering-normalisation/
It may not happen with your 16S, as your original data may be less sparse (or more variable).
Let us know if that works out
Kim-Anh
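For readers landing here later, the prefiltering step along the lines of the page linked above can be sketched as follows. This is only a sketch, assuming a samples x OTUs count matrix; `low.count.removal` is the helper name used in the mixOmics mixMC case studies, keeping only OTUs whose summed counts exceed `percent`% of the grand total:

```r
# Sketch of a low-count prefilter, assuming rows = samples, columns = OTUs
low.count.removal <- function(data, percent = 0.01) {
  # keep OTUs whose summed counts exceed percent% of all counts
  keep.otu <- which(colSums(data) * 100 / sum(colSums(data)) > percent)
  list(data.filter = data[, keep.otu], keep.otu = keep.otu)
}

# Toy example: the last OTU contributes 0% of all counts and is dropped
counts <- matrix(c(300, 200, 150, 150, 100, 100, 0, 0),
                 nrow = 2, dimnames = list(NULL, paste0("OTU", 1:4)))
res <- low.count.removal(counts, percent = 0.01)
ncol(res$data.filter)  # 3
```

The filtered matrix (`res$data.filter`) would then replace `table18Sclass` / `table16Sclass` in the `pls`/`spls` calls above.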