Error in predict.spls returned when running perf function

Issue #91 resolved
Seaver Wang created an issue

Hi all,

I’m encountering some issues with using the perf function. Attempting to perform either leave-one-out or 10-fold cross-validation results in the following error:

Error in predict.spls(spls.res, X.test[, nzv]) : 'newdata' must include all the variables of 'object$X'

This is the code that I am attempting to run:

pls_class_18S=pls(table18Sclass,predmat18S,ncomp=2,mode=c("regression"),>var=TRUE)
spls_class_18S=spls(table18Sclass,predmat18S,ncomp=2,keepX=c(10,10),mode=c("regression"),

tune.pls_class_18S=perf(pls_class_18S,validation="loo",progressBar=FALSE,criterion = 'all')
tune.spls_class_18S=perf(spls_class_18S,validation="loo",progressBar=FALSE,criterion = 'all')

or alternatively:

tune.pls_class_18S=perf(pls_class_18S,validation="Mfold",folds=10,progressBar=FALSE,criterion = 'all',nrepeat=50)
tune.spls_class_18S=perf(spls_class_18S,validation="Mfold",folds=10,progressBar=FALSE,criterion = 'all',nrepeat=50)

I am using version 6.1.3. The dimensions of the data files that I am using are as follows:


[1] 19 79


[1] 19 9

Interestingly, when attempting these same operations using my 16S rRNA amplicon sequencing data, I receive the same error for leave-one-out, but no error is returned for M-fold validation!


[1] 23 75


[1] 23 9

I’d be happy to provide my data files for debugging if this would be helpful!

Comments (9)

  1. Kim-Anh Le Cao repo owner

    Hello, What I think is happening is that during the cross-validation process, your prediction (fold) data set contains many zeros, which have to be removed for the method to run. So you may end up with a training set of, say, 18 samples (with LOOCV) x 79 predictors, while your prediction set ends up with only 70 predictors. What I would suggest you do first is prefilter your data by removing the low counts.

    It may not happen with your 16S data, as your original data may be less sparse (or more variable).

    Let us know if that works out.
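To make the mechanism described above concrete, here is a minimal base-R sketch on a made-up count table (the data and the names `X` and `dropped` are hypothetical, not from this thread). It treats "near-zero variance" as exactly zero variance, which is a simplification of whatever filter `perf` applies internally. It shows that a sparse predictor can become constant in a leave-one-out training fold, so the set of predictors the model is fitted on can differ from fold to fold.

```r
# Toy count table: 5 samples x 4 predictors (hypothetical data).
X <- matrix(c(3, 0, 0, 5,
              1, 0, 2, 4,
              4, 0, 0, 6,
              2, 7, 0, 3,
              5, 0, 0, 2),
            nrow = 5, byrow = TRUE,
            dimnames = list(paste0("s", 1:5), paste0("otu", 1:4)))

# For each leave-one-out fold, list the predictors that are constant
# (zero variance) in the training rows and would therefore be dropped.
dropped <- lapply(seq_len(nrow(X)), function(i) {
  train <- X[-i, , drop = FALSE]
  colnames(train)[apply(train, 2, var) == 0]
})
names(dropped) <- rownames(X)
print(dropped)
```

In this toy table, otu2 is nonzero only in sample s4 and otu3 only in s2, so each of them drops out of exactly the fold that holds its single nonzero sample out.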


  2. Seaver Wang reporter

    Hi Kim-Anh,

    Thanks for the suggestion! I prefiltered my data using the function given in the provided link (percent=0.01). This reduced my data from 75 to 30 16S OTUs and from 79 to 47 18S OTUs. I then mean-centered and scaled the data as before and reran the code above.

    After implementing this prefiltering step, however, I receive the same error with identical behavior as before. The abovementioned error is returned for both forms of cross-validation for my 18S dataset. The error also occurs when using leave-one-out for my 16S data (no error when using M-fold with my 16S data).
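For readers without access to the linked function, the following is a rough sketch of what a low-count prefilter with a `percent` threshold might look like. The helper name `low_count_filter`, its exact rule (keep a column if its share of the total count exceeds `percent`, expressed in percent), and the toy data are all assumptions for illustration; the function actually used in this thread is not shown here and may differ.

```r
# Hypothetical helper: keep the columns whose share of the total count
# (expressed in percent) is strictly greater than `percent`.
low_count_filter <- function(counts, percent = 0.01) {
  share <- 100 * colSums(counts) / sum(counts)
  counts[, share > percent, drop = FALSE]
}

# Toy raw counts: 3 samples x 3 OTUs (made-up values).
counts <- cbind(otu1 = c(500, 700, 600),
                otu2 = c(0, 1, 0),
                otu3 = c(4000, 3500, 4200))

filtered <- low_count_filter(counts, percent = 0.01)
print(dim(filtered))
```

A filter like this has to run on the raw counts, before centering and scaling. Note that a global abundance filter of this kind does not by itself guarantee that every retained column still varies within each training fold.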

  3. Kim-Anh Le Cao repo owner

    Thanks for the feedback, Seaver. If you would not mind sharing your data with me by private email (confidentiality guaranteed), I can have a deeper look at where the problem might be.

    Regards, Kim-Anh


  4. Kim-Anh Le Cao repo owner

    Hi Seaver, Florian seems to have resolved the bug. I have attached an amended version. Could you install it and check? (We have checked it on the data you provided.) Updated version


  5. Seaver Wang reporter

    I installed the updated package and checked, and it appears that everything works well now--I no longer get the error above!

  6. Kim-Anh Le Cao repo owner

    Many thanks Seaver for testing it out. I will mark the issue as resolved and will put a proper patch on our mixOmics website in the coming weeks.

  7. Jordi L

    Hi, I know this topic is quite old, but I had the same issue and wanted to share an easy solution. I applied LOOCV to my data several times with several transformations, and I got that error when I applied a transformation by blocks. To solve it, I passed the transpose of a data frame built from the held-out sample as newdata.

    for(i in 1:nrow({
    Y.entrenamiento <-[-i]

    splsda.model.4 <- splsda(datos.entrenamiento, Y.entrenamiento, ncomp= choice.ncomp, keepX = choice.keepX, scale = FALSE)

    fib.splsda.conf.4[i] <-levels(Y)[as.numeric(predict(object = splsda.model.4, newdata = t(, method = "all")$class$centroids.dist[,1])+1]

    I hope this helps everyone.

    I’d like to thank all the mixOmics team for this fantastic package!
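The snippet in this comment lost some of its arguments, so the following is only a guess at why the transpose helps, shown in base R on a made-up matrix (all names here are hypothetical). Selecting a single row from a matrix returns a plain vector with no dimensions, whereas a prediction function generally expects newdata shaped as 1 sample x p variables with the same column names as the training data. Transposing a data frame built from that vector restores the shape; `drop = FALSE` is the more direct way to get the same result.

```r
# Toy data: 3 samples x 4 variables (hypothetical).
X <- matrix(1:12, nrow = 3,
            dimnames = list(paste0("s", 1:3), paste0("v", 1:4)))

row_vec  <- X[2, ]                  # plain named vector, dim() is NULL
row_t    <- t(data.frame(X[2, ]))   # 1 x 4 matrix, columns v1..v4
row_keep <- X[2, , drop = FALSE]    # 1 x 4 matrix, columns v1..v4

print(dim(row_vec))
print(dim(row_t))
print(dim(row_keep))
```

Both `row_t` and `row_keep` keep the variable names as column names, which is what a check like "'newdata' must include all the variables" would need to see.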
