Error in predict.spls returned when running perf function

Issue #91 resolved
Seaver Wang created an issue

Hi all,

I’m encountering some issues with using the perf function. Attempting to perform either leave-one-out or 10-fold cross-validation results in the following error:

Error in predict.spls(spls.res, X.test[, nzv]) : 'newdata' must include all the variables of 'object$X'

This is the code that I am attempting to run:

pls_class_18S=pls(table18Sclass,predmat18S,ncomp=2,mode=c("regression"),near.zero.var=TRUE)
spls_class_18S=spls(table18Sclass,predmat18S,ncomp=2,keepX=c(10,10),mode=c("regression"),near.zero.var=TRUE)

tune.pls_class_18S=perf(pls_class_18S,validation="loo",progressBar=FALSE,criterion = 'all')
tune.spls_class_18S=perf(spls_class_18S,validation="loo",progressBar=FALSE,criterion = 'all')

or alternatively:

tune.pls_class_18S=perf(pls_class_18S,validation="Mfold",folds=10,progressBar=FALSE,criterion = 'all',nrepeat=50)
tune.spls_class_18S=perf(spls_class_18S,validation="Mfold",folds=10,progressBar=FALSE,criterion = 'all',nrepeat=50)

I am using mixOmics version 6.1.3. The dimensions of my data files are as follows:

dim(table18Sclass)
[1] 19 79

dim(predmat18S)
[1] 19 9

Interestingly, when I run the same operations on my 16S rRNA amplicon sequencing data, I get the same error for leave-one-out, but no error for M-fold cross-validation!

dim(table16Sclass)
[1] 23 75

dim(predmat16S)
[1] 23 9

I’d be happy to provide my data files for debugging if this would be helpful!

Comments (9)

  1. Kim-Anh Le Cao repo owner

    Hello, what I think is happening is that during the cross-validation process, the data set in a given fold contains many zeros, and those variables have to be removed for the method to run. So you may start with a training set of, say, 18 samples (for LOO-CV) x 79 predictors, but after this removal your training set may end up with only 70 predictors. What I would suggest you do first is prefilter your data by removing the low counts: http://mixomics.org/mixmc-mixomics-for-16s-microbial-communities/pre_filtering-normalisation/
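    A minimal base-R sketch of this kind of low-count prefiltering, assuming the filter keeps OTUs whose total count exceeds `percent` % of all counts. The helper name `low_count_removal` and the toy counts are hypothetical; check the linked page for the exact function used in mixMC.

```r
# Hypothetical helper: keep OTUs (columns) whose total count is above
# `percent` % of the overall count in the table.
low_count_removal <- function(data, percent = 0.01) {
  # Share of the overall count held by each OTU, in percent
  otu_share <- colSums(data) * 100 / sum(data)
  keep_otu <- which(otu_share > percent)
  list(data.filter = data[, keep_otu, drop = FALSE], keep.otu = keep_otu)
}

# Toy count table: 4 samples x 5 OTUs (made-up numbers for illustration)
counts <- matrix(c(100, 0, 50, 0, 3,
                   120, 0, 40, 0, 0,
                    90, 1, 60, 0, 0,
                   110, 0, 55, 0, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(paste0("s", 1:4), paste0("OTU", 1:5)))

res <- low_count_removal(counts, percent = 0.01)
colnames(res$data.filter)

res_strict <- low_count_removal(counts, percent = 0.5)
colnames(res_strict$data.filter)
```

    With `percent = 0.01`, only the all-zero OTU4 is dropped from this toy table; a larger cutoff such as 0.5 also removes the rare OTU2 and OTU5.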

    It may not happen with your 16S data, as those may be less sparse (or more variable).
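    To illustrate the mechanism described above, here is a sketch that counts, in each leave-one-out fold, how many columns of the training set have zero variance once the held-out sample is removed. It uses a plain zero-variance test as a stand-in for a near-zero-variance filter (it is not mixOmics' own filter), and the toy matrix is made up.

```r
# For each leave-one-out fold, count the training-set columns that are
# constant (zero variance) after the held-out sample is removed.
# Simple stand-in for a near-zero-variance filter, for illustration only.
zero_var_per_fold <- function(X) {
  sapply(seq_len(nrow(X)), function(i) {
    train <- X[-i, , drop = FALSE]
    sum(apply(train, 2, function(col) var(col) == 0))
  })
}

# Toy sparse table: OTU2 is non-zero in sample s3 only
X <- matrix(c(5, 0, 2,
              3, 0, 4,
              6, 7, 1,
              4, 0, 3),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("s", 1:4), paste0("OTU", 1:3)))

zero_var_per_fold(X)
```

    Only the fold that holds out s3, the sole sample where OTU2 is non-zero, leaves a constant column in the training set; the sparser the table, the more folds are likely to hit this.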

    Let us know if that works out.

    Kim-Anh

  2. Seaver Wang Account Deactivated reporter

    Hi Kim-Anh,

    Thanks for the suggestion! I prefiltered my data using the function given in the link (percent=0.01). This reduced my data from 75 to 30 16S OTUs and from 79 to 47 18S OTUs. I then mean-centered and scaled my data as before and reran the code above.
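    For reference, the centring and scaling step can be sketched with base R's `scale()`. The toy matrix is hypothetical; its last column shows why constant (for example all-zero) columns are a problem: scaling divides by a standard deviation of 0, which yields `NaN`.

```r
# Toy matrix (hypothetical values), filled by column:
# columns a and b vary, column c is all zeros
X <- matrix(c(1, 2, 3,
              4, 5, 6,
              0, 0, 0),
            nrow = 3,
            dimnames = list(NULL, c("a", "b", "c")))

# Mean-centre and scale each column to unit standard deviation
Xs <- scale(X, center = TRUE, scale = TRUE)
Xs[, "c"]   # NaN for every row: 0 / 0 for the constant column

# Dropping zero-variance columns before scaling avoids this
keep <- apply(X, 2, var) > 0
Xs_ok <- scale(X[, keep, drop = FALSE])
colMeans(Xs_ok)   # 0 for each remaining column
```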

    After this prefiltering step, however, I get the same error with the same behavior as before: it is returned for both forms of cross-validation on my 18S dataset, and for leave-one-out (but not M-fold) on my 16S dataset.

  3. Kim-Anh Le Cao repo owner

    Thanks for the feedback, Seaver. If you would not mind sharing your data with me via private email (confidentiality guaranteed), I can take a deeper look at where the problem might be.

    Regards,
    Kim-Anh

    Please update my new email address: kimanh.lecao@unimelb.edu.au

    Dr. Kim-Anh Lê Cao
    Senior Lecturer, Statistical Genomics
    NHMRC Career Development Fellow
    School of Mathematics and Statistics
    Centre for Systems Genomics
    Bld 184, The University of Melbourne | VIC 3010
    T: +61 (0)3834 43971

    mixOmics: http://mixomics.org/

  4. Kim-Anh Le Cao repo owner

    Hi Seaver, Florian seems to have resolved the bug. I attach an amended version. Could you install it and check? (We have checked it on the data you provided.) Updated version

    Kim-Anh

  5. Seaver Wang Account Deactivated reporter

    I installed the updated package and checked, and everything appears to work well now: I no longer get the error above!

  6. Kim-Anh Le Cao repo owner

    Many thanks Seaver for testing it out. I will mark the issue as resolved and will put a proper patch on our mixOmics website in the coming weeks.

  7. Jordi L

    Hi, I know this topic is quite old, but I had the same issue and wanted to share an easy solution. I ran LOOCV on my data several times with different transformations, and when I applied a transformation by blocks I got that error. To solve it, I passed newdata as the transpose of a data frame built from the held-out sample.

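    for (i in 1:nrow(X.mr.bloque)) {
      # Leave sample i out: train on the remaining rows
      datos.entrenamiento <- X.mr.bloque[-i, ]
      Y.entrenamiento <- Y.mr.bloque[-i]
      # If X.mr.bloque is a matrix, X[i, ] drops to a named vector (no column dimension)
      datos.validacion <- X.mr.bloque[i, ]

      splsda.model.4 <- splsda(datos.entrenamiento, Y.entrenamiento, ncomp = choice.ncomp, keepX = choice.keepX, scale = FALSE)

      # t(as.data.frame(...)) turns the vector back into a 1 x p matrix whose
      # column names are the original variable names, i.e. one new sample
      fib.splsda.conf.4[i] <- levels(Y)[as.numeric(predict(object = splsda.model.4, newdata = t(as.data.frame(datos.validacion)), method = "all")$class$centroids.dist[, 1]) + 1]
    }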
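    Why the transpose works can be seen on a toy matrix in base R (values hypothetical, and assuming the data are stored as a matrix rather than a data frame): extracting one row drops the column dimension, while the workaround and the standard `drop = FALSE` argument both keep a 1 x p matrix with the variable names. Whether `drop = FALSE` is accepted by `predict()` in a given mixOmics version is an assumption not tested here.

```r
# Toy 3 x 4 matrix standing in for the data (hypothetical values)
X <- matrix(1:12, nrow = 3,
            dimnames = list(paste0("s", 1:3), paste0("v", 1:4)))

row_vec <- X[2, ]                     # drops to a named vector of length 4
row_t   <- t(as.data.frame(X[2, ]))   # workaround from the comment: 1 x 4 matrix
row_mat <- X[2, , drop = FALSE]       # base R alternative: 1 x 4 matrix

is.null(dim(row_vec))   # TRUE: no column dimension left
dim(row_t)              # 1 4
dim(row_mat)            # 1 4
colnames(row_t)         # "v1" "v2" "v3" "v4"
```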

    I hope this helps everyone.

    I’d like to thank all the mixOmics team for this fantastic package!
