Error in predict.spls returned when running perf function

Issue #91 resolved

Seaver Wang created an issue 2017-05-17

Hi all,

I’m encountering some issues with using the perf function. Attempting to perform either leave-one-out or 10-fold cross-validation results in the following error:

Error in predict.spls(spls.res, X.test[, nzv]) : 'newdata' must include all the variables of 'object$X'

This is the code that I am attempting to run:

pls_class_18S <- pls(table18Sclass, predmat18S, ncomp = 2, mode = "regression", near.zero.var = TRUE)
spls_class_18S <- spls(table18Sclass, predmat18S, ncomp = 2, keepX = c(10, 10), mode = "regression", near.zero.var = TRUE)

tune.pls_class_18S <- perf(pls_class_18S, validation = "loo", progressBar = FALSE, criterion = 'all')
tune.spls_class_18S <- perf(spls_class_18S, validation = "loo", progressBar = FALSE, criterion = 'all')

or alternatively:

tune.pls_class_18S <- perf(pls_class_18S, validation = "Mfold", folds = 10, progressBar = FALSE, criterion = 'all', nrepeat = 50)
tune.spls_class_18S <- perf(spls_class_18S, validation = "Mfold", folds = 10, progressBar = FALSE, criterion = 'all', nrepeat = 50)

I am using mixOmics version 6.1.3. The dimensions of my data matrices are as follows:

dim(table18Sclass)
[1] 19 79
dim(predmat18S)
[1] 19 9

Interestingly, when attempting these same operations using my 16S rRNA amplicon sequencing data, I receive the same error for leave-one-out, but no error is returned for M-fold validation!

dim(table16Sclass)
[1] 23 75
dim(predmat16S)
[1] 23 9

I’d be happy to provide my data files for debugging if this would be helpful!

Comments (9)

Kim-Anh Le Cao repo owner
Hello, what I think is happening is that during the cross-validation process, the data in a fold can contain many zeros, and those (near-zero-variance) variables have to be removed for the method to run. So with LOOCV you may start with a training set of, say, 18 samples x 79 predictors, but end up with only 70 predictors after this filtering. What I would suggest you do first is to prefilter your data by removing the low counts: http://mixomics.org/mixmc-mixomics-for-16s-microbial-communities/pre_filtering-normalisation/

This may not happen with your 16S data, which may be less sparse (or more variable).

Let us know if that works out.

Kim-Anh
- 2017-05-17T22:06:33+00:00
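A minimal base-R sketch of the mismatch described in the reply above. It uses made-up counts and a simplified filter (drop columns with exactly zero variance) as an assumed stand-in for the near-zero-variance filter mixOmics actually applies, and it does not call mixOmics at all. It only shows how an OTU that is non-zero in a single sample becomes constant once that sample is held out, so the filtered training set ends up with fewer columns than the held-out test row.

```r
# Hypothetical sparse count matrix: 19 samples x 5 OTUs.
# otu1-otu4 vary across samples; otu5 is non-zero in sample s1 only.
counts <- matrix(0, nrow = 19, ncol = 5,
                 dimnames = list(paste0("s", 1:19), paste0("otu", 1:5)))
counts[, 1:4] <- outer(1:19, 1:4)
counts[1, "otu5"] <- 7

# Simplified stand-in for near-zero-variance filtering (an assumption, not
# the mixOmics filter): keep only columns whose variance is non-zero.
keep_nonconstant <- function(x) x[, apply(x, 2, var) > 0, drop = FALSE]

# Leave-one-out split that holds out sample s1
train <- counts[-1, ]
test  <- counts[1, , drop = FALSE]

train_filtered <- keep_nonconstant(train)
ncol(train_filtered)                               # 4: otu5 is all zeros without s1
ncol(test)                                         # 5
setdiff(colnames(test), colnames(train_filtered))  # "otu5"
```

The column sets of the filtered training data and the test data no longer match, which is the kind of inconsistency the reply points to.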
Seaver Wang Account Deactivated reporter
Hi Kim-Anh,

Thanks for the suggestion! I prefiltered my data using the function given at the link you provided (percent=0.01). This reduced my data from 75 to 30 OTUs for 16S and from 79 to 47 OTUs for 18S. I then mean-centered and scaled my data as before and retried the code above.

After this prefiltering step, however, I get the same error with exactly the same behavior as before. The error is returned for both forms of cross-validation on my 18S dataset. It also occurs with leave-one-out on my 16S data (there is still no error with M-fold on my 16S data).
- 2017-05-18T13:03:36+00:00
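The low-count prefilter from the linked mixMC page can be sketched as below. This assumes the page's `low.count.removal` function keeps an OTU when its share of the total count across all samples exceeds `percent` percent; the toy counts are made up. The sketch also shows one plausible reason why prefiltering alone did not remove the error: an OTU abundant enough to pass the filter can still be present in a single sample, so it is constant in any leave-one-out training set that holds that sample out. The thread's eventual fix was a patch to the predict function itself, so this is only an illustration.

```r
# Assumed definition of the low-count filter from the mixMC pre-filtering page:
# keep OTUs whose share of the total count (all samples) exceeds `percent` %.
low.count.removal <- function(data, percent = 0.01) {
  keep.otu <- which(colSums(data) * 100 / sum(colSums(data)) > percent)
  list(data.filter = data[, keep.otu, drop = FALSE], keep.otu = keep.otu)
}

# Hypothetical counts: 19 samples x 5 OTUs
counts <- matrix(0, nrow = 19, ncol = 5,
                 dimnames = list(paste0("s", 1:19), paste0("otu", 1:5)))
counts[, 1:3] <- outer(1:19, c(100, 200, 300))  # abundant OTUs
counts[2, "otu4"] <- 1                          # rare OTU: 1 read in total
counts[1, "otu5"] <- 500                        # abundant, but only in sample s1

filtered <- low.count.removal(counts, percent = 0.01)
colnames(filtered$data.filter)  # "otu1" "otu2" "otu3" "otu5"

# otu5 passes the filter, yet it is still constant (all zero) in the
# leave-one-out training set that holds out s1
all(filtered$data.filter[-1, "otu5"] == 0)  # TRUE
```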
Kim-Anh Le Cao repo owner
Thanks for the feedback, Seaver. If you would not mind sharing your data with me via private email (confidentiality guaranteed), I can take a deeper look at where the problem might be.

Regards,
Kim-Anh

Please update my new email address: kimanh.lecao@unimelb.edu.au
Dr. Kim-Anh Lê Cao
Senior Lecturer, Statistical Genomics
NHMRC Career Development Fellow
School of Mathematics and Statistics
Centre for Systems Genomics
Bld 184, The University of Melbourne | VIC 3010
T: +61 (0)3834 43971

mixOmics: http://mixomics.org/
- 2017-05-20T02:30:06+00:00
Kim-Anh Le Cao repo owner
Hi Seaver, Florian seems to have resolved the bug. I attach an amended version of the package. Could you install it and check? (We have already checked it on the data you provided.)

Kim-Anh
- 2017-06-13T06:52:58+00:00
Seaver Wang Account Deactivated reporter
I installed the updated package and checked, and everything appears to work well now. I no longer get the error above!
- 2017-06-13T18:36:53+00:00
Kim-Anh Le Cao repo owner
Many thanks, Seaver, for testing it out. I will mark the issue as resolved and will put a proper patch on our mixOmics website in the coming weeks.
- 2017-06-13T23:49:58+00:00
Kim-Anh Le Cao repo owner
- changed status to resolved
Predict function fixed. A small patch of the package is available at this link. We will put an update post on our website in the coming weeks.
- 2017-06-13T23:53:07+00:00
GENET Carine
Great! I had the same error message, downloaded the tar patch, and now everything works.
- 2017-09-22T08:12:46+00:00
Jordi L
Hi, I know this topic is quite old, but I had the same issue and wanted to share an easy solution. I ran LOOCV on my data several times with different transformations, and when I applied a block-wise transformation I got that error. To solve it, I passed the transpose of a data frame of the left-out sample as newdata:

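# X.mr.bloque (assumed to be a samples x variables matrix), Y.mr.bloque, Y,
# choice.ncomp, choice.keepX and fib.splsda.conf.4 are defined earlier (not shown)
for (i in 1:nrow(X.mr.bloque)) {
  datos.entrenamiento <- X.mr.bloque[-i, ]  # training data: all samples but i
  Y.entrenamiento <- Y.mr.bloque[-i]         # training labels
  datos.validacion <- X.mr.bloque[i, ]       # left-out sample (dropped to a vector)

  splsda.model.4 <- splsda(datos.entrenamiento, Y.entrenamiento, ncomp = choice.ncomp,
                           keepX = choice.keepX, scale = FALSE)

  # t(as.data.frame(...)) turns the vector back into a one-row matrix for newdata
  fib.splsda.conf.4[i] <- levels(Y)[as.numeric(predict(object = splsda.model.4,
    newdata = t(as.data.frame(datos.validacion)),
    method = "all")$class$centroids.dist[, 1]) + 1]
}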


I hope this helps everyone.

I’d like to thank all the mixOmics team for this fantastic package!
- 2020-08-17T11:44:27+00:00
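Why the transpose trick in the last comment works can be shown in plain R, assuming X.mr.bloque is a matrix: indexing a single row of a matrix with `X[i, ]` drops the result to a named vector, while a prediction call needs a one-row matrix with the original column names. The small matrix below is hypothetical and stands in for X.mr.bloque; no mixOmics call is made.

```r
# Hypothetical 3 x 4 matrix with named rows and columns
X <- matrix(1:12, nrow = 3, dimnames = list(paste0("s", 1:3), paste0("v", 1:4)))

row_vec <- X[2, ]                     # dimensions dropped: named vector of length 4
row_t   <- t(as.data.frame(X[2, ]))   # the workaround: back to a 1 x 4 matrix
row_mat <- X[2, , drop = FALSE]       # base-R alternative: keep the 1 x 4 shape

is.matrix(row_vec)  # FALSE
dim(row_t)          # 1 4
dim(row_mat)        # 1 4
colnames(row_t)     # "v1" "v2" "v3" "v4"
```

Using `drop = FALSE` avoids the round trip through a data frame and also keeps the row name of the held-out sample; both forms yield a one-row matrix with the same column names and values.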

Assignee: –

Type: bug

Priority: major

Status: resolved

Votes: 1

Watchers: 1