Error in predict function
Hi Mixomics Team,
Thanks for providing this great stats package.
I am running an sPLS-DA (splsda) on a 16S dataset, with a training set and a test set (samples independent of the training set). The training set ran fine (though performance was not great), and I would now like to test the model on the test set.
I am running into the following error:
Perio.test.splsda.predict <- predict(Perio.train.splsda, X.test, dist = "mahalanobis.dist")
Error in predict.mixo_spls(Perio.train.splsda, X.test, dist = "mahalanobis.dist") :
  'newdata' must include all the variables of 'object$X'
In addition: Warning message:
In if (all.equal(colnames(newdata), colnames(object$X)) != TRUE) stop("'newdata' must include all the variables of 'object$X'") :
  the condition has length > 1 and only the first element will be used
I noticed from the previous issues that predict can use another package (pls) but I don't think this is attached in my R session.
Also, do the training and test sets need to have the same number of variables/OTUs? This isn't the case in my data.
I am using mixOmics 6.7.2, R version 3.6.0.
Any assistance would be greatly appreciated, thanks
Christina
Comments (3)
Issue #165 was marked as a duplicate of this issue.
Status changed to resolved.
Please consider submitting any possible further issues to GitHub.
Hi Christina, sorry we’re only getting back to you now; we had not realised that we had not updated our issues page on Bioconductor (we’re now on GitHub and discussions are on Discourse).
Yes, the training and test sets need matching variables. Otherwise the method has no way to perform prediction with variables it has not seen before, or with variables it used to fit the model and now cannot find in the test set. So I recommend taking the intersection of the variables in the training and test datasets before prediction.
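As a rough illustration of that suggestion, here is a minimal base R sketch using toy matrices in place of the real OTU tables. The toy data, the names `shared`, `X.train.shared` and `X.test.shared`, and the `Y.train`, `ncomp` and `keepX` values in the commented lines are assumptions for illustration only, not taken from the original post. Note that the model would presumably need to be refit on the shared variables, since the error above comes from comparing the column names of the new data against those of the fitted object.

```r
# Toy stand-ins for the OTU tables (samples in rows, OTUs in columns).
X.train <- matrix(1:12, nrow = 3,
                  dimnames = list(paste0("s", 1:3),
                                  c("OTU1", "OTU2", "OTU3", "OTU4")))
X.test <- matrix(1:6, nrow = 2,
                 dimnames = list(paste0("t", 1:2),
                                 c("OTU3", "OTU1", "OTU5")))

# Variables present in both sets, in the order they appear in the training set.
shared <- intersect(colnames(X.train), colnames(X.test))

# Subset both matrices to the shared variables, in the same column order.
X.train.shared <- X.train[, shared, drop = FALSE]
X.test.shared  <- X.test[, shared, drop = FALSE]

# Sanity check: column names now match exactly, including order.
stopifnot(identical(colnames(X.train.shared), colnames(X.test.shared)))

# With mixOmics attached and a class vector Y.train (hypothetical here),
# the model would then be refit on the shared variables and used to predict:
# Perio.train.splsda <- splsda(X.train.shared, Y.train, ncomp = 2, keepX = c(10, 10))
# Perio.test.splsda.predict <- predict(Perio.train.splsda, X.test.shared,
#                                      dist = "mahalanobis.dist")
```

Subsetting both matrices by the same `shared` vector also puts the columns in the same order, which matters because the check in the error message compares the column names position by position.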
Also, please consider upgrading to the latest version via:
Additionally, you do not need the pls package to use mixOmics; the pls name refers to a function built into mixOmics.