Error in performance assessment
I'm running into an error in the perf() function. I built the following model:
mod = spls(Y = Y[-test, ], X = X[-test, ],
           ncomp = 10,
           mode = c("regression", "canonical", "invariant", "classic")[1],
           keepX = which(colSums(X) > 10),
           keepY = which(colSums(abs(Y)) > 10),
           scale = TRUE,
           tol = 1e-6, max.iter = 1000, near.zero.var = FALSE,
           logratio = "none", multilevel = NULL, all.outputs = TRUE)
and get the following error
> perf(mod,'Mfold')
|= | 1%
Error in if ((crossprod(a.cv - a.old.cv) < tol) || (iter.cv == max.iter)) break :
missing value where TRUE/FALSE needed
In addition: Warning messages:
1: The SGCCA algorithm did not converge
2: The SGCCA algorithm did not converge
3: In cor(A[[k]], variates.A[[k]]) : the standard deviation is zero
Summaries of my X and Y (binary) matrices:
It seems like an NA is sneaking in here through a.cv or a.old.cv and the crossprod calculations used to generate them. Any idea how this can be resolved?
Comments (5)
-
-
reporter Thanks for the quick response!
Looks like it was my misuse of the 'keepX' argument, as you said. What is the purpose of manually specifying the number of variables? Does sPLS have a tendency to overestimate the number of variables needed?
-
reporter On a related note, I'm running the predict.spls function to recapitulate my binary Y matrix, but the $predict output has 3 dimensions: observations x variables x ncomp. How do I reduce this to a 2D matrix (observations x variables) to assess the quality of the prediction?
-
spls() chooses the best linear combination of keepX variables. If you want to optimise that number, you need to use the tune.splsda() function on a grid of keepX values.
Regarding predict, you should use $predict[,,ncomp], since that slice is the prediction based on all of the components 1:ncomp. If you only look at $predict[,,1], you get the prediction from the first component only.
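To illustrate the slicing described above, here is a minimal base R sketch. The array `pred` is a hypothetical stand-in for the `$predict` output (it is filled by hand, not produced by mixOmics), and both the 0.5 cut-off and the toy `Y_test` matrix are assumptions made only for this example.

```r
# Hypothetical stand-in for the $predict array:
# 4 observations x 3 Y variables x 2 components (values made up).
pred <- array(0, dim = c(4, 3, 2))
pred[, , 1] <- matrix(c(0.2, 0.6, 0.4, 0.7,
                        0.1, 0.8, 0.3, 0.6,
                        0.5, 0.4, 0.6, 0.2), nrow = 4)
pred[, , 2] <- matrix(c(0.1, 0.9, 0.2, 0.8,
                        0.0, 0.9, 0.1, 0.7,
                        0.7, 0.3, 0.8, 0.1), nrow = 4)

ncomp <- dim(pred)[3]

# Slice on the last component: prediction built from components 1:ncomp.
# The result is an ordinary 2D matrix (observations x variables).
pred_final <- pred[, , ncomp]

# Assumption: threshold at 0.5 to map predictions back to 0/1.
pred_binary <- (pred_final > 0.5) * 1

# Hypothetical observed binary Y for the same 4 observations.
Y_test <- matrix(c(0, 1, 0, 1,
                   0, 1, 0, 1,
                   1, 0, 1, 1), nrow = 4)

# Proportion of matching cells, one simple way to gauge agreement.
accuracy <- mean(pred_binary == Y_test)
```

Other cut-offs or error measures (for example, the mean squared error between `pred_final` and `Y_test`) may suit a binary Y better; the choice here is only illustrative.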
-
- changed status to resolved
Hi Benjamin,
First, I'd like to remind you that the keepX and keepY parameters are the number of variables you want to keep on each component (one number to be given for each of the 10 components). In your code you used keepX=which(colSums(X)>10) and keepY=which(colSums(abs(Y))>10).
This gives you which columns to keep, but not how many.
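The difference between column indices and per-component counts can be sketched in base R as follows. The toy matrix and the idea of pre-filtering the columns before fitting are assumptions for illustration; they are not taken from the reporter's data.

```r
# Toy X with column sums 20, 0, 5 and 40 (hypothetical data).
X <- cbind(rep(1, 20), rep(0, 20), c(rep(1, 5), rep(0, 15)), rep(2, 20))
ncomp <- 3

# which() returns the *indices* of the columns passing the filter ...
idx <- which(colSums(X) > 10)

# ... whereas keepX expects a *count* per component.
n_keep <- length(idx)
keepX <- rep(n_keep, ncomp)   # one number for each component

# If the intent was to restrict the model to those columns,
# subset X before fitting instead of passing indices as keepX.
X_filtered <- X[, idx, drop = FALSE]

# X_filtered and keepX would then be supplied to spls(), e.g.
# spls(X = X_filtered, Y = Y, ncomp = ncomp, keepX = keepX)
```

Note that keeping as many variables per component as there are columns left after filtering means no further selection takes place; smaller keepX values are what make the model sparse.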
To answer your question: it seems that V8 in your Y is a constant (all-zero) variable, so you may want to set near.zero.var=TRUE or remove this column. V9 and V10 don't look too informative either. Such variables can become constant within a fold during the CV process; I think setting near.zero.var=TRUE should solve this problem.
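As a rough way to check for this by hand, the base R sketch below flags columns that are constant overall and columns that become constant once a fold is held out. The toy Y matrix and the fold assignment are hypothetical; perf() draws its own folds internally, so this only illustrates the mechanism.

```r
# Toy binary Y: V8 is all zeros, V9 has a single 1 (hypothetical data).
Y <- cbind(V1 = c(0, 1, 0, 1, 1, 0),
           V2 = c(1, 1, 0, 0, 1, 0),
           V8 = rep(0, 6),
           V9 = c(0, 0, 0, 0, 0, 1))

# Helper: names of the columns taking a single value.
constant_cols <- function(m) {
  colnames(m)[apply(m, 2, function(col) length(unique(col)) == 1)]
}

# Columns that are constant over the whole data set.
const_all <- constant_cols(Y)

# Drop them before fitting.
Y_clean <- Y[, !(colnames(Y) %in% const_all), drop = FALSE]

# Assumed 3-fold split: a nearly constant column such as V9 can
# still become constant in the training part of a fold.
folds <- c(1, 1, 2, 2, 3, 3)
const_by_fold <- lapply(1:3, function(f) {
  constant_cols(Y[folds != f, , drop = FALSE])
})
```

Here V9 survives the global check but is constant in the training set whenever the fold holding its only 1 is left out, which is the situation that produces a zero standard deviation during cross-validation.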
Let me know!