Suggestions for Mfold, relevance network, explained variance, and general help.

Issue #97 resolved
Former user created an issue

Hello, I have a few small suggestions for the nice mixOmics package:

First, in the perf function, I would suggest including nrepeat in the examples of Mfold cross-validation, to remind users that it is important when using Mfold, especially if the sample number is low. Although it is stipulated elsewhere, it does not appear in the examples.
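For instance, such an example could look like the sketch below (the dataset and parameter values are purely illustrative, not a recommendation):

```r
# Illustrative sketch: M-fold CV repeated nrepeat times to stabilise the
# performance estimates when the sample size is small.
library(mixOmics)
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
res.pls <- pls(X, Y, ncomp = 3)
# nrepeat repeats the random 5-fold partition 10 times
perf.pls <- perf(res.pls, validation = "Mfold", folds = 5, nrepeat = 10)
```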

In the relevance network function, network, I noticed that the text size can be chosen using the parameter cex.node.name = 0.8, which automatically influences the size of the text box (circle or rectangle). The box feels pretty big; an option to modify the size of the text box would be really nice. Also, the possibility of setting lwd.edge depending on the intensity of the similarity (higher correlations resulting in thicker connections) could be nice to visualize the network. I will try Cytoscape to see what is possible.
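For reference, a minimal sketch of the current behaviour (dataset and cutoff are illustrative): the label size is set via cex.node.name, and the box is drawn around the label, so shrinking the label currently also shrinks the box.

```r
# Illustrative call: cex.node.name controls the node label size, which in
# turn drives the size of the surrounding circle/rectangle.
library(mixOmics)
data(nutrimouse)
res <- pls(nutrimouse$gene, nutrimouse$lipid, ncomp = 2)
network(res, comp = 1:2, cutoff = 0.55, cex.node.name = 0.8)
```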

Regarding the network function, I would like more information about the calculation of the similarity performed in the function, without having to refer to the original papers. Since we input the PLS or sPLS model, I thought the similarity would somehow be determined from the model, but in truth the similarity value is not calculated from the PLS model, if I understood correctly? The variables are just selected from the model and the threshold, right?

Regarding the explained variance of PLS and sPLS models, which I check using PLSMODEL$explained_variance: the help says the explained variance may not decrease as in PCA. Indeed, I noticed that the total is not 100%, and that some components gain explained variance, which is very counter-intuitive. A bit more explanation would be useful. Why this behaviour? How should the explained variance be reported/used then?
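To make the question concrete, this is the kind of check I mean (slot name as used above; dataset illustrative):

```r
# Inspect per-component explained variance of a PLS model; the values need
# not decrease across components, and they may not sum to 100%.
library(mixOmics)
data(liver.toxicity)
res <- pls(liver.toxicity$gene, liver.toxicity$clinic, ncomp = 3)
ev <- res$explained_variance   # list with one vector per block (X and Y)
ev$X; ev$Y
sum(ev$X); sum(ev$Y)
```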

More information about the differences between the "classic" and "regression" modes of PLS and sPLS, without having to check the references, could be useful to users and guide their choice. As an example, I tried to compare the modes of PLS using the explained variance I obtained. The explained variance of the first component was the same for the canonical and regression modes for both the X and Y blocks, but much higher for Y in the classic mode. Should I therefore expect the classic mode to work better for my dataset, since the goal is to predict Y from a small number of components? I thank the authors for providing many references; however, it is a bit confusing to know where to look for the differences between the models, and unfortunately paywalls may block access to some journals.
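The comparison I describe above could be scripted roughly as follows (a sketch; the dataset is illustrative):

```r
# Fit the same PLS model under each mode and compare the per-component
# explained variance of the X and Y blocks.
library(mixOmics)
data(liver.toxicity)
X <- liver.toxicity$gene
Y <- liver.toxicity$clinic
for (m in c("regression", "canonical", "classic")) {
  res <- pls(X, Y, ncomp = 2, mode = m)
  cat("mode =", m, "\n")
  print(res$explained_variance)
}
```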

Thank you very much for your time, Best, Arno Germond

Comments (2)

  1. Kim-Anh Le Cao repo owner

    Hi Arno,

    We are currently adding a few details to the help files for:

    • perf: repeats of the CV folds. Repeated cross-validation means that the whole CV process is repeated a number of times (\code{nrepeat}) to reduce variability across the different subset partitions. In the case of leave-one-out CV (\code{validation = 'loo'}), each sample is left out once (\code{folds = N} is set internally), and therefore \code{nrepeat} is 1 by default.

    • PLS/sPLS: the explained variance and the modes. In your case the regression mode is probably best (we are considering removing the classic mode, as it seems redundant with the regression mode). For the explained variance, I specified: "explained variance: amount of variance explained per component (note that contrary to PCA, this amount may not decrease, as the aim of the method is not to maximise the variance but the covariance between data sets)".

    • networks description: Display a relevance association network for (regularized) canonical correlation analysis and (sparse) PLS regression. The function avoids the intensive computation of Pearson correlation matrices on large data sets by instead calculating a pair-wise similarity matrix obtained directly from the latent components of our integrative approaches (CCA, PLS, block.pls methods). The similarity value between a pair of variables is obtained by summing the correlations between the original variables and each of the latent components of the model. The values in the similarity matrix can be seen as a robust approximation of the Pearson correlation (see González et al. 2012 for a mathematical demonstration and the exact formula). The advantage of relevance networks is their ability to simultaneously represent positive and negative correlations, which are missed by methods based on Euclidean distances or mutual information. These networks are bipartite, and thus only links between two variables of different types can be represented. The network can be saved in .gml format using the \code{igraph} package and the function \code{write.graph}, after extracting the output \code{object$gR}; see details. We recommend that users use Cytoscape to fine-tune their plots.
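    A rough sketch of the idea behind that similarity computation (the exact formula, including details specific to each method, is in González et al. 2012; this is only an approximation of the principle):

```r
# Correlate each original variable with the latent components of the fitted
# model, then sum the cross-products over components to obtain a pairwise
# X-Y similarity matrix (a Pearson-like measure).
library(mixOmics)
data(nutrimouse)
X <- nutrimouse$gene
Y <- nutrimouse$lipid
res <- pls(X, Y, ncomp = 2)
cor.X <- cor(X, res$variates$X)   # p x ncomp correlations
cor.Y <- cor(Y, res$variates$X)   # q x ncomp correlations
sim   <- cor.X %*% t(cor.Y)       # p x q similarity matrix
```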

    While we try to be as thorough as possible in the help files, we cannot give all possible details of the different methods. We thank you for your suggestions.
