platform / Partial_Least_Squares

Partial Least Squares implementation in scikit-learn:

http://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.PLSRegression.html

Canonical Correlation Analysis (CCA) maximizes the correlation between the projected variables, whereas PLS maximizes their covariance:

http://scikit-learn.org/stable/modules/generated/sklearn.cross_decomposition.CCA.html#sklearn.cross_decomposition.CCA

Code to use Partial Least Squares. The function `main` uses k-fold cross-validation to tune the number of components, trains a model with the best number of components on the training data, and saves its predictions on the test data. The arguments are:

  • feat_path: path to directory which holds feature files
  • red_feat_path: path to pickled reduced feature matrix
  • meta_path: path to meta data
  • fid_col: name of column which has file ids in the meta data file
  • y: name of feature to predict
  • test_prop: proportion of the data held out for testing. Defaults to 0.25
  • max_comp: maximum number of components to consider
  • n_folds: number of folds to use in K-fold cross validation

From the command line, `python pls.py -h` lists the command-line arguments, which correspond to the arguments above.
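
For orientation, a command-line interface of this shape could be declared with `argparse` roughly as below. The real flag names and help texts are whatever `python pls.py -h` prints; the `--name` spellings, the `build_parser` function and the decision to make most arguments required are hypothetical. Only the 0.25 default for `test_prop` comes from the description above.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the documented arguments of main."""
    parser = argparse.ArgumentParser(
        description="Partial Least Squares with K-fold tuning (sketch)"
    )
    parser.add_argument("--feat_path", required=True,
                        help="path to directory which holds feature files")
    parser.add_argument("--red_feat_path", required=True,
                        help="path to pickled reduced feature matrix")
    parser.add_argument("--meta_path", required=True,
                        help="path to meta data")
    parser.add_argument("--fid_col", required=True,
                        help="name of the file id column in the meta data file")
    parser.add_argument("--y", required=True,
                        help="name of feature to predict")
    parser.add_argument("--test_prop", type=float, default=0.25,
                        help="proportion for testing data (default: 0.25)")
    # Defaults for the next two are not documented, so they are required here.
    parser.add_argument("--max_comp", type=int, required=True,
                        help="maximum number of components to consider")
    parser.add_argument("--n_folds", type=int, required=True,
                        help="number of folds for K-fold cross validation")
    return parser
```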

The argument `test_prop` controls the proportion of the data held out for testing. If `test_prop` is set to 0, the model is trained on all of the data and predicts the output on all of the data.
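
One way to get this behaviour is to special-case a zero proportion when building the split. The helper below is a sketch under assumptions: the name `split_indices`, the `seed` argument and the rounding rule are illustrative and are not taken from `pls.py`.

```python
import random


def split_indices(n_samples, test_prop=0.25, seed=0):
    """Return (train_idx, test_idx) as sorted lists of row indices."""
    if not 0 <= test_prop < 1:
        raise ValueError("test_prop must be in [0, 1)")
    indices = list(range(n_samples))
    if test_prop == 0:
        # No held-out data: train on every row and predict on every row.
        return indices, list(indices)
    random.Random(seed).shuffle(indices)
    # Assumed rounding rule: at least one test row when test_prop > 0.
    n_test = max(1, round(test_prop * n_samples))
    return sorted(indices[n_test:]), sorted(indices[:n_test])
```

With `test_prop=0` the predictions are made on rows the model was trained on, so they are in-sample and should not be read as an estimate of generalization error.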
