Gram-Wise OLS

Code to reduce the feature space by keeping only the features that are most strongly associated with a dependent variable in the metadata.

The code is split into two functions: f_wise_ols, which runs a simple linear regression for each feature (gram) independently, and f_reduce, which uses the t statistics from those regressions to rank the features (grams) and keep the highest-ranking ones. The function main combines the two. Its arguments are:

  • feat_path: path to directory which holds feature files
  • meta_path: path to meta data
  • fid_col: name of column which has file ids in the meta data file
  • y: name of the column in the metadata that holds the target variable
  • inter: list of names of interaction variables. Defaults to None
  • thresh: minimum number of nonzero elements a feature vector must have; features with fewer than thresh nonzero elements are not used
  • num: dimension of feature space after reducing (number of features to keep)
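
The two steps described above can be illustrated with a minimal NumPy sketch. It is not the code behind f_wise_ols and f_reduce: the names wise_ols_sketch and reduce_sketch are hypothetical, the features are assumed to be already loaded into a dense matrix X with one column per gram, ranking by the absolute value of the t statistic is an assumption, and interaction variables (inter) are left out.

```python
import numpy as np


def wise_ols_sketch(X, y, thresh=0):
    """Slope t statistic of y ~ a + b * X[:, j], fitted separately per column j.

    Hypothetical helper. Columns with fewer than `thresh` nonzero entries,
    or with zero variance, are skipped and get NaN.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n = X.shape[0]

    xc = X - X.mean(axis=0)        # centre each feature
    yc = y - y.mean()              # centre the target
    sxx = (xc ** 2).sum(axis=0)    # per-feature sum of squares
    sxy = xc.T @ yc                # per-feature cross product with y

    t = np.full(X.shape[1], np.nan)
    ok = (sxx > 0) & ((X != 0).sum(axis=0) >= thresh)

    slope = sxy[ok] / sxx[ok]
    # Residual sum of squares of each single-feature fit (clipped at 0).
    rss = np.maximum((yc ** 2).sum() - slope * sxy[ok], 0.0)
    se = np.sqrt(rss / (n - 2) / sxx[ok])   # standard error of the slope
    t[ok] = slope / se
    return t


def reduce_sketch(t, num):
    """Indices of up to `num` features with the largest |t|, strongest first.

    Hypothetical helper; ranking on |t| is an assumption of this sketch.
    """
    score = np.where(np.isnan(t), -np.inf, np.abs(t))
    keep = np.argsort(-score, kind="stable")[:num]
    return keep[score[keep] > -np.inf]      # drop skipped features


# Synthetic data: 200 documents, 50 grams, two of which drive y.
rng = np.random.default_rng(0)
X = rng.poisson(1.0, size=(200, 50)).astype(float)
y = 2.0 * X[:, 3] - 1.5 * X[:, 7] + rng.normal(size=200)

t = wise_ols_sketch(X, y, thresh=5)
keep = reduce_sketch(t, num=2)
X_reduced = X[:, keep]                      # reduced feature space
```

Because every regression has a single predictor, all slopes and standard errors can be computed at once from column sums, without fitting the models one at a time in a loop.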

From the command line, running the script with the -h flag lists the command-line arguments, which correspond to the arguments above.