This repository provides the source code in support of the manuscript "Development of a machine learning approach, ADAGE, for analysis of publically available gene expression data collections: Application to the interpretation Pseudomonas aeruginosa-host interactions" (submitted).

To set up ADAGE, first clone the repository. This is a short summary. Detailed instructions and steps to generate the model and reproduce analyses used in the manuscript are in pseudomonas_autoencoder.sh

Building an ADAGE model requires installing python packages Theano and Docopt

Instructions for Theano: http://deeplearning.net/software/theano/install.html

Instructions for docopt: https://pypi.python.org/pypi/docopt

We provide a gene expression compendium of Pseudomonas aeruginosa that contains datasets available before 02.22.2014. To get an up-to-date compendium, follow the instructions in Section One in pseudomonas_autoencoder.sh

Before training, first 0-1 normalize the compendium, run

python Data_collection_processing/zero_one_normalization.py Data_collection_processing/Pa_compendium_02.22.2014.pcl Train_test_DAs/train_set_normalized.pcl None

To train a denoising autoencoders, run

python Train_test_DAs/SdA_train.py Train_test_DAs/train_set_normalized.pcl --parameters

To test a dataset on an ADAGE model, run

python Train_test_DAs/SdA_test.py Train_test_DAs/Genome-hybs_normalized.pcl --parameters

Please email jie.tan.gr@dartmouth.edu if you have questions.