README
Code used in the paper:
Inference of Network Dynamics and Metabolic Interactions in the Gut Microbiome
Steinway SN*, Biggs MB*, Loughran TP Jr., Papin JA**, Albert R**. In preparation. 2015.
*Co-first authorship
**Co-corresponding authorship
- Code repository
- Version 1.0
Who do I talk to?
- Matt Biggs: mattbbiggs [at] gmail [dot] com
- Steven Steinway: steve.steinway [at] gmail [dot] com
Notes on Code Organization and Usage
NOTE: It may be necessary to set path variables to match your system in order for code to run properly.
The folder "Genera_metabolic_reconstructions" contains all the data necessary to reproduce the genus-level metabolic reconstructions:
- The "genus_result.txt" files contain the genus search results from the NCBI Genomes database.
- "Genera_list_for_metabolic_reconstructions.xlsx" contains the list of species that went into each genus-level reconstruction.
- "genus_seed_sets.mat", "genus_model_list.mat", "species_seed_sets.mat" and "species_list.mat" are MATLAB objects with the COBRA-format metabolic reconstructions and the calculated seed sets for each.
- The folders for each genus contain the genome sequences (fasta format), RAST annotation results for each species, Model SEED reconstructions, and reformatted reconstructions.
The folder "Metabolic_and_experimental_Analysis" contains scripts for analyzing the metabolic networks and the experimental data:
- "generaModelstoCOBRA.m" converts the Excel-format models to COBRA-format objects in MATLAB. This function makes use of "reformat_SEED_xls_model.m", which reorganizes the Model SEED Excel files so that they can be read by the function "d_xls2model_JAB.m" (code by Jennifer Bartell, PhD).
- "make_generic_models.m" creates the genus-level reconstructions from the species-level reconstructions.
- "competition_mutualism_scores.m" calculates the competition and mutualism metrics. This function makes use of the function "findSeedMetabolites.m".
- "network_overlap_analysis.m" calculates overlap in metabolic network content between all genus-level and species-level reconstructions.
- "ProcessData_6Feb2015.m" analyzes the experimentally obtained growth curves. This function makes use of "processCurves.m", "multiGrowthRates_LV.m", "curve_fit.m", "getDerivatives.m", "growthCurveMetrics.m" and "normalizeAndSmooth.m".
- "getKEGGmapsForSEEDrxns.py" extracts the KEGG map names associated with each reaction in the Model SEED reaction database (which can be obtained from the Model SEED website).
- "subsystem_enrichment_analysis.m" performs the subsystem enrichment analysis.
- "graphs_and_significance_tests.R" calculates p-values for significance tests, and creates graphs.
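The growth-curve outputs named in this README (area under the curve and maximum growth rate) can be illustrated with a minimal Python sketch. This is not a port of "growthCurveMetrics.m" or "getDerivatives.m": the trapezoidal rule, the log-difference growth rate, the function names, and the sample readings are all assumptions made for illustration.

```python
import math


def area_under_curve(times, values):
    """Trapezoidal-rule area under a sampled curve."""
    return sum(
        (t1 - t0) * (v0 + v1) / 2.0
        for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:])
    )


def max_growth_rate(times, values):
    """Largest slope of ln(value) between adjacent samples."""
    logs = [math.log(v) for v in values]
    return max(
        (l1 - l0) / (t1 - t0)
        for t0, t1, l0, l1 in zip(times, times[1:], logs, logs[1:])
    )


# Hypothetical optical-density readings (hours, OD); not data from the paper.
hours = [0.0, 1.0, 2.0, 3.0, 4.0]
od = [0.1, 0.2, 0.4, 0.6, 0.7]

auc = area_under_curve(hours, od)
rate = max_growth_rate(hours, od)
print(f"AUC = {auc:.3f}, max growth rate = {rate:.3f} per hour")
```

Real curves would normally be blank-corrected and smoothed first (the role "normalizeAndSmooth.m" appears to play), because finite differences amplify measurement noise.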
The folder "Metabolic_and_Experimental_Data" contains data and figures that serve as input to, or output from, the analysis scripts:
- "competition_score.txt" is read as "entry ij is the fraction of the seed set from row i that overlaps with the seed set from column j".
- "mutualism_score.txt" is read as "entry ij is the fraction of metabolites needed by row i that are provided by column j".
- "paths_in_boolean_net.txt" is the text representation of the Boolean network shown in Figure 2 of the manuscript. It is read from row to column, where 0 = no edge, 1/-1 = direct positive/negative edge, and 2/-2 = indirect positive/negative path.
- "allGenusSeedSets.txt" contains the calculated seed sets for each genus-level reconstruction.
- "enrichment_allPvals.tsv" is the full set of p-values from the enrichment analysis.
- "enrichment_allPvals_selection.txt" is a subset of interesting rows from the enrichment analysis.
- "seed_rxns_kegg_map_names.tsv" contains the KEGG map names and Model SEED subsystems parsed from the Model SEED reactions database.
- "spent_media_Barnesiella_Cdiff_area_under_curves_6Feb15.tsv" contains the calculated AUC for all experimentally-obtained growth curves.
- "spent_media_Barnesiella_Cdiff_growth_curves_6Feb15.tsv" contains the normalized growth curves.
- "spent_media_Barnesiella_Cdiff_growth_rates_6Feb15.tsv" contains the calculated maximum growth rates for all experimentally-obtained growth curves.
- "spent_media_Barnesiella_Cdiff_raw_data_curves_6Feb15.tsv" contains the raw data for all experimentally-obtained growth curves.
- "SEEDrxns2KEGGmaps.mat" is a MATLAB object with a lookup table relating Model SEED reactions to KEGG maps.
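The row/column reading of "competition_score.txt" and "mutualism_score.txt" can be made concrete with a small Python sketch. It is not the code in "competition_mutualism_scores.m". It assumes that "metabolites needed by row i" means the seed set of i and that "provided by column j" means a set of metabolites j can produce. The organism names and metabolite sets are hypothetical.

```python
def competition_score(seeds_i, seeds_j):
    """Fraction of organism i's seed set that is also in organism j's seed set."""
    if not seeds_i:
        return 0.0
    return len(seeds_i & seeds_j) / len(seeds_i)


def mutualism_score(seeds_i, provided_j):
    """Fraction of organism i's seed set that organism j can provide (assumed definition)."""
    if not seeds_i:
        return 0.0
    return len(seeds_i & provided_j) / len(seeds_i)


# Hypothetical seed sets and producible metabolites; not data from the paper.
seeds = {
    "A": {"glucose", "acetate", "NH3", "phosphate"},
    "B": {"glucose", "NH3"},
}
produced = {
    "A": {"succinate"},
    "B": {"acetate", "lactate"},
}

names = sorted(seeds)
# Entry [i][j] is read from row i to column j, as in the score files.
competition = [[competition_score(seeds[i], seeds[j]) for j in names] for i in names]
mutualism = [[mutualism_score(seeds[i], produced[j]) for j in names] for i in names]

for name, row in zip(names, competition):
    print("competition", name, row)
for name, row in zip(names, mutualism):
    print("mutualism", name, row)
```

Both matrices are asymmetric in general, because each entry is normalized by the size of the row organism's seed set.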
The folder "Network_Inf_Files" contains scripts pertaining to the network inference:
- "OriginalData" folder contains the time series metagenomic sequencing information acquired from Buffie et al., 2012.
- "BooleanNetworkInf.R" contains the R scripts to:
  - Visualize the metagenomic time courses of bacterial abundances.
  - Interpolate missing time points.
  - Visualize the binarized time courses.
  - Produce consensus binarizations.
  - Infer Boolean rules, using the implementation of the Best-Fit Extension in the "BoolNet" R package.
  - Acquire the model steady states.
- "BinarizationScripts" folder contains the "KM_Binarization_iterated.py" script, which binarizes the continuous metagenomic genus abundance data using a previously described method, iterative k-means binarization. This code was adapted from Berestovsky & Nakhleh, PLOS ONE, 2012. The script produces 1000 binarizations, which are saved in the "BinarizedData" folder. "BooleanNetworkInf.R" combines these binarizations into a consensus binarization.
- "InterpData" folder contains the interpolated time series bacterial abundances produced by the "BooleanNetworkInf.R" script.
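This README does not say how "BooleanNetworkInf.R" combines the 1000 binarizations into a consensus. One plausible approach, shown here purely as an assumption, is a per-time-point majority vote. The function name and the three sample runs below are hypothetical.

```python
def consensus_binarization(binarizations):
    """Majority vote per time point across equally long 0/1 series."""
    n_runs = len(binarizations)
    n_points = len(binarizations[0])
    consensus = []
    for t in range(n_points):
        ones = sum(run[t] for run in binarizations)
        # A tie (possible only with an even number of runs) resolves to 0 here.
        consensus.append(1 if 2 * ones > n_runs else 0)
    return consensus


# Three hypothetical binarizations of one genus over five time points.
runs = [
    [0, 0, 1, 1, 1],
    [0, 1, 1, 1, 0],
    [0, 0, 1, 1, 1],
]
result = consensus_binarization(runs)
print(result)
```

With an odd number of runs there are no ties. With 1000 runs, as produced by the binarization script, the tie rule would have to be chosen deliberately.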
The folder "Model_Simulations" contains Python scripts used to run the Boolean network simulations:
- "RuleModeCollector_singleSSplot.py" produces the heatmaps for the normal steady states in the gut microbiome model and the effects of perturbations (Figure 3).
  - This script uses the "BooleanNet" Python package.
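To show what a steady state of a Boolean network model means, here is a minimal pure-Python sketch. It does not use the "BooleanNet" package and it does not reproduce "RuleModeCollector_singleSSplot.py". The three-node network and its rules are hypothetical and are not the rules inferred in the paper.

```python
from itertools import product

# Hypothetical three-node network; NOT the gut microbiome model from the paper.
RULES = {
    "A": lambda s: s["A"],                 # A sustains itself
    "B": lambda s: s["A"] and not s["C"],  # B needs A and is inhibited by C
    "C": lambda s: not s["A"],             # C is inhibited by A
}


def synchronous_step(state):
    """Apply every rule to the same current state and return the next state."""
    return {node: bool(rule(state)) for node, rule in RULES.items()}


def steady_states():
    """Enumerate all states and keep those that map to themselves."""
    nodes = sorted(RULES)
    fixed = []
    for values in product([False, True], repeat=len(nodes)):
        state = dict(zip(nodes, values))
        if synchronous_step(state) == state:
            fixed.append(state)
    return fixed


states = steady_states()
for state in states:
    print(state)
```

Exhaustive enumeration is only practical for small networks, since the state space doubles with every added node. Fixed points found this way are the same under synchronous and asynchronous updating, because a state that every rule leaves unchanged is stable regardless of update order.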