
README

Code used in the paper:

Inference of Network Dynamics and Metabolic Interactions in the Gut Microbiome

Steinway SN*, Biggs MB*, Loughran TP Jr., Papin JA**, Albert R**. In preparation. 2015.

*Co-first authorship

**Co-corresponding authorship

  • Code repository
  • Version 1.0

Who do I talk to?

  • Matt Biggs: mattbbiggs [at] gmail [dot] com
  • Steven Steinway: steve.steinway [at] gmail [dot] com

Notes on Code Organization and Usage

NOTE: It may be necessary to set path variables to match your system in order for code to run properly.

The folder "Genera_metabolic_reconstructions" contains all the data necessary to reproduce the genus-level metabolic reconstructions:

  • The "genus_result.txt" files contain the genus search results from the NCBI Genomes database.
  • "Genera_list_for_metabolic_reconstructions.xlsx" contains the list of species that went into each genus-level reconstruction.
  • "genus_seed_sets.mat", "genus_model_list.mat", "species_seed_sets.mat" and "species_list.mat" are MATLAB objects with the COBRA-format metabolic reconstructions and the calculated seed sets for each.
  • The folders for each genus contain the genome sequences (fasta format), RAST annotation results for each species, Model SEED reconstructions, and reformatted reconstructions.
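Informally, the seed sets stored in these .mat files are the metabolites a reconstruction cannot make for itself and must take up from its environment. The sketch below shows one common graph-based way to compute them. It assumes a metabolite graph with an edge from every substrate to every product of each reaction. Under that assumption, seeds are the metabolites in strongly connected components that receive no edges from the rest of the graph. The function names and toy reactions are hypothetical. The repository's "findSeedMetabolites.m" may handle details differently, such as reversible reactions, cofactors, or large source components.

```python
from collections import defaultdict


def metabolite_graph(reactions):
    """Directed graph: an edge from each substrate to each product of every reaction."""
    graph = defaultdict(set)
    for substrates, products in reactions:
        for s in substrates:
            for p in products:
                graph[s].add(p)
    return graph


def strongly_connected_components(graph):
    """Kosaraju's algorithm with explicit stacks; returns (components, node -> index)."""
    nodes = set(graph)
    for targets in graph.values():
        nodes.update(targets)
    # Pass 1: depth-first search, recording nodes in order of completion.
    visited, order = set(), []
    for start in nodes:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph.get(start, ())))]
        while stack:
            node, successors = stack[-1]
            for nxt in successors:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph.get(nxt, ()))))
                    break
            else:  # all successors of `node` have been explored
                stack.pop()
                order.append(node)
    # Pass 2: search the reversed graph in reverse completion order.
    reverse = defaultdict(list)
    for src, targets in graph.items():
        for dst in targets:
            reverse[dst].append(src)
    components, component_of = [], {}
    for start in reversed(order):
        if start in component_of:
            continue
        index, members, stack = len(components), [], [start]
        component_of[start] = index
        while stack:
            node = stack.pop()
            members.append(node)
            for prev in reverse[node]:
                if prev not in component_of:
                    component_of[prev] = index
                    stack.append(prev)
        components.append(members)
    return components, component_of


def seed_set(graph):
    """Metabolites in source components (components with no edge coming in from outside)."""
    components, component_of = strongly_connected_components(graph)
    has_incoming = [False] * len(components)
    for src, targets in graph.items():
        for dst in targets:
            if component_of[src] != component_of[dst]:
                has_incoming[component_of[dst]] = True
    return {m for i, members in enumerate(components) if not has_incoming[i] for m in members}


# Hypothetical toy network; a reversible reaction is entered once per direction.
TOY_REACTIONS = [
    (["glucose"], ["g6p"]),
    (["g6p"], ["f6p"]),
    (["f6p"], ["g6p"]),
    (["f6p"], ["pyruvate"]),
    (["pyruvate", "nh4"], ["alanine"]),
]

if __name__ == "__main__":
    print(sorted(seed_set(metabolite_graph(TOY_REACTIONS))))  # ['glucose', 'nh4']
```

A cycle with no inputs, such as g6p and f6p above if glucose were absent, would be returned as a whole component. This is why seed-set methods often weight members of larger source components differently.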

The folder "Metabolic_and_experimental_Analysis" contains scripts for analyzing the metabolic networks and the experimental data:

  • "generaModelstoCOBRA.m" converts the Excel-format models to COBRA-format objects in MATLAB. This function makes use of "reformat_SEED_xls_model.m", which reorganizes the Model SEED Excel files so that they can be read by the function "d_xls2model_JAB.m" (code by Jennifer Bartell, PhD).
  • "make_generic_models.m" creates the genus-level reconstructions from the species-level reconstructions.
  • "competition_mutualism_scores.m" calculates the competition and mutualism metrics. This function makes use of the function "findSeedMetabolites.m".
  • "network_overlap_analysis.m" calculates overlap in metabolic network content between all genus-level and species-level reconstructions.
  • "ProcessData_6Feb2015.m" analyzes the experimentally obtained growth curves. This function makes use of "processCurves.m", "multiGrowthRates_LV.m", "curve_fit.m", "getDerivatives.m", "growthCurveMetrics.m" and "normalizeAndSmooth.m".
  • "getKEGGmapsForSEEDrxns.py" extracts the names of the KEGG maps associated with each reaction in the Model SEED reaction database (which can be obtained from the Model SEED website).
  • "subsystem_enrichment_analysis.m" performs the subsystem enrichment analysis.
  • "graphs_and_significance_tests.R" calculates p-values for significance tests, and creates graphs.
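The growth-curve summaries produced by this analysis (area under the curve and maximum growth rate) can be illustrated with two standard calculations. This minimal sketch assumes the optical-density readings are already normalized, for example blank-subtracted. It computes the area under the curve with the trapezoidal rule. It takes the maximum specific growth rate as the steepest least-squares slope of ln(OD) over a sliding window of consecutive points. The function names, the window size, and the clipping floor are hypothetical. "ProcessData_6Feb2015.m" and its helper functions may smooth and fit the curves differently.

```python
import math


def auc_trapezoid(times, values):
    """Area under a sampled curve using the trapezoidal rule."""
    return sum((t1 - t0) * (v0 + v1) / 2.0
               for t0, t1, v0, v1 in zip(times, times[1:], values, values[1:]))


def max_growth_rate(times, od, window=3, floor=1e-6):
    """Steepest least-squares slope of ln(OD) over any `window` consecutive points.

    Readings are clipped at `floor` before taking logs, so blank-subtracted
    values at or below zero do not raise errors. Requires window >= 2 and
    distinct time points.
    """
    log_od = [math.log(max(v, floor)) for v in od]
    best = float("-inf")
    for i in range(len(times) - window + 1):
        ts, ys = times[i:i + window], log_od[i:i + window]
        t_mean, y_mean = sum(ts) / window, sum(ys) / window
        sxx = sum((t - t_mean) ** 2 for t in ts)
        sxy = sum((t - t_mean) * (y - y_mean) for t, y in zip(ts, ys))
        best = max(best, sxy / sxx)
    return best


if __name__ == "__main__":
    hours = [0.5 * k for k in range(21)]
    # Hypothetical logistic curve: growth rate 0.8 per hour, carrying capacity OD = 1.
    od = [1.0 / (1.0 + 99.0 * math.exp(-0.8 * t)) for t in hours]
    print(round(auc_trapezoid(hours, od), 3), round(max_growth_rate(hours, od), 3))
```

Fitting a slope to ln(OD) over several points, rather than differencing adjacent readings, makes the maximum rate less sensitive to single noisy measurements.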

The folder "Metabolic_and_Experimental_Data" contains data and figures that serve as input to, or output from, the analysis scripts:

  • "competition_score.txt" should be read as: entry (i, j) is the fraction of the seed set of row i that overlaps with the seed set of column j.
  • "mutualism_score.txt" should be read as: entry (i, j) is the fraction of the metabolites needed by row i that are provided by column j.
  • "paths_in_boolean_net.txt" is the text representation of the Boolean network shown in Figure 2 of the manuscript. It is read from row to column, where 0 = no edge, 1/-1 = direct positive/negative edge, and 2/-2 = indirect positive/negative path.
  • "allGenusSeedSets.txt" contains the calculated seed sets for each genus-level reconstruction.
  • "enrichment_allPvals.tsv" is the full set of p-values from the enrichment analysis.
  • "enrichment_allPvals_selection.txt" is a subset of interesting rows from the enrichment analysis.
  • "seed_rxns_kegg_map_names.tsv" contains the KEGG maps and Model SEED subsystems parsed from the Model SEED reactions database.
  • "spent_media_Barnesiella_Cdiff_area_under_curves_6Feb15.tsv" contains the calculated area under the curve (AUC) for all experimentally obtained growth curves.
  • "spent_media_Barnesiella_Cdiff_growth_curves_6Feb15.tsv" contains the normalized growth curves.
  • "spent_media_Barnesiella_Cdiff_growth_rates_6Feb15.tsv" contains the calculated maximum growth rates for all experimentally obtained growth curves.
  • "spent_media_Barnesiella_Cdiff_raw_data_curves_6Feb15.tsv" contains the raw data for all experimentally obtained growth curves.
  • "SEEDrxns2KEGGmaps.mat" is a MATLAB object with a lookup table relating Model SEED reactions to KEGG maps.
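Both score matrices can be read as set ratios over seed sets. In the minimal sketch below, the competition score follows the definition given for "competition_score.txt": |S_i ∩ S_j| / |S_i|, where S_i is the seed set of the row organism. For the mutualism score, the sketch ASSUMES that column j "provides" a metabolite when it appears in j's reconstruction but not in j's seed set, meaning j can synthesize it. The README does not spell this out, so treat it as an interpretation, not as the exact computation in "competition_mutualism_scores.m". The genus names and metabolite sets are hypothetical.

```python
def competition_score(seeds_i, seeds_j):
    """Fraction of i's seed set that also appears in j's seed set."""
    return len(seeds_i & seeds_j) / len(seeds_i) if seeds_i else 0.0


def mutualism_score(seeds_i, seeds_j, metabolites_j):
    """Fraction of i's seed set that j can provide.

    ASSUMPTION: j "provides" a metabolite if it is in j's network but not in
    j's seed set, i.e. j can synthesize it rather than import it.
    """
    provided_by_j = metabolites_j - seeds_j
    return len(seeds_i & provided_by_j) / len(seeds_i) if seeds_i else 0.0


def score_matrices(seeds, metabolites):
    """Row i / column j matrices, in the orientation described for the .txt files."""
    names = sorted(seeds)
    competition = [[competition_score(seeds[a], seeds[b]) for b in names] for a in names]
    mutualism = [[mutualism_score(seeds[a], seeds[b], metabolites[b]) for b in names]
                 for a in names]
    return names, competition, mutualism


if __name__ == "__main__":
    # Hypothetical genera and metabolite sets, for illustration only.
    seeds = {"GenusA": {"glucose", "nh4"}, "GenusB": {"glucose", "acetate"}}
    metabolites = {
        "GenusA": {"glucose", "nh4", "acetate", "alanine"},
        "GenusB": {"glucose", "acetate", "nh4", "lactate"},
    }
    names, comp, mut = score_matrices(seeds, metabolites)
    print(names)  # ['GenusA', 'GenusB']
    print(comp)   # [[1.0, 0.5], [0.5, 1.0]]
    print(mut)    # [[0.0, 0.5], [0.5, 0.0]]
```

Because each score is normalized by the row organism's seed set, both matrices are generally asymmetric. Entry (i, j) and entry (j, i) answer different questions.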

The folder "Network_Inf_Files" contains scripts pertaining to the network inference:

  • "OriginalData" folder contains the time series metagenomic sequencing information acquired from Buffie et al., 2012.
  • "BooleanNetworkInf.R" contains the R scripts to:
    1. Visualize the metagenomic time course of bacterial abundances.
    2. Interpolate missing time points.
    3. Visualize the binarized time courses.
    4. Produce consensus binarizations.
    5. Infer Boolean rules. Boolean rule inference was performed using the implementation of the Best-Fit Extension in the "BoolNet" R package.
    6. Acquire model steady states.
  • The "BinarizationScripts" folder contains the "KM_Binarization_iterated.py" script, which binarizes the continuous metagenomic genus abundance data using a previously described method called iterative k-means binarization. This code was adapted from Berestovsky & Nakhleh, PLOS ONE, 2012. The script produces 1000 binarizations, which are saved in the "BinarizedData" folder. "BooleanNetworkInf.R" uses these binarizations to derive a consensus binarization.
  • "InterpData" folder contains the interpolated time series bacterial abundances produced by the "BooleanNetworkInf.R" script.
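The binarization and rule-inference steps can be illustrated in a few lines of Python. This is a simplified sketch, not the code in "KM_Binarization_iterated.py" or "BooleanNetworkInf.R". Each abundance series is binarized with plain one-dimensional k-means (k = 2) rather than the iterative k-means method. A consensus is formed by majority vote across binarizations, with ties broken toward 1 as an arbitrary choice. Finally, one candidate regulator set for one target genus is scored in the spirit of the Best-Fit Extension: each observed regulator pattern maps to its majority next-step value, and the error is the number of transitions that disagree with it. All function names and toy data are hypothetical.

```python
def two_means_binarize(values, max_iter=100):
    """Label each value 0 (low) or 1 (high) using 1-D k-means with k = 2."""
    c_low, c_high = min(values), max(values)
    if c_low == c_high:
        return [0] * len(values)  # constant series: no split is possible
    labels = []
    for _ in range(max_iter):
        # Assign each point to the nearer center (ties go to the low cluster).
        labels = [0 if abs(v - c_low) <= abs(v - c_high) else 1 for v in values]
        low = [v for v, lab in zip(values, labels) if lab == 0]
        high = [v for v, lab in zip(values, labels) if lab == 1]
        # In 1-D with centers started at the min and max, both clusters stay non-empty.
        new_low, new_high = sum(low) / len(low), sum(high) / len(high)
        if (new_low, new_high) == (c_low, c_high):
            break
        c_low, c_high = new_low, new_high
    return labels


def consensus(binarizations):
    """Majority vote per time point across runs; ties resolve to 1."""
    n = len(binarizations)
    return [1 if 2 * sum(column) >= n else 0 for column in zip(*binarizations)]


def best_fit(states, target, regulators):
    """Fit target(t+1) = f(regulators(t)); return (truth table, error count)."""
    counts = {}
    for prev, nxt in zip(states, states[1:]):
        pattern = tuple(prev[r] for r in regulators)
        counts.setdefault(pattern, [0, 0])[nxt[target]] += 1
    # Majority output per pattern; None marks a tie (output left undetermined).
    table = {p: 1 if c[1] > c[0] else 0 if c[0] > c[1] else None
             for p, c in counts.items()}
    errors = sum(min(c) for c in counts.values())
    return table, errors


if __name__ == "__main__":
    print(two_means_binarize([0.1, 0.2, 5.0, 4.8, 0.15]))
    toy_states = [{"A": 1, "B": 0}, {"A": 0, "B": 1}, {"A": 1, "B": 0}, {"A": 1, "B": 1}]
    print(best_fit(toy_states, "B", ["A"]))
```

A full inference would repeat the best-fit scoring over many candidate regulator sets for every genus and keep the lowest-error functions.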

The folder "Model_Simulations" contains Python scripts used to do the Boolean network simulations.

  • "RuleModeCollector_singleSSplot.py" produces the heatmaps for the normal steady states in the gut microbiome model and the effects of perturbations (Figure 3).
  • This script uses the "BooleanNet" Python package.
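The steady-state and perturbation analysis can be sketched without any package. A state is a steady state (fixed point) when applying every update rule leaves it unchanged. A perturbation such as removing a genus can be modeled by holding that node at 0. The sketch below enumerates all 2^n states, which is feasible only for small networks, and uses a made-up three-node rule set. It does not use the BooleanNet API and is not the model from the paper. Fixed points do not depend on the update scheme, so checking them with one synchronous step is sufficient.

```python
from itertools import product


def step(state, rules, fixed=None):
    """One synchronous update; nodes in `fixed` are held at their forced values."""
    nxt = {node: int(bool(rule(state))) for node, rule in rules.items()}
    if fixed:
        nxt.update(fixed)
    return nxt


def steady_states(rules, fixed=None):
    """Exhaustively list fixed points (feasible only for small networks)."""
    nodes = sorted(rules)
    found = []
    for bits in product((0, 1), repeat=len(nodes)):
        state = dict(zip(nodes, bits))
        if fixed and any(state[n] != v for n, v in fixed.items()):
            continue  # skip states that violate the perturbation
        if step(state, rules, fixed) == state:
            found.append(state)
    return found


# Hypothetical rules (not the paper's model): A and B inhibit each other, C needs A.
TOY_RULES = {
    "A": lambda s: not s["B"],
    "B": lambda s: not s["A"],
    "C": lambda s: s["A"],
}

if __name__ == "__main__":
    print(steady_states(TOY_RULES))
    # [{'A': 0, 'B': 1, 'C': 0}, {'A': 1, 'B': 0, 'C': 1}]
    print(steady_states(TOY_RULES, fixed={"A": 0}))  # knockout of A
    # [{'A': 0, 'B': 1, 'C': 0}]
```

Comparing the steady states with and without a forced node is the kind of contrast a perturbation heatmap summarizes.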
