PPANINI Tutorial

PPANINI (Prioritization and Prediction of functional Annotation for Novel and Important genes via automated data Network Integration) is a computational pipeline that ranks genes by combining community parameters such as prevalence and abundance across samples. The resulting prioritized list of gene candidates can then be analyzed further with our visualization tools. PPANINI is available as a Bitbucket repository.

We provide support for PPANINI users via our Google group. Please feel free to send any questions to the group by posting directly or emailing ppanini-users@googlegroups.com.




1. Setup

1.2 Installation

The easiest way to install PPANINI is with pip:

$ pip install ppanini

After installing with pip, you can optionally test your local PPANINI environment:

$ ppanini_test

Which yields:

  test_annotate_genes (basic_tests_annotate_genes.TestAnnotateGenesBasicFunctions) ... ok
  test_read_gene_table (basic_tests_ppanini.TestPPANINIBasicFunctions)
  Tests the function read_gene_table ... Gene Table contains 2 metadata lines .
  Gene Table contains 998 gene or centroid lines.
  ok
  test_preppanini (basic_tests_preppanini.TestPrePPANINIBasicFunctions) ... ok
  test_quantify_genes (basic_tests_quantify_genes.TestQuanitfyGenesBasicFunctions) ... ok
  test_create_folders (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
  test_is_present (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
  test_is_protein (basic_tests_utilities.TestUtilitiesBasicFunctions) ... /Library/Python/2.7/site-packages/biopython-1.66-py2.7-macosx-10.9-intel.egg/Bio/Seq.py:2041: BiopythonWarning: Partial codon, len(sequence) not a multiple of three. Explicitly trim the sequence or add trailing N before translation. This may become an error in future.
BiopythonWarning)
  ok
  test_pullgenes_fromcontigs (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
  test_read_fasta (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
  test_read_gff3 (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
  test_read_ppanini_imp_genes_table (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
  test_write_dict (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok
  test_write_fasta (basic_tests_utilities.TestUtilitiesBasicFunctions) ... ok

  ----------------------------------------------------------------------
  Ran 13 tests in 3.123s

  OK

1.3 Input

PPANINI prioritizes important genes, both those characterized at the protein or function level and uncharacterized ones, according to their properties in microbial communities. The input is a gene family abundance table, passed with:

  • -i or --input-table

Such tables can be obtained using:


2. Quick Demo

2.1 Input file

The input file is a table of annotated gene abundances across samples. You can obtain a copy of the demo input by right-clicking this link and selecting "save link as":

The file can also contain metadata rows; the name of each metadata row should start with #.
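As an illustration of the layout described above, here is a minimal sketch that separates metadata rows from gene rows. It assumes the table is tab-delimited and that each row's first field is its name; the sample names, metadata labels, and abundance values are made up for the example, and `split_gene_table` is a hypothetical helper, not part of PPANINI.

```python
import csv
import io

# Hypothetical miniature gene table. Rows whose name starts with '#' are
# metadata; all other rows are gene families. Every value here is made up.
DEMO_TABLE = (
    "#SAMPLES\tsample1\tsample2\tsample3\n"
    "#SITE\tgut\tgut\tskin\n"
    "UniRef90_EXAMPLE1\t0.10\t0.00\t0.25\n"
    "UniRef90_EXAMPLE2\t0.00\t0.40\t0.05\n"
    "Cluster_EXAMPLE3\t0.30\t0.20\t0.00\n"
)


def split_gene_table(handle):
    """Separate metadata rows (name starts with '#') from gene rows.

    Assumes a tab-delimited table whose first field is the row name.
    """
    metadata, genes = {}, {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row:
            continue  # skip blank lines
        name, values = row[0], row[1:]
        if name.startswith("#"):
            metadata[name] = values
        else:
            genes[name] = [float(v) for v in values]
    return metadata, genes


metadata, genes = split_gene_table(io.StringIO(DEMO_TABLE))
print(f"{len(metadata)} metadata rows, {len(genes)} gene rows")
```

To try this on a real file, pass an open file handle (for example `open("demo_ppanini_gene_families.txt", newline="")`) instead of the in-memory string, keeping in mind that the actual file layout may differ from the assumptions above.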

2.2 Running PPANINI

To execute PPANINI, use the demo input file described above and run the following:

$ ppanini -i demo_ppanini_gene_families.txt -o ppanini_demo_output

Which yields:

--- Reading the gene table...
--- Gene Table contains 1 metadata lines.
--- Gene Table contains 2006 gene families.
--- Summarize gene families table ...
--- Number of centroids: 2006
--- Normalize gene families table ...
--- Getting prevalence abundance ...
--- Mapping UniRef90 to GO terms!
--- Loading mapping file from: /Library/Python/2.7/site-packages/ppanini-0.7.0-py2.7.egg/ppanini/data/map_uniref90_infogo1000.txt.gz
  This is a large file, one moment please...
--- Prioritize gene families ...
--- The PPANINI output is written in  ...
--- PPANINI process is successfully completed ...
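If you prefer to launch the demo run from a script, the invocation above can be wrapped in Python. This is only a sketch: it uses the `-i` and `-o` options exactly as shown in the command above, `build_ppanini_command` is a hypothetical helper name, and the command is executed only when a `ppanini` executable is found on the PATH.

```python
import shutil
import subprocess


def build_ppanini_command(input_table, output_dir):
    """Return the argument list for the demo invocation (-i and -o only)."""
    return ["ppanini", "-i", input_table, "-o", output_dir]


command = build_ppanini_command(
    "demo_ppanini_gene_families.txt", "ppanini_demo_output"
)

if shutil.which(command[0]) is None:
    # PPANINI is not installed in this environment, so only show the command.
    print("ppanini not found on PATH; would run:", " ".join(command))
else:
    # check=False: report the exit code instead of raising on failure.
    completed = subprocess.run(command, check=False)
    print("ppanini exited with code", completed.returncode)
```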

2.3 Sample Output

The output of PPANINI is a list of important gene families based on prevalence, abundance, and PPANINI score. At the end of the analysis, several output files are generated.

To list the output:

$ ls ppanini_demo_output/*

Which yields:

ppanini_table.txt

temp:
ppanini_abundance_table.txt ppanini_gene_centroids_norm.txt

To view the prioritized table:

$ column -t -s $'\t' ppanini_demo_output/ppanini_table.txt | less -S

Which yields:

alpha_prevalence        prevalence_percentile  mean_abundance  abund_percentile   beta_prevalence  ppanini_score    GO
 Cluster 3236            0.6                    99.4765702891   0.0796948281516    99.850448654     24.9157897073
 Cluster 1954            0.6                    99.4765702891   0.0130959466023    97.7567298106    24.6522879233
 UniRef90_A4K468         0.8                    99.7258225324   0.0100382547968    97.0588235294    24.5935625675
 UniRef90_K4HN31         1.0                    99.9501495513   0.00852018618773   96.3609172483    24.530680432
 UniRef90_K4HN67         1.0                    99.9501495513   0.00848592460776   96.3110667996    24.524217543
 UniRef90_K4HMX4         1.0                    99.9501495513   0.00785893181152   95.8624127617    24.4659034607
 UniRef90_A4K475         0.8                    99.7258225324   0.00775611952695   95.7128614158    24.4195356794
 UniRef90_T1R4E7         0.8                    99.7258225324   0.00728619760249   95.4636091725    24.3870450964
 UniRef90_K4HNF9         0.8                    99.7258225324   0.00627188732095   94.2671984048    24.2299281008
 UniRef90_T1R5B4         0.8                    99.7258225324   0.00541914213714   93.0707876371    24.0708611077
 UniRef90_K4HMW9         0.6                    99.4765702891   0.00516704969614   92.5722831505    23.9750799516
 UniRef90_F4MIK3         0.8                    99.7258225324   0.00422832458672   90.4287138584    23.7124973226
 UniRef90_U7MMQ4         0.6                    99.4765702891   0.00343814449336   89.0329012961    23.4913598488
 Cluster 3001            0.4                    64.5812562313   0.0705061973561    99.8005982054    19.6044996171
 UniRef90_A4K498         0.4                    64.5812562313   0.018181925684     98.8035892323    19.5270861709
 UniRef90_K4HN57         0.4                    64.5812562313   0.017728406802     98.703888335     19.5192928313
 UniRef90_K4HNN3         0.4                    64.5812562313   0.0142993385429    98.1056829511    19.4723321991
 UniRef90_T1R4K8         0.4                    64.5812562313   0.0136896074371    98.0059820538    19.4644718306
 UniRef90_K4HMX7         0.4                    64.5812562313   0.013630794173     97.9561316052    19.4605380301
 UniRef90_A4K488         0.4                    64.5812562313   0.0133693314862    97.9062811565    19.456601816
 UniRef90_U7LYK0         0.4                    64.5812562313   0.0126166167897    97.7068793619    19.4408327774
 UniRef90_K4HN47         0.4                    64.5812562313   0.0111812692374    97.3080757727    19.4091781622
 UniRef90_T1R527         0.4                    64.5812562313   0.011116165726     97.258225324     19.4052103661
 UniRef90_E4GVC2         0.4                    64.5812562313   0.0107279490133    97.2083748754    19.4012401249
 UniRef90_T1R4G3         0.4                    64.5812562313   0.00964232289559   96.8594217348    19.3733797835
 UniRef90_K4HMW4         0.4                    64.5812562313   0.0091430930091    96.6600199402    19.3574054464
 UniRef90_K4HND8         0.4                    64.5812562313   0.00899340491187   96.4606181456    19.3413915505
 UniRef90_A4K470         0.4                    64.5812562313   0.0083952273161    96.2113659023    19.3213183268
 UniRef90_A4K482         0.4                    64.5812562313   0.00833622449037   96.1615154536    19.3172962118
 UniRef90_F4MIF6         0.4                    64.5812562313   0.00702192029107   95.1645064806    19.2363267508
 UniRef90_F4MIL7         0.4                    64.5812562313   0.00697228373128   95.0648055833    19.2281741816
 Cluster 116             0.4                    64.5812562313   0.0065989482817    94.666001994     19.1954618224
 UniRef90_K4HP14         0.4                    64.5812562313   0.00624827679414   94.2173479561    19.1584640209
 UniRef90_F4MIH4         0.4                    64.5812562313   0.00571372560111   93.4695912263    19.0963342428
 UniRef90_T1R508         0.4                    64.5812562313   0.00519400191157   92.7218344965    19.0336137839
 Cluster 1510            0.4                    64.5812562313   0.00447300735404   91.0767696909    18.8935076276
 UniRef90_Q45122         0.4                    64.5812562313   0.00398876503541   89.6809571286    18.772286308
 Cluster 2572            0.4                    64.5812562313   0.00193082354688   87.8863409771    18.6131689939
 UniRef90_F1WXK5         0.4                    64.5812562313   0.00152630714186   87.7367896311    18.5997399717
 UniRef90_Q4FPR4         0.4                    64.5812562313   0.00120935236567   87.5872382851    18.5862845534    GO:0003735
 UniRef90_D5V846         0.4                    64.5812562313   0.00119936831464   87.5373878365    18.5817935347    GO:0003735
 UniRef90_D5V8C4         0.4                    64.5812562313   0.00110299850764   87.4376869392    18.572802661     GO:0003735
 UniRef90_D5VA18         0.4                    64.5812562313   0.00107770647666   87.3878364905    18.5683028003
 ...
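The tutorial does not state how the score is computed, but the rows above allow a quick numerical check. In the three rows copied below, the score is consistent with p·q / (2·(p + q)), where p and q are the two percentile values in the row (the second and fourth numeric fields). The fields are referred to by position because the row identifier has no header label of its own in the listing. This is an observation from the displayed values, not a documented formula, and `combine_percentiles` is a hypothetical name.

```python
import math

# (identifier, first percentile, second percentile, reported score),
# copied from three rows of the output listing above.
ROWS = [
    ("Cluster 3236", 99.4765702891, 99.850448654, 24.9157897073),
    ("UniRef90_A4K468", 99.7258225324, 97.0588235294, 24.5935625675),
    ("Cluster 3001", 64.5812562313, 99.8005982054, 19.6044996171),
]


def combine_percentiles(p, q):
    """Hypothetical combination p*q / (2*(p+q)), i.e. 1 / (2/p + 2/q)."""
    return (p * q) / (2.0 * (p + q))


for name, p, q, reported in ROWS:
    computed = combine_percentiles(p, q)
    agrees = math.isclose(computed, reported, abs_tol=1e-6)
    print(f"{name}: computed={computed:.6f} reported={reported:.6f} agrees={agrees}")
```

Because the expression is a scaled harmonic mean, a gene family needs both percentiles to be high to score well; a low value in either one pulls the score down sharply, as the rows with a first percentile of about 64.58 show.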

2.4 Visualize the summary of gene characterization

To plot a summary of gene characterization in the sample community, run:

$ cd ppanini_demo_output/
$ ppanini_barplot -i1 temp/ppanini_abundance_table.txt -i2 ppanini_table.txt

Which yields:

ppanini_barplot.png
