PanPhlAn map

How to run panphlan_map on metagenomic samples

Example of screening for E. coli strains in sample.tar.gz

a) by using the input option -i

./panphlan/panphlan_map.py -c ecoli16 -i sample.tar.gz -o map_results/sample_ecoli16.csv

b) or by using a unix pipe |

tar -xOf sample.tar.gz | ./panphlan/panphlan_map.py -c ecoli16 --fastx fastq -o map_results/sample_ecoli16.csv

Options

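  • -c name of the species database. Example: ecoli16 (Escherichia coli, version 2016) → download database
  • -i path to the input metagenomic sample
  • -o path of the mapping result file, in the form path/sampleID_clade.csv
  • --fastx format of the input reads, fasta or fastq (default: fastq)
  • --tmp folder for saving temporary result files
  • -p number of processors used for the Bowtie2 mapping
  • --verbose display progress information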
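
The options above can be combined to map several samples in one run. The following is a minimal sketch, not part of PanPhlAn itself: the folder names samples/, map_results/ and panphlan_tmp/, the DRY_RUN switch, and the helper functions run and map_sample are assumptions made for illustration. Only the flags -c, -i, -o, -p, --tmp and --verbose are taken from this page.

```shell
#!/bin/sh
# Sketch: run panphlan_map on every .tar.gz sample in a folder.
# All paths below are illustrative assumptions; adjust them to your setup.
SAMPLES_DIR="${SAMPLES_DIR:-samples}"
OUT_DIR="${OUT_DIR:-map_results}"
TMP_DIR="${TMP_DIR:-panphlan_tmp}"
CLADE="${CLADE:-ecoli16}"
NPROC="${NPROC:-4}"
# DRY_RUN=1 (default) only prints the commands; DRY_RUN=0 executes them.
DRY_RUN="${DRY_RUN:-1}"

run() {
    # Print the command in dry-run mode, otherwise execute it.
    if [ "$DRY_RUN" = "1" ]; then
        printf '%s\n' "$*"
    else
        "$@"
    fi
}

map_sample() {
    # Map one sample; the output file follows the sampleID_clade.csv pattern.
    sample="$1"
    id=$(basename "$sample" .tar.gz)
    run ./panphlan/panphlan_map.py -c "$CLADE" -i "$sample" \
        -o "$OUT_DIR/${id}_${CLADE}.csv" -p "$NPROC" --tmp "$TMP_DIR" --verbose
}

# Create the output and temporary folders only when really running.
[ "$DRY_RUN" = "1" ] || mkdir -p "$OUT_DIR" "$TMP_DIR"

for sample in "$SAMPLES_DIR"/*.tar.gz; do
    [ -e "$sample" ] || continue   # skip if the folder holds no samples
    map_sample "$sample"
done
```

Running the script as-is prints one command line per sample, which makes it easy to check the paths before starting the actual mapping with DRY_RUN=0.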

Help -h

./panphlan/panphlan_map.py -h
  -h, --help            show this help message and exit
  -i INPUT_FILE, --input INPUT_FILE
                        File(s) containing the unpaired reads to be aligned
                        using Bowtie2. If not specified, Bowtie2 gets the read
                        from the stdin filehandle.
  --i_bowtie2_indexes INPUT_BOWTIE2_INDEXES
                        Input directory of bowtie2 indexes and pangenome
  --fastx FASTX_FORMAT  Read input format (fasta or fastq), default: fastq, if
                        not fasta recognized by file ending.
  -c CLADE_NAME, --clade CLADE_NAME
                        Name of the specie to consider, i.e. the basename of
                        the index for the reference genome used by Bowtie2 to
                        align reads.
  -o OUTPUT_FILE, --output OUTPUT_FILE
                        Mapping result output-file: path/sampleID_clade.csv
  --th_mismatches NUMOF_MISMATCHES
                        Number of mismatches to filter.
  -p NUMOF_PROCESSORS, --nproc NUMOF_PROCESSORS
                        Maximum number of processors to use. Default value is
                        the minimum between 12 and the number of available
                        processors.
  -b OUTPUT_BAM_FILE, --out_bam OUTPUT_BAM_FILE
                        Forces the name of the BAM file generated by the
                        Samtools pipeline.
  -m MEMORY_GIGABTES_FOR_SAMTOOLS, --mGB MEMORY_GIGABTES_FOR_SAMTOOLS
                        Maximum amount of memory we get available for
                        Samtools.
  --readLength READS_LENGTH
                        Minimum read length.
  --tmp TEMP_FOLDER     Alternative folder for temporary files.
  --verbose             Defines if the standard output must be verbose or not.
  -v, --version         Prints the current PanPhlAn version and exits.

panphlan_map.py requires Bowtie2 and Samtools.

Read more at the PanPhlAn tutorial

FAQ

Which file formats are supported by panphlan_map?

When using the input option -i, PanPhlAn accepts the following file formats: .fastq, .fastq.gz, .fastq.bz2, .tar.gz, .tar.bz2, and .sra. If your samples are in another format, either convert them to one of these formats or stream the reads through a unix pipe instead of using the -i option.
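
For samples in a format not listed above, the pipe approach could look like the sketch below. The helper stream_reads and the file names are hypothetical, and the extensions it handles (.fastq.xz, .fq.xz, .fq.gz, .fq) are only examples. It assumes that xz and gzip are installed and that the files contain FASTQ reads.

```shell
#!/bin/sh
# Sketch: write the reads of one sample to stdout, choosing the
# decompression tool by file extension (hypothetical helper).
stream_reads() {
    case "$1" in
        *.fastq.xz|*.fq.xz) xz -dc "$1" ;;    # xz-compressed FASTQ
        *.fq.gz)            gzip -dc "$1" ;;  # gzip-compressed FASTQ
        *.fq)               cat "$1" ;;       # plain FASTQ
        *) echo "stream_reads: unsupported file: $1" >&2
           return 1 ;;
    esac
}

# Example call (commented out, since it needs PanPhlAn and a real sample):
# stream_reads sample.fastq.xz | ./panphlan/panphlan_map.py -c ecoli16 --fastx fastq -o map_results/sample_ecoli16.csv
```

Because the reads arrive on stdin, panphlan_map cannot infer the format from a file ending, so the sketch passes --fastx fastq explicitly, as in example b) above.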

Next step

Merge and process the mapping results to get the final gene-family presence/absence profiles of all detected strains.
→ panphlan_profile
