shotgun-pipeline

The shotgun pipeline processes shotgun metagenomics data and generates an analysis report.

The Pipeline

The pipeline performs the following steps:

Subsampling (optional): Each fastq file is reduced to a specified number of reads to shorten processing time.
FastQC (optional): FastQC is run on each fastq file to generate sequence quality plots.
Quality Trimming: Quality trimming is performed using SHI7.
Alignment: Reads are aligned to the Web of Life (WoL) reference metagenomic database using BWA. A biom table is generated using the WoL gOTU_from_maps.py script.
Beta Diversity: Beta diversity is estimated using QIIME 2.
Alpha Diversity: Alpha diversity is estimated using QIIME 2.
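
To make the subsampling step concrete: a fastq record occupies four lines, so keeping N reads means keeping 4 x N lines. The sketch below illustrates that idea only. It simply takes the first N reads, which may differ from how the pipeline itself selects reads, and `subsample_head` is a hypothetical helper, not a pipeline command.

```shell
# Illustrative only: keep the first N reads of a fastq file.
# Assumes standard four-line fastq records.
# subsample_head is a hypothetical helper, not part of shotgun-pipeline.
subsample_head() {
    infile=$1   # path to a .fastq or .fastq.gz file
    nreads=$2   # number of reads to keep
    case "$infile" in
        *.gz) gzip -dc "$infile" | head -n "$((nreads * 4))" ;;
        *)    head -n "$((nreads * 4))" "$infile" ;;
    esac
}
```

For example, `subsample_head sample_R1.fastq.gz 1000 > small_R1.fastq` would write the first 1000 reads of a (hypothetical) file to an uncompressed copy.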

Options

Advanced options

`--refdb database`
	Path to shogun database
`--flag samplelist`
	A comma-delimited list of sample names to flag in the report (not tested)

Standard gopher-pipeline options

`--fastqfolder folder`
	A folder containing fastq files to process
`--subsample integer`
	Subsample the specified number of reads from each sample. 0 = no subsampling (default = 0)
`--samplesheet file`
	A samplesheet
`--runname string`
	Name of the sequencing run
`--projectname string`
	Name of the experiment (UMGC Project name)
`--illuminasamplesheet file`
	An illumina samplesheet, from which extra sample information can be obtained
`--nofastqc`
	Don't run FastQC
`--samplespernode integer`
	Number of samples to process simultaneously on each node (default = 1)
`--threadspersample integer`
	Number of threads used by each sample
`--scratchfolder folder`
	A temporary/scratch folder
`--outputfolder folder`
	A folder to deposit final results
`--extraoptionsfile file`
	File with extra options for Trimmomatic, TopHat, Cuffquant, or featureCounts
`--resume`
	Continue where a failed/interrupted run left off
`--verbose`
	Print more information while running
`--help`
	Print usage instructions and exit

Fastq file support: Both paired-end and single-end reads are supported. Fastq files may be gzip-compressed (.gz) or uncompressed.
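
Because both compressed and uncompressed files are accepted, a quick read count can help when choosing a `--subsample` value. The helper below is a hypothetical sketch (`count_reads` is not part of the pipeline) and assumes standard four-line fastq records.

```shell
# Sketch: count the reads in a gzip-compressed or uncompressed fastq file.
# count_reads is a hypothetical helper; it assumes four lines per read.
count_reads() {
    case "$1" in
        *.gz) gzip -dc "$1" ;;
        *)    cat "$1" ;;
    esac | awk 'END { printf "%d\n", NR / 4 }'
}
```

For paired-end data, the R1 and R2 files of a sample should report the same count.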

Running the pipeline

Before submitting a job to process your entire dataset, run the pipeline interactively with the --subsample option to confirm that it works correctly on a small sample of your data. This lets you identify and solve problems quickly.

module load umgc
module load gopher-pipelines
shotgun-pipeline --fastqfolder /path/to/fastqs
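
The trial-then-full workflow can be sketched as two small wrapper functions. This is only an illustration: `trial_run`, `full_run`, and the `TRIAL_READS` value of 100000 are assumptions, while `--fastqfolder` and `--subsample` are the options documented above. Load the modules as shown before calling either function.

```shell
# Sketch only: wrap the trial run and the full run in two small functions.
# TRIAL_READS is an assumed value; choose one small enough to finish quickly.
TRIAL_READS=100000

# Trial run: process only TRIAL_READS reads per sample (uses --subsample).
trial_run() {
    fastqdir=$1
    shift
    shotgun-pipeline --fastqfolder "$fastqdir" --subsample "$TRIAL_READS" "$@"
}

# Full run: same folder, no subsampling (the default is --subsample 0).
full_run() {
    fastqdir=$1
    shift
    shotgun-pipeline --fastqfolder "$fastqdir" "$@"
}
```

Any additional options are passed through unchanged, so `trial_run /path/to/fastqs --verbose` runs the subsampled trial with verbose output.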

The results of the analysis are written to /panfs/roc/scratch/USERNAME-pipelines/shotgun-RUNNAME/output. Download the entire output folder to your local computer and open the HTML file to see a summary of the analysis results.
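
As a rough sketch, the output path can be assembled from a user name and run name, and the report located before downloading. Both `output_dir` and `list_reports` are hypothetical helpers, and the assumption that the HTML file sits at the top level of the output folder is not confirmed by the pipeline documentation.

```shell
# Sketch: build the output path described above and list HTML reports in it.
# output_dir and list_reports are hypothetical helpers, not pipeline commands.
output_dir() {
    # $1 = user name, $2 = run name
    printf '/panfs/roc/scratch/%s-pipelines/shotgun-%s/output\n' "$1" "$2"
}

list_reports() {
    # $1 = output folder; prints any .html files found at its top level
    find "$1" -maxdepth 1 -type f -name '*.html'
}
```

For example, `list_reports "$(output_dir "$USER" RUNNAME)"` prints the report path, with RUNNAME replaced by the name of your run.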

Support

Issues with the pipeline can be reported in the Bitbucket issue tracker. You may also contact John Garbe directly at jgarbe@umn.edu; response times may vary.
