qiime2-pipeline

The Qiime2 pipeline processes a 16S, 18S, or ITS experiment and generates an analysis report.

The Pipeline

The pipeline performs the following steps:

Subsampling (optional): Each fastq file is reduced to a specified number of reads in order to reduce processing time
FastQC (optional): FastQC is run on each fastq file to generate sequence quality plots
Primer Trimming: Primer sequences are trimmed off the 5' ends of reads using cutadapt, and reads without a primer are discarded. Adapter sequences are trimmed off the 3' ends of reads using cutadapt.
Qiime2 Analysis: Data are imported into Qiime2; samples are denoised using dada2; a phylogenetic tree is generated and taxonomy is assigned using qiime2 functions.
Beta Diversity: Beta diversity is estimated using Qiime
Alpha Diversity: Alpha diversity is estimated using Qiime
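
The primer-trimming step can be sketched as two cutadapt passes. This is a dry-run illustration that only prints the commands. The exact cutadapt arguments the pipeline uses are not documented here, so the flag choices (`-g`/`-G` for 5' primers, `--discard-untrimmed` to drop read pairs in which a primer was not found, `-a`/`-A` for 3' adapters), the primer values, the adapter placeholders, and all file names are assumptions.

```shell
#!/usr/bin/env bash
# Dry-run sketch: prints cutadapt commands for one paired-end sample
# instead of running them. The flags are standard cutadapt options; whether
# the pipeline uses exactly these is an assumption.

# Pass 1: remove 5' primers (-g for R1, -G for R2). --discard-untrimmed
# drops read pairs in which a primer was not found.
primer_trim_cmd() {
    local fwd=$1 rev=$2 r1=$3 r2=$4
    echo "cutadapt -g $fwd -G $rev --discard-untrimmed" \
         "-o primertrim.$r1 -p primertrim.$r2 $r1 $r2"
}

# Pass 2: remove 3' adapters (-a for R1, -A for R2) from the pass-1 output.
adapter_trim_cmd() {
    local a1=$1 a2=$2 r1=$3 r2=$4
    echo "cutadapt -a $a1 -A $a2" \
         "-o adaptertrim.$r1 -p adaptertrim.$r2 primertrim.$r1 primertrim.$r2"
}

# Example values only: a hypothetical sample, example primers, placeholder adapters.
primer_trim_cmd GTGCCAGCMGCCGCGGTAA GGACTACHVGGGTWTCTAAT sample1_R1.fastq.gz sample1_R2.fastq.gz
adapter_trim_cmd R1_ADAPTER_SEQ R2_ADAPTER_SEQ sample1_R1.fastq.gz sample1_R2.fastq.gz
```

Printing the commands first makes it easy to inspect the primer and adapter arguments before running anything on real data.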

Options

--variableregion region

Use predefined primer sequences, trunclength, maxee, and reference database. Valid regions: V1V3, V3V4, V3V5, V4, V4V6, V5V6, 18S_V9, ITS1, ITS2

--emp

EMP protocol, or any other protocol where primers are not present in the reads

--bowtie2index index

Bowtie2 index for (host) contamination detection

--flag samplelist

A comma-delimited list of sample names to flag in the report
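
As a small illustration of the `--variableregion` option, the sketch below checks a region name against the list of valid regions given above and then prints the corresponding command line. The helper names `is_valid_region` and `build_cmd` are hypothetical and not part of the pipeline; the command is printed rather than executed.

```shell
#!/usr/bin/env bash
# Pre-flight check: confirm a region name is one of the documented values
# before building the qiime2-pipeline command line.

VALID_REGIONS="V1V3 V3V4 V3V5 V4 V4V6 V5V6 18S_V9 ITS1 ITS2"

# Return 0 if the argument is a documented region name, 1 otherwise.
is_valid_region() {
    local region=$1 r
    for r in $VALID_REGIONS; do
        [ "$r" = "$region" ] && return 0
    done
    return 1
}

# Print (do not run) the command for a given fastq folder and region.
build_cmd() {
    local folder=$1 region=$2
    if ! is_valid_region "$region"; then
        echo "unknown region: $region" >&2
        return 1
    fi
    echo "qiime2-pipeline --fastqfolder $folder --variableregion $region"
}

build_cmd /path/to/fastqs V4
```

Catching a mistyped region name before submitting a job avoids waiting in a queue only to have the run fail at startup.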

Advanced options:

--r1adapter string

sequence to trim off the 3' end of R1 reads

--r2adapter string

sequence to trim off the 3' end of R2 reads

--crop integer

Crop integer bases from the start of every read (necessary for the "IIS" library prep method)

--refdb database

Valid values include greengenes, silva128, and its. Default is greengenes.

--referencefasta file

A fasta file containing reference sequences (default=greengenes 97_otus.fasta)

--referencetaxonomy file

An id_to_taxonomy_fp file (default=greengenes 97_otu_taxonomy.txt), see Qiime documentation for details

--referencealignedfasta file

A pynast_template_alignment_fp file (default=greengenes rep_set_aligned/97_otus.fasta), see Qiime documentation for details

--trunclength integer

dada2 trunc-len

--maxee integer

dada2 maxee value
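
To make the last two options concrete, here is a dry-run sketch of how `--trunclength` and `--maxee` could map onto the QIIME 2 DADA2 plugin for single-end data. The mapping itself, the `.qza` file names, and the example values are assumptions, not taken from the pipeline source. Paired-end data would use `denoise-paired`, which takes separate forward and reverse values.

```shell
#!/usr/bin/env bash
# Dry-run sketch: show how --trunclength and --maxee could map onto
# QIIME 2's DADA2 plugin for single-end data. The mapping and the .qza
# file names are assumptions; the command is printed, not executed.

dada2_cmd() {
    local trunclength=$1 maxee=$2
    echo "qiime dada2 denoise-single" \
         "--i-demultiplexed-seqs demux.qza" \
         "--p-trunc-len $trunclength --p-max-ee $maxee" \
         "--o-table table.qza" \
         "--o-representative-sequences rep-seqs.qza" \
         "--o-denoising-stats stats.qza"
}

# Example values only.
dada2_cmd 250 2
```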

Standard gopher-pipeline options

`--fastqfolder folder`
	A folder containing fastq files to process
`--subsample integer`
	Subsample the specified number of reads from each sample. 0 = no subsampling (default = 0)
`--samplesheet file`
	A samplesheet
`--runname string`
	Name of the sequencing run
`--projectname string`
	Name of the experiment (UMGC Project name)
`--illuminasamplesheet file`
	An illumina samplesheet, from which extra sample information can be obtained
`--nofastqc`
	Don't run FastQC
`--samplespernode integer`
	Number of samples to process simultaneously on each node (default = 1)
`--threadspersample integer`
	Number of threads used by each sample
`--scratchfolder folder`
	A temporary/scratch folder
`--outputfolder folder`
	A folder to deposit final results
`--extraoptionsfile file`
	File with extra options for trimmomatic, tophat, cuffquant, or featurecounts
`--resume`
	Continue where a failed/interrupted run left off
`--verbose`
	Print more information while running
`--help`
	Print usage instructions and exit

Fastq file support: Paired-end and single-end reads are supported. Gz-compressed or uncompressed fastq files are supported.

Samplesheet

The samplesheet supplied to the program must be a valid Qiime mapping file containing these columns:

SampleID: Name of the sample
BarcodeSequence: This column should be blank
LinkerPrimerSequence: The forward (R1) 16S primer (blank for primerless protocols such as emp)
ReversePrimer: The reverse (R2) 16S primer (blank for primerless protocols such as emp, omit this column for single-end read datasets)
fastqR1: Name of the R1 fastq file (just the name, not the full path)
fastqR2: Name of the R2 fastq file (just the name, not the full path, blank or omitted for single-end datasets)
Description: This must be the final column in the mapping file

If you don't supply the pipeline with a samplesheet using the --samplesheet option, it will run the createsamplesheet.pl script for you and use the samplesheet it generates. You may wish to run createsamplesheet.pl on your own first and manually edit the resulting samplesheet to suit your needs.
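
If you edit a samplesheet by hand, a quick header check can catch column mistakes before a run. The sketch below assumes a tab-delimited file whose header may start with `#` (as in Qiime mapping files), and that the R2 column is named `fastqR2` by analogy with `fastqR1`. The function name `check_samplesheet_header` and the example sample row are hypothetical. Only the columns needed for both single-end and paired-end data are treated as required.

```shell
#!/usr/bin/env bash
# Check that a samplesheet header has the required columns and that
# Description is the final column. Assumes a tab-delimited file.

check_samplesheet_header() {
    local sheet=$1 header col
    # Read the first line, drop any carriage return and a leading '#'.
    header=$(head -n 1 "$sheet" | tr -d '\r' | sed 's/^#//')
    for col in SampleID BarcodeSequence LinkerPrimerSequence fastqR1 Description; do
        if ! printf '%s\n' "$header" | tr '\t' '\n' | grep -qx "$col"; then
            echo "missing column: $col"
            return 1
        fi
    done
    # Description must be the last tab-separated field.
    if [ "$(printf '%s\n' "$header" | awk -F'\t' '{print $NF}')" != "Description" ]; then
        echo "Description is not the last column"
        return 1
    fi
    echo "ok"
}

# Example: write a one-sample paired-end samplesheet to a temp file and check it.
sheet=$(mktemp)
printf '#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tReversePrimer\tfastqR1\tfastqR2\tDescription\n' > "$sheet"
printf 'sample1\t\tGTGCCAGCMGCCGCGGTAA\tGGACTACHVGGGTWTCTAAT\tsample1_R1.fastq.gz\tsample1_R2.fastq.gz\tgut\n' >> "$sheet"
check_samplesheet_header "$sheet"
rm -f "$sheet"
```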

Running the pipeline

It is recommended that you first run the pipeline interactively with the --subsample option to make sure it works correctly on a small sample of your data before submitting a job to process your entire dataset. This allows you to identify and solve problems quickly. A MiSeq run subsampled to 1000 reads per sample should complete within five minutes for simple (e.g. gut) samples, and within an hour for complex (e.g. soil) samples:

module load umgc
module load gopher-pipelines
qiime2-pipeline --fastqfolder /path/to/fastqs --subsample 1000

The results of the analysis are located in /panfs/roc/scratch/USERNAME-pipelines/qiime2-RUNNAME/output. Download the entire output folder to your local computer and open up the html file to see a summary of the analysis results.
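
The output location follows a fixed pattern, so the download step can be sketched as below. The helper names, the user name `jdoe`, the run name `run42`, and the host `your.cluster.host` are all placeholders. The rsync command is printed rather than executed, and is meant to be run from your local computer.

```shell
#!/usr/bin/env bash
# Build the output folder path described above from a user name and run name.
output_dir() {
    local user=$1 runname=$2
    echo "/panfs/roc/scratch/${user}-pipelines/qiime2-${runname}/output"
}

# Print an rsync command that would copy the folder to the current directory.
# "your.cluster.host" is a placeholder, not a real host name.
download_cmd() {
    local user=$1 runname=$2
    echo "rsync -av ${user}@your.cluster.host:$(output_dir "$user" "$runname") ."
}

download_cmd jdoe run42
```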

Support

Issues with the pipeline can be reported in the Bitbucket issue tracker. You may also contact John Garbe directly at jgarbe@umn.edu; response times may vary.
