qiime2-pipeline

The Qiime2 pipeline processes a 16S, 18S, or ITS experiment and generates an analysis report.

The Pipeline

The pipeline performs the following steps:

  1. Subsampling (optional): Each fastq file is reduced to a specified number of reads in order to reduce processing time.
  2. FastQC (optional): FastQC is run on each fastq file to generate sequence quality plots.
  3. Primer Trimming: Primer sequences are trimmed off the 5' ends of reads using cutadapt, and reads without a primer are discarded. Adapter sequences are trimmed off the 3' ends of reads using cutadapt.
  4. Qiime2 Analysis: Data are imported into Qiime2; samples are denoised using dada2; a phylogenetic tree is generated and taxonomy is assigned using qiime2 functions.
  5. Beta Diversity: Beta diversity is estimated using Qiime.
  6. Alpha Diversity: Alpha diversity is estimated using Qiime.
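
The source does not say how reads are selected in the subsampling step. As a rough illustration of what step 1 amounts to, the sketch below assumes the simplest approach, keeping the first N reads of each file. The function name subsample_fastq and the demo file names are hypothetical, and the actual pipeline may choose reads differently (for example, at random).

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first N reads of a FASTQ file.
# A FASTQ record is 4 lines, so N reads = N * 4 lines.
subsample_fastq() {
  local infile="$1" nreads="$2"
  local reader=(cat)
  # Gz-compressed input is decompressed on the fly.
  [[ "$infile" == *.gz ]] && reader=(gzip -dc)
  if (( nreads == 0 )); then
    "${reader[@]}" "$infile"    # 0 = no subsampling, pass everything through
  else
    "${reader[@]}" "$infile" | head -n "$(( nreads * 4 ))"
  fi
}

# Demo on a tiny three-read file (made-up reads).
printf '@r1\nACGT\n+\nIIII\n@r2\nTTGG\n+\nIIII\n@r3\nCCAA\n+\nIIII\n' > demo.fastq
subsample_fastq demo.fastq 2 > demo.sub.fastq
```

Taking the head of the file is fast but not random; it is shown here only because it makes the effect of the read count easy to see.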

Options

--variableregion region
 Use predefined primer sequences, trunclength, maxee, and reference database. Valid regions: V1V3, V3V4, V3V5, V4, V4V6, V5V6, 18S_V9, ITS1, ITS2
--emp EMP protocol, or any other protocol where primers are not present in the reads
--bowtie2index index
 Bowtie2 index for (host) contamination detection
--flag samplelist
 A comma-delimited list of sample names to flag in the report

Advanced options:

--r1adapter string
 sequence to trim off the 3' end of R1 reads
--r2adapter string
 sequence to trim off the 3' end of R2 reads
--crop integer
 Crop the given number of bases from the start of every read (necessary for the "IIS" library prep method)
--refdb database
 Valid values include greengenes, silva128, and its. Default is greengenes.
--referencefasta file
 A fasta file containing reference sequences (default=greengenes 97_otus.fasta)
--referencetaxonomy file
 An id_to_taxonomy_fp file (default=greengenes 97_otu_taxonomy.txt), see Qiime documentation for details
--referencealignedfasta file
 A pynast_template_alignment_fp file (default=greengenes rep_set_aligned/97_otus.fasta), see Qiime documentation for details
--trunclength integer
 dada2 trunc-len
--maxee integer
 dada2 maxee value

Standard gopher-pipeline options

--fastqfolder folder
 A folder containing fastq files to process
--subsample integer
 Subsample the specified number of reads from each sample. 0 = no subsampling (default = 0)
--samplesheet file
 A samplesheet (see the Samplesheet section below for the required format)
--runname string
 Name of the sequencing run
--projectname string
 Name of the experiment (UMGC Project name)
--illuminasamplesheet file
 An illumina samplesheet, from which extra sample information can be obtained
--nofastqc Don't run FastQC
--samplespernode integer
 Number of samples to process simultaneously on each node (default = 1)
--threadspersample integer
 Number of threads used by each sample
--scratchfolder folder
 A temporary/scratch folder
--outputfolder folder
 A folder to deposit final results
--extraoptionsfile file
 File with extra options for trimmomatic, tophat, cuffquant, or featurecounts
--resume Continue where a failed/interrupted run left off
--verbose Print more information while running
--help Print usage instructions and exit

Fastq file support: Paired-end and single-end reads are supported. Gz-compressed or uncompressed fastq files are supported.

Samplesheet

The samplesheet supplied to the program must be a valid Qiime mapping file containing these columns:

  • SampleID: Name of the sample
  • BarcodeSequence: This column should be blank
  • LinkerPrimerSequence: The forward (R1) 16S primer (blank for primerless protocols such as emp)
  • ReversePrimer: The reverse (R2) 16S primer (blank for primerless protocols such as emp; omit this column for single-end read datasets)
  • fastqR1: Name of the R1 fastq file (just the name, not the full path)
  • fastqR2: Name of the R2 fastq file (just the name, not the full path; blank or omitted for single-end datasets)
  • Description: This must be the final column in the mapping file

If you don't supply the pipeline with a samplesheet using the --samplesheet option, it will run the createsamplesheet.pl script for you and use the samplesheet it generates. You may wish to run createsamplesheet.pl on your own first and manually edit the resulting samplesheet to suit your needs.
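
As an illustration of the column layout described above, the following sketch writes a small paired-end samplesheet by hand. It assumes the file is tab-delimited and that the header starts with "#SampleID", as in the Qiime mapping file convention. The sample names, fastq file names, and output file name are placeholders, and the primers shown (515F/806R) are only an example, not necessarily the ones the pipeline predefines for any region.

```shell
#!/usr/bin/env bash
# Write a two-sample, paired-end samplesheet as a tab-delimited mapping file.
# All names and sequences below are illustrative placeholders.
fwd=GTGCCAGCMGCCGCGGTAA    # example forward (R1) primer, 515F
rev=GGACTACHVGGGTWTCTAAT   # example reverse (R2) primer, 806R
sheet=samplesheet.txt

# Header row; Description is the final column.
printf '#SampleID\tBarcodeSequence\tLinkerPrimerSequence\tReversePrimer\tfastqR1\tfastqR2\tDescription\n' > "$sheet"

# One row per sample; the second field (BarcodeSequence) is left blank.
for s in sampleA sampleB; do
  printf '%s\t\t%s\t%s\t%s_R1.fastq.gz\t%s_R2.fastq.gz\t%s\n' \
    "$s" "$fwd" "$rev" "$s" "$s" "$s" >> "$sheet"
done

# Sanity check: every line should have exactly 7 tab-separated fields.
awk -F'\t' 'NF != 7 { print "bad line " NR; bad = 1 } END { exit bad + 0 }' "$sheet"
```

The field-count check at the end is a cheap way to catch a stray or missing tab after hand-editing a generated samplesheet.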

Running the pipeline

It is recommended that you first run the pipeline interactively using the --subsample option to make sure it works correctly on a small sample of your data before submitting a job to process your entire dataset. This allows you to identify and solve problems quickly. A MiSeq run subsampled to 1000 reads per sample should complete within five minutes for simple (e.g. gut) samples, and within an hour for complex (e.g. soil) samples:

module load umgc
module load gopher-pipelines
qiime2-pipeline --fastqfolder /path/to/fastqs --subsample 1000

The results of the analysis are located in /panfs/roc/scratch/USERNAME-pipelines/qiime2-RUNNAME/output. Download the entire output folder to your local computer and open up the html file to see a summary of the analysis results.
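
Once a subsampled test run looks right, the full run uses the same inputs without --subsample. The sketch below assembles such a command from options documented above. Every value in it (paths, run name, region, thread count) is a placeholder to replace with your own, and the print-instead-of-run fallback is only a convenience of this sketch, not a feature of the pipeline.

```shell
#!/usr/bin/env bash
# Assemble a full-dataset invocation from documented options.
# All values are placeholders.
cmd=(
  qiime2-pipeline
  --fastqfolder /path/to/fastqs
  --samplesheet /path/to/samplesheet.txt
  --variableregion V4
  --runname myrun
  --threadspersample 4
)

if command -v qiime2-pipeline >/dev/null 2>&1; then
  "${cmd[@]}"                      # pipeline is on PATH (modules loaded): run it
else
  printf '%s ' "${cmd[@]}"; echo   # otherwise just print the command for review
fi
```

Keeping the arguments in an array makes it easy to add --resume after an interrupted run without retyping the rest of the command.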

Support

Issues with the pipeline can be reported in the Bitbucket issue tracker. You may also contact John Garbe directly at jgarbe@umn.edu; response times may vary.
