qiime2-pipeline
The Qiime2 pipeline processes a 16S, 18S, or ITS experiment and generates an analysis report.
The Pipeline
The pipeline performs the following steps:
- Subsampling (optional): Each fastq file is reduced to a specified number of reads in order to reduce processing time
- FastQC (optional): FastQC is run on each fastq file to generate sequence quality plots
- Primer Trimming: Primer sequences are trimmed off the 5' ends of reads using cutadapt, and reads without a primer are discarded. Adapter sequences are trimmed off the 3' ends of reads using cutadapt.
- Qiime2 Analysis: Data are imported into Qiime2; samples are denoised using dada2; a phylogenetic tree is generated and taxonomy is assigned using qiime2 functions.
- Beta Diversity: Beta diversity is estimated using Qiime
- Alpha Diversity: Alpha diversity is estimated using Qiime
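As a rough illustration of the subsampling step, the sketch below keeps the first N reads of a fastq file (four lines per read). This is an assumption made for illustration only: the pipeline's own subsampling method is not described on this page and may select reads differently. The function name `subsample_fastq` is hypothetical and is not part of the pipeline.

```shell
# Hypothetical helper, not part of the pipeline: keep the first N reads of a
# fastq file. Assumes 4 lines per read; 0 means "no subsampling".
subsample_fastq() {
  # $1 = fastq path (plain or .gz), $2 = number of reads to keep
  case "$1" in
    *.gz) reader="gzip -dc" ;;   # decompress gz input on the fly
    *)    reader="cat" ;;
  esac
  if [ "$2" -eq 0 ]; then
    $reader "$1"
  else
    $reader "$1" | head -n "$(( $2 * 4 ))"
  fi
}
```

For example, `subsample_fastq sample_R1.fastq.gz 1000 > sample_R1.sub.fastq` would write the first 1000 reads to an uncompressed file.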
Options
--variableregion region    Use predefined primer sequences, trunclength, maxee, and reference database. Valid regions: V1V3, V3V4, V3V5, V4, V4V6, V5V6, 18S_V9, ITS1, ITS2
--emp    EMP protocol, or any other protocol where primers are not present in the reads
--bowtie2index index    Bowtie index for (host) contamination detection
--flag samplelist    A comma-delimited list of sample names to flag in the report
Advanced options:
--r1adapter string    Sequence to trim off the 3' end of R1 reads
--r2adapter string    Sequence to trim off the 5' end of R2 reads
--crop integer    Crop the given number of bases from the start of every read (necessary for the "IIS" library prep method)
--refdb database    Valid values include greengenes, silva128, and its. Default is greengenes.
--referencefasta file    A fasta file containing reference sequences (default=greengenes 97_otus.fasta)
--referencetaxonomy file    An id_to_taxonomy_fp file (default=greengenes 97_otu_taxonomy.txt); see Qiime documentation for details
--referencealignedfasta file    A pynast_template_alignment_fp file (default=greengenes rep_set_aligned/97_otus.fasta); see Qiime documentation for details
--trunclength integer    dada2 trunc-len value
--maxee integer    dada2 maxee value
Standard gopher-pipeline options
--fastqfolder folder    A folder containing fastq files to process
--subsample integer    Subsample the specified number of reads from each sample. 0 = no subsampling (default = 0)
--samplesheet file    A samplesheet
--runname string    Name of the sequencing run
--projectname string    Name of the experiment (UMGC Project name)
--illuminasamplesheet file    An illumina samplesheet, from which extra sample information can be obtained
--nofastqc    Don't run FastQC
--samplespernode integer    Number of samples to process simultaneously on each node (default = 1)
--threadspersample integer    Number of threads used by each sample
--scratchfolder folder    A temporary/scratch folder
--outputfolder folder    A folder to deposit final results
--extraoptionsfile file    File with extra options for trimmomatic, tophat, cuffquant, or featurecounts
--resume    Continue where a failed/interrupted run left off
--verbose    Print more information while running
--help    Print usage instructions and exit
Fastq file support: Paired-end and single-end reads are supported. Gz-compressed or uncompressed fastq files are supported.
Samplesheet
The samplesheet supplied to the program must be a valid Qiime mapping file containing these columns:
- SampleID: Name of the sample
- BarcodeSequence: This column should be blank
- LinkerPrimerSequence: The forward (R1) 16S primer (blank for primerless protocols such as emp)
- ReversePrimer: The reverse (R2) 16S primer (blank for primerless protocols such as emp; omit this column for single-end read datasets)
- fastqR1: Name of the R1 fastq file (just the name, not the full path)
- fastqR2: Name of the R2 fastq file (just the name, not the full path; blank or omitted for single-end datasets)
- Description: This must be the final column in the mapping file
If you don't supply the pipeline with a samplesheet using the --samplesheet option, it will run the createsamplesheet.pl script for you and use the samplesheet it generates. You may wish to run createsamplesheet.pl yourself first and manually edit the resulting samplesheet to suit your needs.
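The column requirements above can be checked before a run with a small script. The sketch below is not part of the pipeline; the function name `check_samplesheet` is hypothetical. It assumes the samplesheet is tab-delimited, that the header is the first line, and that the first column may be written as `#SampleID` in the usual Qiime mapping-file style. ReversePrimer and fastqR2 are not required here because they may be omitted for single-end datasets.

```shell
# Hypothetical helper, not part of the pipeline: check the header line of a
# tab-delimited samplesheet for the required columns.
check_samplesheet() {
  # $1 = path to the samplesheet
  awk -F '\t' '
    NR == 1 {
      sub(/\r$/, "")        # tolerate Windows line endings
      sub(/^#/, "", $1)     # accept "#SampleID" as well as "SampleID"
      for (i = 1; i <= NF; i++) seen[$i] = 1
      n = split("SampleID BarcodeSequence LinkerPrimerSequence fastqR1 Description", req, " ")
      for (j = 1; j <= n; j++)
        if (!(req[j] in seen)) { print "missing column: " req[j]; bad = 1 }
      if ($NF != "Description") { print "Description must be the final column"; bad = 1 }
      exit
    }
    END { exit bad + 0 }    # non-zero exit status if any check failed
  ' "$1"
}
```

Running `check_samplesheet samplesheet.txt` prints one line per problem found and returns a non-zero exit status if the header fails a check. It only inspects the header; it does not validate sample names, primers, or file names.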
Running the pipeline
It is recommended that you first run the pipeline interactively with the --subsample option to make sure it works correctly on a small sample of your data before submitting a job to process your entire dataset. This allows you to identify and solve problems quickly. A MiSeq run subsampled to 1000 reads per sample should complete within five minutes for simple (e.g. gut) samples, and within an hour for complex (e.g. soil) samples:
module load umgc
module load gopher-pipelines
qiime2-pipeline --fastqfolder /path/to/fastqs
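A trial run along the lines recommended above might look like the following sketch. It uses only options documented on this page. The wrapper function `run_trial` is hypothetical, `/path/to/fastqs` is a placeholder, and V4 is just an example region; substitute values that match your data. The function prints the command and runs it only if `qiime2-pipeline` is on the PATH, that is, after the two `module load` commands.

```shell
# Hypothetical wrapper, not part of the pipeline: print a trial command and
# run it only when qiime2-pipeline is available.
run_trial() {
  # $1 = fastq folder, $2 = reads per sample, $3 = variable region
  set -- --fastqfolder "$1" --subsample "$2" --variableregion "$3"
  echo "qiime2-pipeline $*"
  if command -v qiime2-pipeline >/dev/null 2>&1; then
    qiime2-pipeline "$@"
  fi
}

run_trial /path/to/fastqs 1000 V4
```

Once the trial run completes cleanly, the same options without --subsample can be used for the full dataset.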
The results of the analysis are located in /panfs/roc/scratch/USERNAME-pipelines/qiime2-RUNNAME/output. Download the entire output folder to your local computer and open the HTML file to see a summary of the analysis results.
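Since the whole output folder has to be downloaded, it can help to pack it into a single archive first. The sketch below fills in the path template given above and creates a tar archive if the folder exists. The function name `results_dir`, the run name `myrun`, and the archive name are hypothetical, and it is an assumption that RUNNAME corresponds to the value passed to --runname.

```shell
# Hypothetical helper, not part of the pipeline: build the results path from
# the template above.
results_dir() {
  # $1 = username, $2 = run name
  printf '/panfs/roc/scratch/%s-pipelines/qiime2-%s/output\n' "$1" "$2"
}

# Pack the output folder into one archive for easier download.
out=$(results_dir "${USER:-USERNAME}" myrun)
if [ -d "$out" ]; then
  tar -czf qiime2-myrun-output.tar.gz -C "$(dirname "$out")" output
fi
```

The archive is written to the current directory and unpacks to a folder named `output` containing the report.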
Support
Issues with the pipeline can be reported in the Bitbucket issue tracker. You may also contact John Garbe directly at jgarbe@umn.edu; response times may vary.