Gopher-pipelines Documentation
Contents:
- align-pipeline
  - FastQC - (Trimmomatic) - BWA or Bowtie2 - Sort and index bam - (Remove duplicates)
- rnaseq-pipeline
  - FastQC - (Trimmomatic) - Hisat2 or Tophat2 - Subread featureCounts - Cuffquant - Cuffnorm
- qiime2-pipeline
  - Adapter trim - Read stitching - Chimera detection - Host contamination - Open reference OTUs - Alpha rarefaction - Beta diversity
- shotgun-pipeline
  - OTU table - Taxonomic classification - Alpha rarefaction - Beta diversity
- createsamplesheet
Running a pipeline
Log in to MSI
- Open a terminal window (OSX) or PuTTY (Windows, www.putty.org)
- Open an SSH connection to MSI (replace USERNAME with your MSI username):
$ ssh USERNAME@login.msi.umn.edu
- Log on to the Mesabi supercomputer:
$ ssh mesabi
Input experimental metadata (optional)
Providing experimental metadata (information about each sample, such as treatment, group, age, gender, individualID, or collection date) to the pipeline will result in a more informative output.
Load the gopher-pipelines module:
$ module load umgc
$ module load gopher-pipelines
Generate a samplesheet:
$ createsamplesheet.pl -f /path/to/fastq/folder -o samplesheet.txt
Edit the tab-delimited samplesheet.txt with a text editor, adding additional columns containing metadata about each sample.
When running the pipeline, pass samplesheet.txt to it using the --samplesheet option:
--samplesheet samplesheet.txt
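As an illustration, an edited samplesheet might look like the tab-delimited sketch below. The sample and fastq columns stand in for whatever createsamplesheet.pl generates for your data; the treatment and age columns are hypothetical metadata added by hand:

```text
sample	fastqR1	fastqR2	treatment	age
sampleA	sampleA_R1.fastq	sampleA_R2.fastq	control	12
sampleB	sampleB_R1.fastq	sampleB_R2.fastq	drug	14
```

Keep one row per sample and use tabs (not spaces) between columns so the file remains machine-readable.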
Select a reference genome (optional)
Gopher-pipelines comes with a selection of reference genomes and annotations from Ensembl, which can be loaded with the "module load ensembl" command. Each species and genome build is available as a separate module. Run "ensembl" to get a list of available reference genomes. Species are named Genus_species, and most have a common-name alias. Each species has at least one genome build available. To use a reference genome with gopher-pipelines, simply load the appropriate module before running a pipeline:
$ module load gopher-pipelines
$ module load ensembl
$ module load human
$ align-pipeline --fastqfolder /path/to/fastq/folder
These three module load commands are equivalent:
$ module load human
$ module load Homo_sapiens
$ module load Homo_sapiens/GRCh38
If you would like to specify a reference of your own, gopher-pipelines supports these options (not all pipelines require all options):
--referencefasta /path/to/reference/genome/fasta
--bwaindex /path/to/bwa/index
--bowtie2index /path/to/bowtie2/index
--hisat2index /path/to/hisat2/index
--gtffile /path/to/gtf/annotation/file
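For example, a run against a custom reference might look like the following sketch. All paths are placeholders for your own files, and the choice of Bowtie2 here is only illustrative; a BWA-based run would use --bwaindex instead:

```text
$ align-pipeline \
    --fastqfolder /path/to/fastq/folder \
    --referencefasta /path/to/reference/genome/fasta \
    --bowtie2index /path/to/bowtie2/index
```

When a custom reference is supplied this way, do not load an ensembl species module for the same run, since the two would specify conflicting references.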
Launch an analysis job
.. note:: Interactive and submitted jobs may start running immediately, or if Mesabi is very busy a job may wait in line for several hours until resources are available to run the job.
Interactive ...........
Start an interactive job on Mesabi:
$ qsub -I -l walltime=8:00:00,nodes=1:ppn=24
Load necessary software modules:
$ module load umgc
$ module load human
$ module load gopher-pipelines
Run the script. You must specify the location of a folder containing fastq files using the "--fastqfolder" option. Specify how many samples to process at a time using the "--samplespernode" option (recommended value for Mesabi: 8). Each pipeline may have additional parameters to specify; refer to the pipeline-specific documentation for details. An example align-pipeline command is shown here:
$ align-pipeline --samplespernode 8 --fastqfolder /path/to/fastq/folder
Submit Job ..........
A PBS script can be submitted to a queue, where it will run when resources are available. Create a PBS file named gpipes.pbs containing the following text (adjust the ppn value to request all cores on a node; the pipeline doesn't work well on partial nodes):
#!/bin/bash -l
#PBS -l nodes=1:ppn=24,walltime=24:00:00
#PBS -m abe
cd $PBS_O_WORKDIR
module load umgc
module load gopher-pipelines
module load ensembl
module load human
align-pipeline --samplespernode 8 --fastqfolder /path/to/fastq/folder
Submit the PBS file to the job queue::
$ qsub gpipes.pbs
You can check the status of jobs by running qstat (replace USERNAME with your MSI username)::
$ qstat -a -u USERNAME
Job ID Username Queue Jobname SessID NDS TSK Memory Time S Time
----------------------- ----------- -------- ---------------- ------ ----- ------ ------ --------- - ---------
1023053.node1081.local jgarbe batch gpipes.pbs -- 1 24 50gb 12:00:00 Q --
The S column indicates whether a job is Running (R) or Queued (Q).
Review results
The results of the analysis are located in /panfs/roc/scratch/USERNAME-pipelines/align-RUNNAME/