This repository contains materials and instructions for learning the skills needed to implement an mRNA-Seq pipeline on a Linux cluster. It covers the following:

  1. Basic command line usage via terminal-quest
  2. Running FastQC on small mRNA-Seq datasets
  3. Writing bash scripts and submitting them via qsub
  4. Running trimmomatic on the included mRNA-Seq datasets and assessing sequence quality improvement
  5. Compiling a full QC report using multiqc
  6. Reimplementing all of the above analysis steps as a snakemake workflow
  7. Aligning the reads with STAR against a reduced human reference genome
  8. Assessing the quality of the alignments with RSeQC
  9. Performing differential expression (DE) analysis with detk
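
As a rough illustration of step 3, the sketch below writes a small job script that runs FastQC on one file. It assumes the cluster's qsub is Grid Engine style (the `#$` directive lines); PBS-style schedulers use different directives. The file name fastqc_job.sh and the job name fastqc_example are hypothetical placeholders, not files from this repository.

```shell
# Write a hypothetical job script; the file name and job name are placeholders.
cat > fastqc_job.sh <<'EOF'
#!/bin/bash
# Grid Engine directives (assumed scheduler): run in the submission
# directory, name the job, and merge stderr into stdout.
#$ -cwd
#$ -N fastqc_example
#$ -j y

# The FASTQ path is passed as the first argument at submission time.
fastqc "$1"
EOF

# Submission would look like this once qsub and fastqc are available:
# qsub fastqc_job.sh path/to/sample.fastq.gz
```

The block only creates the script; the qsub line is left commented out so that nothing is submitted until you have checked the directives against your own cluster's documentation.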

Prerequisites and Setup

This primer assumes you are comfortable connecting to a Linux cluster using ssh and that you have conda installed in your cluster environment. To get started, first fork this repository in Bitbucket by clicking the + on the left side of the screen and selecting "Fork this repository". This creates a copy of the repository in your own Bitbucket account, which you should then clone to your Linux cluster environment:

git clone<your username>/bubhub_mrnaseq_primer.git

Replace <your username> with your own Bitbucket username. Once your repo is cloned, create a conda environment named bubhub_mrnaseq_primer and activate it:

conda create -n bubhub_mrnaseq_primer python=3.5
source activate bubhub_mrnaseq_primer
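
If you want to confirm that activation worked before installing anything, a minimal check along these lines may help. It relies on the CONDA_DEFAULT_ENV variable that conda sets when an environment is activated; the helper name check_env is hypothetical and not part of this repository.

```shell
# Hypothetical helper: report whether the primer's conda environment is active.
check_env() {
    expected="bubhub_mrnaseq_primer"
    # conda sets CONDA_DEFAULT_ENV to the name of the active environment.
    if [ "${CONDA_DEFAULT_ENV:-}" = "$expected" ]; then
        echo "active: $expected"
    else
        echo "not active: run 'source activate $expected'" >&2
        return 1
    fi
}
```

Calling check_env after activation should print the environment name; a non-zero return status means the environment is not the active one.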

Once your environment is created, make sure your current working directory is the repo root and install the conda dependencies for this primer:


After the packages are installed, you can begin working through the materials under analysis/.