Provides standard pre-processing tools for the analysis of next-generation sequencing data.
These include:

  • sort_casava - Sort and rename fastq files post demultiplexing with CASAVA.
  • map_reads - Map reads.
  • index_genome - Index a genome.
  • create_exon_data - Create a set of exons for counting reads per gene.
  • add_anno_to_counts - Link gene counts to pre-specified sample annotation.
  • sum_counts_per_sample - Sum gene counts per sample across multiple sequencing runs.
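
To make the idea behind sum_counts_per_sample concrete, here is a minimal base R sketch of summing gene counts per sample across sequencing runs. It is not the package's implementation: the count matrix, the "_run1"/"_run2" column suffix, and the variable names are all assumptions made for illustration.

```r
# Hypothetical count matrix: rows are genes, columns are sequencing runs.
# matrix() fills column by column, so each group of three values is one run.
counts <- matrix(c(10, 0, 5,
                   12, 1, 7,
                    3, 4, 0,
                    2, 6, 1),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("heart_h1_run1", "heart_h1_run2",
                                   "heart_h2_run1", "heart_h2_run2")))

# Assumed naming scheme: strip the run suffix to recover the sample identifier.
sample_id <- sub("_run[0-9]+$", "", colnames(counts))

# rowsum() adds up rows that share a group label, so transpose first to
# group by column, then transpose back to a genes-by-samples matrix.
per_sample <- t(rowsum(t(counts), group = sample_id))

print(per_sample)
```

The result has one column per sample identifier, which is the form needed before linking counts to the sample annotation.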


Installation

Install from Bitbucket using the devtools package:

install_bitbucket("ngspipeline", "jdblischak")

Required annotation file

Create a tab-separated file that contains all the relevant experimental information.
The first column must be the unique identifier supplied to CASAVA.
The file should have a header, but the column names can be anything.
An example:

id  indiv   organ   species
heart_h1    1   heart   human
heart_h2    2   heart   human
heart_h3    3   heart   human
heart_c1    1   heart   chimpanzee
heart_c2    2   heart   chimpanzee
heart_c3    3   heart   chimpanzee
liver_h1    1   liver   human
liver_h2    2   liver   human
liver_h3    3   liver   human
liver_c1    1   liver   chimpanzee
liver_c2    2   liver   chimpanzee
liver_c3    3   liver   chimpanzee
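
Before running the pipeline, it can help to check that an annotation file meets the requirements above. The following base R sketch writes a few rows of the example to a temporary file, reads them back, and checks that the first column holds unique, non-empty identifiers. The temporary file and variable names are illustrative assumptions, and the package may perform its own checks.

```r
# Stand-in for your own annotation file: a few rows of the example above,
# written tab-separated to a temporary file.
anno_lines <- c("id\tindiv\torgan\tspecies",
                "heart_h1\t1\theart\thuman",
                "heart_c1\t1\theart\tchimpanzee",
                "liver_h1\t1\tliver\thuman",
                "liver_c1\t1\tliver\tchimpanzee")
anno_file <- tempfile(fileext = ".txt")
writeLines(anno_lines, anno_file)

# read.delim() expects a header line and tab separators by default.
anno <- read.delim(anno_file, stringsAsFactors = FALSE)

# The first column must uniquely identify each sample.
ids <- anno[[1]]
stopifnot(!any(is.na(ids)),
          all(nzchar(ids)),
          anyDuplicated(ids) == 0)

print(anno)
```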

Batch submission

To call the functions from the command line instead of an interactive R session, see the available executable scripts in ngspipeline/inst/R.


This package provides a structured interface to multiple R/Bioconductor packages.
It especially relies on Rsubread.
To see the other packages, view the package description: packageDescription("ngspipeline")

Counting reads per gene

I have not included a function for counting reads per gene because this is easily accomplished using the function featureCounts from the Rsubread package.
See ngspipeline/inst/R/count_reads.R for an example.
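
For orientation, a call to featureCounts might look like the sketch below. This is not the contents of count_reads.R. The wrapper name count_reads_sketch, the BAM file names, and the assumption that the exons are supplied as a SAF-format data frame (columns GeneID, Chr, Start, End, Strand) are all illustrative. Running it requires Rsubread and real alignment files.

```r
# Sketch only: hypothetical wrapper around Rsubread::featureCounts.
# Defining the function does not require Rsubread; calling it does.
count_reads_sketch <- function(bam_files, exons, nthreads = 1) {
  # `exons` is assumed to be a SAF-format data frame with the columns
  # GeneID, Chr, Start, End, Strand.
  fc <- Rsubread::featureCounts(files = bam_files,
                                annot.ext = exons,
                                useMetaFeatures = TRUE,  # summarize exons per gene
                                nthreads = nthreads)
  # featureCounts returns a list; the genes-by-files count matrix is in $counts.
  fc$counts
}

# Hypothetical usage (file names are placeholders):
# gene_counts <- count_reads_sketch(c("heart_h1.bam", "heart_h2.bam"), exons)
```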


Copyright (C) 2014 John Blischak

Licensed under GPL-3. See file LICENSE for details.