Provides standard pre-processing tools for the analysis of next-generation sequencing data.
sort_casava - Sort and rename fastq files after demultiplexing with CASAVA.
map_reads - Map reads.
index_genome - Index a genome.
create_exon_data - Create a set of exons for counting reads per gene.
add_anno_to_counts - Link gene counts to pre-specified sample annotation.
sum_counts_per_sample - Sum gene counts per sample across multiple sequencing runs.
library(devtools)
install_bitbucket("ngspipeline", "jdblischak")
Required annotation file
Create a tab-separated file which contains all the relevant experimental information.
The first column must be the unique identifier supplied to CASAVA.
The file should have a header, but the column names can be anything.
id        indiv  organ  species
heart_h1  1      heart  human
heart_h2  2      heart  human
heart_h3  3      heart  human
heart_c1  1      heart  chimpanzee
heart_c2  2      heart  chimpanzee
heart_c3  3      heart  chimpanzee
liver_h1  1      liver  human
liver_h2  2      liver  human
liver_h3  3      liver  human
liver_c1  1      liver  chimpanzee
liver_c2  2      liver  chimpanzee
liver_c3  3      liver  chimpanzee
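As a quick sanity check before running the pipeline, the annotation file can be read into R and its first column tested for uniqueness. The sketch below uses only base R and is not part of ngspipeline; the temporary file and the shortened four-row example are assumptions made for illustration.

```r
# Write a small example annotation file (tab-separated, with a header).
# The temporary path is a stand-in for your own annotation file.
anno_file <- tempfile(fileext = ".txt")
anno_example <- data.frame(
  id = c("heart_h1", "heart_c1", "liver_h1", "liver_c1"),
  indiv = c(1, 1, 1, 1),
  organ = c("heart", "heart", "liver", "liver"),
  species = c("human", "chimpanzee", "human", "chimpanzee"),
  stringsAsFactors = FALSE
)
write.table(anno_example, file = anno_file, sep = "\t",
            quote = FALSE, row.names = FALSE)

# Read the file back; read.delim() expects a header and tab separators.
anno <- read.delim(anno_file, stringsAsFactors = FALSE)

# The first column must hold the unique identifiers supplied to CASAVA,
# so it should contain no missing or duplicated values.
ids <- anno[[1]]
stopifnot(!anyNA(ids), !any(duplicated(ids)))
```

The column names are not checked because, as noted above, they can be anything; only the position and uniqueness of the identifier column matter.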
To call the function from the command line instead of an interactive R session, see the available executable scripts in
This package provides a structured interface to multiple R/Bioconductor packages.
It especially relies on Rsubread.
To see the other packages, view the package description:
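One possible way to do this from within R is utils::packageDescription, which reads the DESCRIPTION file of an installed package. The helper name list_dependencies below is hypothetical and not part of ngspipeline; this is a minimal sketch that assumes the package of interest is already installed.

```r
# Return the Depends, Imports, and Suggests fields of an installed package.
# list_dependencies is a hypothetical helper, not an ngspipeline function.
list_dependencies <- function(pkg) {
  # system.file() returns "" when the package is not installed.
  if (!nzchar(system.file(package = pkg))) {
    stop("Package not installed: ", pkg)
  }
  fields <- c("Depends", "Imports", "Suggests")
  desc <- utils::packageDescription(pkg, fields = fields)
  # Fields missing from DESCRIPTION are returned as NA; drop them.
  deps <- vapply(fields, function(f) as.character(desc[[f]]), character(1))
  deps[!is.na(deps)]
}

# Example with a package that ships with R; substitute "ngspipeline"
# once it is installed.
list_dependencies("stats")
```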
Counting reads per gene
I have not included a function for counting reads per gene because this is easily accomplished using the function
featureCounts from the Rsubread package.
See ngspipeline/inst/R/count_reads.R for an example.
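For orientation, a call to featureCounts might look roughly like the sketch below. It is not a copy of the referenced script. The exon table, gene names, and BAM file paths are invented for illustration, and whether create_exon_data produces this exact layout is an assumption. The annotation uses the SAF layout that featureCounts accepts as a data frame (columns GeneID, Chr, Start, End, Strand).

```r
# Exon annotation in SAF format: one row per exon, grouped by GeneID.
# These genes and coordinates are made up for illustration.
exons_saf <- data.frame(
  GeneID = c("geneA", "geneA", "geneB"),
  Chr = c("chr1", "chr1", "chr2"),
  Start = c(100, 500, 1000),
  End = c(200, 600, 1500),
  Strand = c("+", "+", "-"),
  stringsAsFactors = FALSE
)

# Hypothetical BAM files, one per sample.
bam_files <- c("heart_h1.bam", "heart_c1.bam")

# Only run the counting step if Rsubread is installed and the files exist.
if (requireNamespace("Rsubread", quietly = TRUE) &&
    all(file.exists(bam_files))) {
  fc <- Rsubread::featureCounts(
    files = bam_files,
    annot.ext = exons_saf,   # data frame in SAF format
    useMetaFeatures = TRUE   # summarize exon counts to the gene level
  )
  # fc$counts is a matrix with one row per gene and one column per file.
  head(fc$counts)
}
```

With useMetaFeatures = TRUE, reads overlapping any exon of a gene are summed into a single count for that gene, which is the per-gene count the downstream functions work with.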
Copyright (C) 2014 John Blischak
Licensed under GPL-3. See file LICENSE for details.