This repository houses data analysis code
and processed data files from several RNA-Seq experiments
investigating gene expression profiles in southern highbush blueberry.
An article describing this work now appears in the journal GigaScience:
- RNA-Seq analysis and annotation of a draft blueberry genome assembly identifies candidate genes involved in fruit ripening, biosynthesis of bioactive compounds, and stage-specific alternative splicing. GigaScience 2015, 4:5. Published 13 February 2015.
Visualization of read alignments
Alignments of reads, and junction features deduced from spliced read alignments,
can be viewed using Integrated Genome Browser (IGB), which is freely available from http://www.bioviz.org.
To view the data in IGB:
- Get a copy of IGB (http://www.bioviz.org)
- Click the blueberry icon on the Start screen (next to Arabidopsis)
- RNA-Seq and related data sets are available in the Data Access tab
Blueberry gene models
Blueberry gene models come from genome-guided assembly of blueberry Illumina
RNA-Seq data and ab initio gene finder analysis of the blueberry genome sequence.
About folders in this repository
Folders in this repository represent data analysis "modules"
that are mostly independent but often use files and results from other modules.
The GeneModelAnalysis module contains code for analyzing gene structures and annotated
alternative splicing events. The output of gene finding (see above) is the
BED file V_corymbosum_scaffold_May_2013.bed in GeneModelAnalysis/data.
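BED12 coordinates are easy to misread, so the following minimal Python sketch shows how one gene model line, such as those in V_corymbosum_scaffold_May_2013.bed, can be expanded into exon and intron intervals, and how comparing the intron sets of two isoforms flags candidate alternative splicing differences. It assumes the standard BED12 column layout (the repository file may carry extra columns after field 12); the names GeneModel, parse_bed12_line, introns and differing_introns are hypothetical and are not taken from the repository code.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class GeneModel:
    chrom: str
    start: int                    # chromStart, 0-based (BED convention)
    end: int                      # chromEnd, exclusive
    name: str
    strand: str
    exons: List[Tuple[int, int]]  # genomic (start, end) pairs, same convention


def parse_bed12_line(line: str) -> GeneModel:
    """Parse one BED12 (or wider) line into a GeneModel with exon coordinates."""
    fields = line.rstrip("\n").split("\t")
    chrom, start, end, name = fields[0], int(fields[1]), int(fields[2]), fields[3]
    strand = fields[5]
    # Columns 11 and 12 (1-based) hold comma-separated block sizes and block
    # starts; block starts are relative to chromStart. Trailing commas are allowed.
    sizes = [int(x) for x in fields[10].split(",") if x]
    rel_starts = [int(x) for x in fields[11].split(",") if x]
    exons = [(start + s, start + s + n) for s, n in zip(rel_starts, sizes)]
    return GeneModel(chrom, start, end, name, strand, exons)


def introns(model: GeneModel) -> List[Tuple[int, int]]:
    """Introns are the gaps between consecutive exons."""
    return [(a_end, b_start)
            for (_, a_end), (b_start, _) in zip(model.exons, model.exons[1:])]


def differing_introns(a: GeneModel, b: GeneModel) -> List[Tuple[int, int]]:
    """Introns present in one isoform but not the other (hypothetical helper)."""
    return sorted(set(introns(a)) ^ set(introns(b)))
```

For example, an isoform with exons at 1000-1200, 1400-1500 and 1700-2000 has introns (1200, 1400) and (1500, 1700); an isoform of the same gene that skips the middle exon has the single intron (1200, 1700), and all three introns are reported by differing_introns.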
The CountsData module processes output from running samtools view -c on the blueberry
gene models BED file. It produces a counts file containing the number of
single-mapping reads for each annotated blueberry gene. The counts data
are in CountsData/results/berry_dev.tsv.gz. This file is designed to be
read into R/Bioconductor for differential expression analysis.
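The step of turning per-gene read counts into a single table can be sketched in Python as below. This is an illustration under stated assumptions, not the module's code: it assumes the counts from samtools view -c have already been collected as (sample, gene, count) triples, and the function name build_counts_table and the "gene" header are hypothetical. The gene-by-sample layout (genes as rows, samples as columns) is the shape edgeR expects for a counts matrix, but the actual column layout of berry_dev.tsv.gz may differ.

```python
import csv
import io
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def build_counts_table(records: Iterable[Tuple[str, str, int]]) -> str:
    """Pivot (sample, gene, count) records into a gene-by-sample TSV string.

    Hypothetical helper; genes with no record for a sample get a count of 0.
    """
    counts: Dict[str, Dict[str, int]] = defaultdict(dict)
    samples = set()
    for sample, gene, n in records:
        counts[gene][sample] = n
        samples.add(sample)
    ordered_samples = sorted(samples)
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene"] + ordered_samples)  # header row: gene ID, then samples
    for gene in sorted(counts):
        writer.writerow([gene] + [counts[gene].get(s, 0) for s in ordered_samples])
    return out.getvalue()
```

To produce a compressed file like the one in CountsData/results, the returned text could be written with gzip.open(path, "wt").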
The DifferentialExpression module uses counts data and the edgeR Bioconductor package to identify
differentially expressed genes. DE genes from each comparison are saved in
This module also computes scaled and unscaled RPKM expression values.
These are used for clustering and for comparing expression of metabolic
pathway genes and other gene families of interest.
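RPKM (reads per kilobase of gene model per million mapped reads) normalizes a gene's read count by both gene length and library size: RPKM = count × 10^9 / (gene length in bp × total mapped reads). The minimal Python sketch below computes it, then applies one common meaning of "scaled" (centering each gene on its mean and dividing by its standard deviation across samples, like R's scale() applied per gene). That interpretation of "scaled" is an assumption, and the function names rpkm and scale_rows are hypothetical.

```python
import statistics
from typing import Dict, List


def rpkm(counts: Dict[str, List[int]], lengths: Dict[str, int],
         lib_sizes: List[int]) -> Dict[str, List[float]]:
    """Reads per kilobase of gene model per million mapped reads.

    counts: gene -> per-sample read counts; lengths: gene -> length in bp;
    lib_sizes: total mapped reads per sample, in the same sample order.
    """
    return {
        gene: [c * 1e9 / (lengths[gene] * n) for c, n in zip(row, lib_sizes)]
        for gene, row in counts.items()
    }


def scale_rows(values: Dict[str, List[float]]) -> Dict[str, List[float]]:
    """Assumed meaning of "scaled": per-gene z-scores across samples.

    Uses the sample standard deviation; genes with zero variance become all zeros.
    """
    scaled = {}
    for gene, row in values.items():
        mean = statistics.mean(row)
        sd = statistics.stdev(row) if len(row) > 1 else 0.0
        scaled[gene] = [(v - mean) / sd if sd > 0 else 0.0 for v in row]
    return scaled
```

For a 2,000 bp gene with 100 reads in a library of 1,000,000 mapped reads, RPKM is 100 × 10^9 / (2,000 × 1,000,000) = 50. Scaling puts genes with very different absolute expression on a common footing, which is why scaled values suit clustering while unscaled values suit comparing expression levels directly.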
This module contains data files downloaded from other Web sites or generated in upstream bioinformatics data processing steps.
This module describes using GOSeq to identify Gene Ontology categories that contain unusually many differentially expressed genes.
It depends on the DifferentialExpression module.
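To make "unusually many" concrete, the Python sketch below computes the probability of seeing at least k differentially expressed genes in a category of size m by chance, using a plain hypergeometric (one-sided Fisher) test. This is a deliberately simpler, swapped-in technique, not what GOSeq does: GOSeq additionally corrects for the bias that longer genes are more likely to be called differentially expressed. The function name hypergeom_upper_tail is hypothetical.

```python
from math import comb


def hypergeom_upper_tail(total_genes: int, de_genes: int,
                         category_size: int, de_in_category: int) -> float:
    """P(X >= de_in_category), where X counts category members among de_genes
    genes drawn without replacement from total_genes genes, category_size of
    which belong to the GO category. No gene-length bias correction."""
    denom = comb(total_genes, de_genes)
    upper = min(de_genes, category_size)
    hits = sum(
        comb(category_size, k) * comb(total_genes - category_size, de_genes - k)
        for k in range(de_in_category, upper + 1)
    )
    return hits / denom
```

A small p-value means the category holds more differentially expressed genes than random sampling would plausibly give; in practice such p-values are adjusted for the many categories tested.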
This module contains code used to assess the May 2013 genome assembly.
This module contains files created by Vikas Gupta listing SRA accessions
for blueberry 454 sequence data.
This module contains code for processing results from blastx searches of
blueberry virtual cDNAs against protein databases from NCBI, including
the nr protein database and several plant RefSeq databases. This module
also adds new annotations to the blueberry gene models by writing new text
into field 14 of the gene model BED file. The output is saved in
BlastxAnalysis/results as V_corymbosum_scaffold_May_2013_withDescr.bed.gz.
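The annotation step can be sketched as: keep the best blastx hit for each virtual cDNA, look up a description for that hit, and write it into field 14 of the matching BED line. The Python below is a minimal sketch under several assumptions: blastx output in the default tabular format (-outfmt 6, where column 1 is the query ID, column 2 the subject ID and column 12 the bit score), query IDs that match the BED name field (field 4), and a hypothetical descriptions dictionary mapping subject IDs to description text (for example, parsed from the protein database's FASTA headers). The function names best_hits and add_descriptions are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple


def best_hits(blast_lines: Iterable[str]) -> Dict[str, str]:
    """Return the highest-bit-score subject ID for each query in -outfmt 6 output."""
    best: Dict[str, Tuple[str, float]] = {}
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        f = line.rstrip("\n").split("\t")
        query, subject, bits = f[0], f[1], float(f[11])
        if query not in best or bits > best[query][1]:
            best[query] = (subject, bits)
    return {q: s for q, (s, _) in best.items()}


def add_descriptions(bed_lines: Iterable[str], hits: Dict[str, str],
                     descriptions: Dict[str, str]) -> List[str]:
    """Write a description into field 14 (index 13) of each BED line whose
    name (field 4) has a hit with a known description. Shorter lines are
    padded with empty fields so that field 14 exists."""
    out = []
    for line in bed_lines:
        f = line.rstrip("\n").split("\t")
        subject = hits.get(f[3])
        if subject is not None and subject in descriptions:
            f.extend([""] * (14 - len(f)))  # no-op if the line already has 14+ fields
            f[13] = descriptions[subject]
        out.append("\t".join(f))
    return out
```

Ranking hits by bit score rather than e-value is a design choice in this sketch; the two usually agree for a single query, and bit score avoids ties among very small e-values.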
Copyright (c) 2015 University of North Carolina at Charlotte
Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.
Also, see: http://opensource.org/licenses/MIT