This repository houses data analysis source code and processed data files from several RNA-Seq experiments investigating gene expression profiles in southern highbush blueberry.
An article describing this work now appears in the journal GigaScience:
- RNA-Seq analysis and annotation of a draft blueberry genome assembly identifies candidate genes involved in fruit ripening, biosynthesis of bioactive compounds, and stage-specific alternative splicing. GigaScience 2015, 4:5. Published 13 February 2015.
Visualization of read alignments
Read alignments, and junction features deduced from spliced read alignments, can be viewed in Integrated Genome Browser (IGB), which is freely available from BioViz.org.
To view the data in IGB:
- Get a copy of IGB (http://www.bioviz.org)
- Click the blueberry icon on the Start screen (next to Arabidopsis)
- Open the Data Access tab, where RNA-Seq and related data sets are available
Blueberry gene models
Blueberry gene models come from genome-guided assembly of blueberry Illumina RNA-Seq data and from ab initio gene finding on the blueberry genome sequence.
About folders in this repository
Folders contained in this repository represent data analysis "modules" that are mostly independent but often use files and results from other modules.
This module (GeneModelAnalysis) contains code for analysis of gene structures and annotated alternative splicing events. The output of gene finding (see "Blueberry gene models" above) is the BED file V_corymbosum_scaffold_May_2013.bed in GeneModelAnalysis/data.
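As a rough illustration of what gene structure analysis on a BED file involves, the sketch below derives exon and intron coordinates from one gene model line. It assumes the file follows the standard 12-field BED block layout (blockCount, blockSizes, blockStarts in fields 10 to 12); the scaffold and gene names in the example are hypothetical, and this is not the code used in the module.

```python
from typing import List, Tuple

Interval = Tuple[int, int]


def exon_intervals(bed_line: str) -> List[Interval]:
    """Return genomic (start, end) intervals for each block (exon) of a BED12+ line."""
    fields = bed_line.rstrip("\n").split("\t")
    chrom_start = int(fields[1])
    block_count = int(fields[9])
    # blockSizes and blockStarts are comma-separated and may end with a comma.
    sizes = [int(s) for s in fields[10].split(",") if s]
    starts = [int(s) for s in fields[11].split(",") if s]
    if not (len(sizes) == len(starts) == block_count):
        raise ValueError("blockCount does not match blockSizes/blockStarts")
    # blockStarts are offsets relative to chromStart.
    return [(chrom_start + off, chrom_start + off + size)
            for off, size in zip(starts, sizes)]


def intron_intervals(exons: List[Interval]) -> List[Interval]:
    """Return the gaps between consecutive exons."""
    return [(exons[i][1], exons[i + 1][0]) for i in range(len(exons) - 1)]


# Hypothetical gene model line, for illustration only.
example = "\t".join([
    "scaffold00001", "1000", "2000", "gene1.1", "0", "+",
    "1100", "1900", "0", "3", "200,300,100,", "0,400,900,",
])
exons = exon_intervals(example)
introns = intron_intervals(exons)
```

Extra fields beyond the twelfth are ignored here, so the same function would also accept lines that carry additional annotation columns.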
This module (CountsData) processes output from running samtools view -c on the blueberry gene models BED file. It produces a counts file containing the number of single-mapping reads for each annotated blueberry gene. The counts data are in CountsData/results/berry_dev.tsv.gz. This file is designed to be read into R/Bioconductor for differential expression analysis.
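The counts file is meant for R/Bioconductor, but it can also be inspected from other languages. The sketch below reads a gzipped, tab-delimited counts table in Python. The layout is an assumption: one header row, a gene identifier in the first column, and one integer count column per sample. The gene and sample names in the example are hypothetical.

```python
import csv
import gzip
import io
from typing import Dict, List, TextIO, Tuple


def read_counts(handle: TextIO) -> Tuple[List[str], Dict[str, List[int]]]:
    """Parse a tab-delimited counts table into (sample names, gene -> counts)."""
    reader = csv.reader(handle, delimiter="\t")
    header = next(reader)
    samples = header[1:]  # assumed: first column holds gene identifiers
    counts: Dict[str, List[int]] = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        counts[row[0]] = [int(value) for value in row[1:]]
    return samples, counts


def read_counts_gz(path: str) -> Tuple[List[str], Dict[str, List[int]]]:
    """Open a gzipped counts file in text mode and parse it."""
    with gzip.open(path, "rt", newline="") as handle:
        return read_counts(handle)


# Hypothetical table contents, for illustration only.
demo = io.StringIO("gene\tsampleA\tsampleB\ngene1\t10\t25\ngene2\t0\t7\n")
samples, counts = read_counts(demo)
```

To read the real file, one would call read_counts_gz("CountsData/results/berry_dev.tsv.gz") after confirming that its columns match the assumed layout.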
This module uses counts data and the edgeR Bioconductor library to identify differentially expressed (DE) genes. DE genes from each comparison are saved in DifferentialExpression/results.
This module also creates gene expression data as scaled and unscaled RPKM values. These are used for clustering and comparing expression of metabolic pathway genes and other gene families of interest.
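For readers unfamiliar with RPKM (reads per kilobase of transcript per million mapped reads), the sketch below shows the standard calculation. What "scaled" means in this repository is not stated here; dividing each gene's values by that gene's maximum across samples is only one plausible choice and is labeled as an assumption in the code.

```python
from typing import List


def rpkm(count: int, gene_length_bp: int, total_mapped_reads: int) -> float:
    """Reads per kilobase per million mapped reads."""
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return count * 1e9 / (gene_length_bp * total_mapped_reads)


def scale_by_max(values: List[float]) -> List[float]:
    """ASSUMPTION: one possible scaling, each value divided by the row maximum."""
    peak = max(values)
    return [v / peak if peak > 0 else 0.0 for v in values]


# 500 reads on a 2,000 bp gene in a library of 10 million mapped reads.
example_rpkm = rpkm(500, 2000, 10_000_000)
example_scaled = scale_by_max([5.0, 10.0, 2.5])
```

Scaling of this kind puts genes with very different expression levels on a common range, which helps when clustering expression profiles across samples.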
Contains data files downloaded from other Web sites or generated in upstream bioinformatics data processing steps.
This module describes using GOSeq to identify Gene Ontology categories with unusually many differentially expressed genes. It depends on the DifferentialExpression module.
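To make "unusually many" concrete, the sketch below computes a plain hypergeometric over-representation p-value for a single category. This is a simplified stand-in and not what GOSeq does: GOSeq additionally corrects for gene length bias in RNA-Seq data. All parameter names are illustrative.

```python
from math import comb


def hypergeom_enrichment_p(n_genes: int, n_in_category: int,
                           n_de: int, n_de_in_category: int) -> float:
    """P(at least n_de_in_category DE genes fall in the category by chance).

    n_genes: all genes tested; n_in_category: genes annotated to the category;
    n_de: differentially expressed genes; n_de_in_category: their overlap.
    """
    upper = min(n_in_category, n_de)
    total = comb(n_genes, n_de)
    # Sum the upper tail of the hypergeometric distribution.
    tail = sum(
        comb(n_in_category, k) * comb(n_genes - n_in_category, n_de - k)
        for k in range(n_de_in_category, upper + 1)
    )
    return tail / total


# 10 genes, 5 in the category, 5 DE genes, all 5 DE genes in the category.
p_value = hypergeom_enrichment_p(10, 5, 5, 5)
```

A small p-value suggests the category holds more DE genes than random sampling would explain; in practice p-values across many categories also need multiple-testing correction.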
This module contains code used to assess the May 2013 genome assembly.
This module contains files created by Vikas Gupta listing SRA accessions for blueberry 454 sequence data.
This module (BlastxAnalysis) contains code for processing results from blastx searches of blueberry virtual cDNAs against protein databases from GenBank, including the nr protein database and several plant RefSeq databases. It also adds new annotations to the blueberry gene models by creating new text for field 14 in the gene model BED file. The output is saved in BlastxAnalysis/results as V_corymbosum_scaffold_May_2013_withDescr.bed.gz.
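The annotation step can be pictured as rewriting one column of each BED line. The sketch below sets field 14 of a tab-delimited gene model line to a description string. It is a minimal illustration, not the module's code: the "NA" padding for missing fields, the example line, and the description text are all assumptions.

```python
def set_description(bed_line: str, description: str) -> str:
    """Return the BED line with field 14 (index 13) set to the description."""
    fields = bed_line.rstrip("\n").split("\t")
    if len(fields) < 12:
        raise ValueError("expected at least 12 BED fields")
    # ASSUMPTION: pad any missing fields 13 and 14 with "NA".
    fields += ["NA"] * (14 - len(fields))
    # Tabs inside the description would break the column layout.
    fields[13] = description.replace("\t", " ")
    return "\t".join(fields)


# Hypothetical 13-field gene model line, for illustration only.
line = "\t".join([
    "scaffold00001", "1000", "2000", "gene1.1", "0", "+",
    "1100", "1900", "0", "3", "200,300,100,", "0,400,900,", "gene1",
])
annotated = set_description(line, "putative chalcone synthase")
```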
Copyright (c) 2015 University of North Carolina at Charlotte
Permission is hereby granted, free of charge, to any person obtaining a copy of this software and associated documentation files (the "Software"), to deal in the Software without restriction, including without limitation the rights to use, copy, modify, merge, publish, distribute, sublicense, and/or sell copies of the Software, and to permit persons to whom the Software is furnished to do so, subject to the following conditions:
The above copyright notice and this permission notice shall be included in all copies or substantial portions of the Software.
THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE SOFTWARE.
Also, see: http://opensource.org/licenses/MIT