
<!-- README.md is generated from README.Rmd. Please edit that file -->

ctlref <img src="man/figures/ctlverse-sticker-01.png" align="right" width="200"/>

The goal of ctlref is to facilitate creating a simpler, tidier version of essential genome annotation references straight from reliable sources, primarily Ensembl and Bioconductor. This package distills annotations to the bare essentials for maximal portability, accessibility, and translation of results.

Detailed instructions are included in the package documentation, accessible via ?ctlref. In brief, this package parses GTF files, sequence reports (chromosome sizes), and Bioconductor transcript databases to create tidy tables of coordinates and mapped annotations from source references (primarily Ensembl). The primary outputs created by this package, in the recommended order, are:

  • chromosome sizes - from a sequence assembly report, creates a chromosome size table
  • transcripts - creates a table of transcript coordinates (chrom, start, end) along with mapped identifiers (Ensembl transcript- and gene-level IDs, Entrez IDs, and gene symbols)
  • genes - from the transcripts table, creates a gene-level combined reference
  • tss - from the transcripts table, creates a table centered on transcription start site (TSS) coordinates
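
To make the relationship between these tables concrete, here is a minimal base R sketch of how a gene-level table and a TSS table could be derived from a transcripts table. It is an illustration only: the toy data, the column names (strand, transcript_id, gene_id), and the collapsing rules are assumptions made for this example and may differ from what ctlref actually produces.

```r
# Toy transcripts table. Column names are illustrative assumptions,
# not necessarily those produced by ctlref.
transcripts <- data.frame(
  chrom         = c("1", "1", "2"),
  start         = c(100L, 150L, 500L),
  end           = c(400L, 600L, 900L),
  strand        = c("+", "+", "-"),
  transcript_id = c("T1", "T2", "T3"),
  gene_id       = c("G1", "G1", "G2"),
  stringsAsFactors = FALSE
)

# Gene-level table: collapse each gene to the widest span of its transcripts.
gene_start <- aggregate(start ~ chrom + strand + gene_id,
                        data = transcripts, FUN = min)
gene_end   <- aggregate(end ~ chrom + strand + gene_id,
                        data = transcripts, FUN = max)
genes <- merge(gene_start, gene_end, by = c("chrom", "strand", "gene_id"))

# TSS table: the TSS is the start coordinate on the + strand
# and the end coordinate on the - strand.
tss_pos <- ifelse(transcripts$strand == "+", transcripts$start, transcripts$end)
tss <- data.frame(
  chrom         = transcripts$chrom,
  start         = tss_pos,
  end           = tss_pos,
  strand        = transcripts$strand,
  transcript_id = transcripts$transcript_id,
  gene_id       = transcripts$gene_id,
  stringsAsFactors = FALSE
)
```

Because both derived tables depend only on the transcripts table, building transcripts first and then genes and tss matches the recommended order above.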

Included Reference Annotations

The package includes the two reference annotations most commonly used in the analysis of biological data from CD8+ T cells: human and mouse. Once the package is loaded, these are accessible as the data objects hs.annotation and hs.chromsizes (human) and mm.annotation and mm.chromsizes (mouse), which hold the reference annotation and chromosome sizes for each organism. The R functions included in the package were used to generate these references from Ensembl source references.
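
As a quick way to explore the bundled references, the sketch below lists the package's datasets and inspects one of them. It assumes ctlref is installed and that the data objects are available after loading the package, as described above. The guard makes the block a no-op when the package is missing.

```r
# Check whether ctlref is installed before trying to use it.
has_ctlref <- requireNamespace("ctlref", quietly = TRUE)

if (has_ctlref) {
  library(ctlref)

  # List the datasets bundled with the package (name and title).
  print(data(package = "ctlref")$results[, c("Item", "Title")])

  # Inspect the structure of the human chromosome sizes table.
  str(hs.chromsizes)
}
```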

Creating New Reference Annotations

The package includes functions to create new references, namely parse_chromsizes and parse_gtf. See the function documentation for more information.
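
For readers unfamiliar with the GTF format, the following base R sketch shows the general idea of turning GTF records into a tidy transcript table. It is not the implementation or signature of parse_gtf. The example lines are made up, and the helper get_attr and the output column names are hypothetical names introduced for illustration. The only thing taken as given is the standard GTF layout of nine tab-separated fields with key "value"; pairs in the last field.

```r
# Two made-up GTF records (nine tab-separated fields each).
gtf_lines <- c(
  paste("1", "ensembl", "transcript", "100", "400", ".", "+", ".",
        'gene_id "G1"; transcript_id "T1"; gene_name "ABC";', sep = "\t"),
  paste("1", "ensembl", "exon", "100", "200", ".", "+", ".",
        'gene_id "G1"; transcript_id "T1"; exon_number "1";', sep = "\t")
)

# Read the records. quote = "" keeps the double quotes inside the
# attribute field from being treated as string delimiters.
gtf <- read.delim(
  text = gtf_lines, header = FALSE, comment.char = "#", quote = "",
  stringsAsFactors = FALSE,
  col.names = c("chrom", "source", "feature", "start", "end",
                "score", "strand", "frame", "attribute"),
  colClasses = c("character", "character", "character", "integer", "integer",
                 "character", "character", "character", "character")
)

# Hypothetical helper: pull the value of one key out of the attribute field,
# returning NA when the key is absent.
get_attr <- function(attribute, key) {
  pattern <- paste0('^(.*; )?', key, ' "([^"]*)".*$')
  ifelse(grepl(pattern, attribute),
         sub(pattern, "\\2", attribute),
         NA_character_)
}

# Keep transcript-level records and flatten the identifiers into columns.
tx_rows <- gtf[gtf$feature == "transcript", ]
tx <- data.frame(
  chrom         = tx_rows$chrom,
  start         = tx_rows$start,
  end           = tx_rows$end,
  strand        = tx_rows$strand,
  transcript_id = get_attr(tx_rows$attribute, "transcript_id"),
  gene_id       = get_attr(tx_rows$attribute, "gene_id"),
  symbol        = get_attr(tx_rows$attribute, "gene_name"),
  stringsAsFactors = FALSE
)
```

For real use, prefer parse_gtf and its documentation; the sketch only conveys the shape of the transformation.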

Installation

You can install the development version of ctlref from Bitbucket with:

devtools::install_bitbucket("robert_amezquita/ctlref")

Creating new references requires files downloaded from the Ensembl website, as well as organism transcript annotations from Bioconductor. See the ?ctlref package documentation for more details and instructions.

Source Code

All source code is viewable on Bitbucket at https://bitbucket.com/robert_amezquita/ctlcon. Please feel free to submit issues to provide feedback, make requests, or start a conversation about development, and to follow up by contributing pull requests to the repo.

Contact

For questions and feedback, please email me at robert.amezquita@fredhutch.org or submit issues to the Bitbucket repo.