<!-- README.md is generated from README.Rmd. Please edit that file -->
ctlcon <img src="man/figures/ctlverse-sticker-01.png" align="right" width="200"/>
The goal of ctlcon is to organize high-throughput sequencing data derived from CD8+ T cells (aka Cytotoxic T Lymphocytes) into a consensus R package for data portability, accessibility, and reproducibility. This is the first package of the ctlverse, which is envisioned to become a resource tailored to the analysis of CD8+ T cells. The ctlverse strives to use modern R idioms, and is particularly inspired by the tidyverse in its development style.
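You can install the development version of ctlcon from Bitbucket, for example via `remotes::install_bitbucket("robert_amezquita/ctlcon")` (repository path taken from the Bitbucket link at the end of this README).
Additionally, I recommend installing the tidyverse family of packages
(`install.packages('tidyverse')`) prior to loading the package.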
Currently, the package focuses on CTL data derived from mice in the context of acute viral infection (generally LCMV), and includes data from three laboratories: Pereira, Kaech, and Goldrath. Data are generally derived from naive (N), bulk effector (E), and bulk memory (M) cells, as well as the effector cell subsets memory precursors (MP) and terminal effectors (TE), as defined by IL7R and KLRG1 expression.
Future development may involve adding additional datasets or expanding into human CTL immunology. To facilitate adding datasets, I believe it is imperative to process all data through the same pipeline (see seqsnake), and I would be happy to collaborate in expanding the datasets included in this package.
Preprocessing procedures are outside the scope of this package, but generally include quality control, alignment, and peak calling. For more information, see seqsnake and/or contact me. Raw data are not included in the package due to file size limitations; please contact me to coordinate data sharing. Note, however, that all data are accessible via the Gene Expression Omnibus and/or Sequence Read Archive.
All final user-facing outputs are stored in the
data/ folder, and once
the package is loaded, the objects therein are accessible by
name. Available datasets are most easily browsed via the online reference
at ctlcon.netlify.com or in the package manual.
To learn more about how individual objects were crafted, see the
data-raw folder for relevant R scripts.
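Data can be accessed via the various data objects stored in the package.
Each object has its own documentation, which can be accessed via the help
operator `?` (for example, `?transcripts_mm10`). Generally, the data can be organized into the following categories: genome annotations, sample annotations, RNA-seq expression data, and ATAC-seq/ChIP-seq region data.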
A consistent genome annotation is necessary for integrating data
from different -omics types. Ensembl-based annotations are the primary
reference used throughout the package, with associated Entrez IDs and
gene symbols (for human readability) as secondary annotations. See the
gene-level annotation object and transcripts_mm10 for gene and transcript
annotations, respectively.
transcripts_mm10 is the primary resource from which all other genome
annotations are derived (including those below), and is based on
Ensembl release 91 (assembly GRCm38.p5).
tss_mm10 is intended for annotating genomic region data (for
example, from ChIP-seq) to the nearest genes. Some care should be taken
when using it, as TSSs are defined per transcript, so a single gene can
contribute several TSSs. It may be desirable to collapse the object into
gene-level annotations.
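To make the nearest-gene idea concrete, here is a minimal base R sketch that assigns each region to the closest TSS on the same chromosome, measured from the region midpoint. The column names (chr, start, end, tss, gene_symbol) and the toy coordinates are hypothetical and may not match the actual layout of tss_mm10.

```r
# Sketch: annotate regions to the nearest TSS on the same chromosome.
# Column names (chr, start, end, tss, gene_symbol) are hypothetical; adjust
# them to the actual columns of tss_mm10 before use.
annotate_nearest_tss <- function(regions, tss) {
  mid <- (regions$start + regions$end) %/% 2      # region midpoint
  n <- nrow(regions)
  nearest_gene <- rep(NA_character_, n)
  distance <- rep(NA_real_, n)
  for (i in seq_len(n)) {
    cand <- tss[tss$chr == regions$chr[i], , drop = FALSE]
    if (nrow(cand) == 0) next                      # no TSS on this chromosome
    d <- abs(cand$tss - mid[i])
    j <- which.min(d)                              # first minimum wins ties
    nearest_gene[i] <- cand$gene_symbol[j]
    distance[i] <- d[j]
  }
  regions$nearest_gene <- nearest_gene
  regions$distance_to_tss <- distance
  regions
}

# Toy example with hypothetical coordinates and gene names
tss <- data.frame(chr = c("chr1", "chr1", "chr2"),
                  tss = c(1000, 5000, 300),
                  gene_symbol = c("GeneA", "GeneB", "GeneC"),
                  stringsAsFactors = FALSE)
regions <- data.frame(chr = c("chr1", "chr1", "chr2", "chr3"),
                      start = c(900, 4000, 100, 10),
                      end = c(1100, 4400, 200, 20),
                      stringsAsFactors = FALSE)
annotated <- annotate_nearest_tss(regions, tss)
```

The loop scales with regions × TSSs and is only meant to show the logic; for genome-wide data, Bioconductor's GenomicRanges (for example, `distanceToNearest()`) is the usual tool.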
For some analyses, it may be desirable instead to annotate to an
entire genomic range rather than only the start site. Range-based
annotations are provided for such uses as well.
homologenes can be used to map between mouse and human genes.
Gene sets from MSigDB are tidied up for use in downstream analyses
(msigdb_hg38) and are also mapped to mouse genes.
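The human-to-mouse mapping of gene sets can be sketched in base R as a join against a homology table. The column names (geneset, gene_symbol, human_symbol, mouse_symbol) are hypothetical and may differ from msigdb_hg38 and homologenes; the toy table deliberately omits IL7R to show that genes without a listed homolog are dropped.

```r
# Sketch: map human gene-set members to mouse genes through a homology table.
# Column names (geneset, gene_symbol, human_symbol, mouse_symbol) are
# hypothetical and may differ from msigdb_hg38 and homologenes.
map_genesets_to_mouse <- function(genesets, homologs) {
  # Inner join: human genes without a mouse homolog are dropped
  mapped <- merge(genesets, homologs,
                  by.x = "gene_symbol", by.y = "human_symbol")
  mapped <- unique(mapped[, c("geneset", "mouse_symbol")])
  mapped <- mapped[order(mapped$geneset, mapped$mouse_symbol), , drop = FALSE]
  rownames(mapped) <- NULL
  mapped
}

# Toy inputs; IL7R is intentionally absent from the homology table
genesets <- data.frame(geneset = c("SET1", "SET1", "SET2"),
                       gene_symbol = c("CD8A", "GZMB", "IL7R"),
                       stringsAsFactors = FALSE)
homologs <- data.frame(human_symbol = c("CD8A", "GZMB", "KLRG1"),
                       mouse_symbol = c("Cd8a", "Gzmb", "Klrg1"),
                       stringsAsFactors = FALSE)
mouse_sets <- map_genesets_to_mouse(genesets, homologs)
```

An inner join silently shrinks gene sets whose members lack homologs, so it can be worth comparing set sizes before and after mapping.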
The mapping object contains annotation data for each sample used
throughout the package in the datasets described below.
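A typical use of sample annotations is subsetting a count matrix to particular cell types (for example, MP versus TE). The base R sketch below assumes hypothetical columns sample and celltype and a toy matrix whose column names are sample IDs; the real mapping object may use different names.

```r
# Sketch: use sample annotations to pick columns of a count matrix.
# The mapping columns (sample, celltype) are hypothetical; check the
# documentation of the mapping object for its actual layout.
select_samples <- function(counts, mapping, cell_types) {
  keep <- mapping$sample[mapping$celltype %in% cell_types]
  counts[, colnames(counts) %in% keep, drop = FALSE]  # keeps original column order
}

# Toy placeholder counts (not real data)
counts <- matrix(1:8, nrow = 2,
                 dimnames = list(c("Gzmb", "Il7r"), c("S1", "S2", "S3", "S4")))
mapping <- data.frame(sample = c("S1", "S2", "S3", "S4"),
                      celltype = c("N", "MP", "TE", "MP"),
                      stringsAsFactors = FALSE)
mp_counts <- select_samples(counts, mapping, "MP")
```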
The txi_db object contains raw count expression data derived by
Kallisto. Normalized
downstream transforms of the data are included for ease of use,
including DESeq2 objects and
rlog-transformed data, as well as tibbles derived
from these objects.
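Deriving a tidy table from a genes × samples matrix amounts to a reshape into one row per gene-sample pair. The base R sketch below illustrates this with a toy matrix and a hypothetical helper name; it is not the code used to build the package's tibbles.

```r
# Sketch: reshape a genes x samples matrix into a long (tidy) data frame,
# one row per gene-sample pair. matrix_to_long() is a hypothetical helper.
matrix_to_long <- function(mat, value_name = "value") {
  long <- data.frame(gene = rep(rownames(mat), times = ncol(mat)),
                     sample = rep(colnames(mat), each = nrow(mat)),
                     stringsAsFactors = FALSE)
  long[[value_name]] <- as.vector(mat)  # column-major order matches the rep() calls
  long
}

# Toy placeholder values (not real expression data)
expr <- matrix(c(1, 2, 3, 4), nrow = 2,
               dimnames = list(c("Gzmb", "Il7r"), c("TE_1", "MP_1")))
expr_long <- matrix_to_long(expr, value_name = "rlog")
```

For an rlog-transformed DESeq2 object, `SummarizedExperiment::assay()` returns a matrix of this shape that could be passed to such a helper.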
Results derived from ATAC-seq or ChIP-seq via MACS2 are organized into
region files, generally in BED-compatible format, which at minimum
include a chromosome, start, and end, plus additional columns
describing characteristics of the region (generally annotation
information and peak call statistics). Raw MACS2 outputs are also provided.
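Because the region tables are BED-compatible, exporting them for a genome browser is mostly a matter of writing the first three columns as tab-separated text. This base R sketch assumes hypothetical columns chr, start, end; whether the package stores 0-based or 1-based starts is not stated here, so the shift is left as an explicit option.

```r
# Sketch: export a region table to a minimal 3-column BED file.
# Assumes columns chr, start, end (hypothetical names). BED coordinates are
# 0-based, half-open; set one_based = TRUE if the table stores 1-based,
# closed coordinates so the start is shifted down by one.
write_bed <- function(regions, path, one_based = FALSE) {
  start <- if (one_based) regions$start - 1 else regions$start
  bed <- data.frame(chrom = regions$chr,
                    chromStart = as.integer(start),  # integers avoid "1e+05" output
                    chromEnd = as.integer(regions$end),
                    stringsAsFactors = FALSE)
  write.table(bed, path, sep = "\t", quote = FALSE,
              row.names = FALSE, col.names = FALSE)
  invisible(bed)
}

peaks <- data.frame(chr = "chr1", start = 100000, end = 200000)
bed_file <- tempfile(fileext = ".bed")
write_bed(peaks, bed_file)
```

The integer conversion matters: `write.table()` would otherwise print a double such as 100000 as `1e+05`, which BED parsers reject.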
For replicated samples, it is desirable to calculate consensus regions
common to shared conditions or cell types; these consensus regions are also provided.
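One simple way to define consensus regions is to keep regions from one replicate that overlap a region in every other replicate. The base R sketch below implements that rule with toy data and hypothetical column names (chr, start, end); it is an illustration, not necessarily the rule used to build the package's consensus regions.

```r
# Sketch of one simple consensus rule: keep regions from the first replicate
# that overlap at least one region in every other replicate.
# Columns chr, start, end are hypothetical; intervals are treated as
# 0-based, half-open (BED-style), so touching intervals do not overlap.
overlaps_any <- function(a, b) {
  vapply(seq_len(nrow(a)), function(i) {
    on_chr <- b$chr == a$chr[i]
    any(b$start[on_chr] < a$end[i] & a$start[i] < b$end[on_chr])
  }, logical(1))
}

consensus_regions <- function(replicates) {
  cons <- replicates[[1]]
  for (other in replicates[-1]) {
    cons <- cons[overlaps_any(cons, other), , drop = FALSE]
  }
  rownames(cons) <- NULL
  cons
}

# Toy replicates with hypothetical coordinates
rep1 <- data.frame(chr = c("chr1", "chr1", "chr2"),
                   start = c(100, 500, 100), end = c(200, 600, 200),
                   stringsAsFactors = FALSE)
rep2 <- data.frame(chr = c("chr1", "chr2"),
                   start = c(150, 300), end = c(250, 400),
                   stringsAsFactors = FALSE)
rep3 <- data.frame(chr = "chr1", start = 190, end = 300,
                   stringsAsFactors = FALSE)
shared <- consensus_regions(list(rep1, rep2, rep3))
```

Requiring overlap in all replicates is strict; a common relaxation keeps regions found in at least k of n replicates. For genome-scale peak sets, `GenomicRanges::findOverlaps()` is the efficient equivalent of `overlaps_any()`.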
Differential Region Analyses
Raw counts were derived for the various regions and input into
DESeq2 for differential analysis of regions across conditions.
All source code is viewable on Bitbucket at https://bitbucket.com/robert_amezquita/ctlcon. Please feel free to submit issues to provide feedback or requests, or to start a conversation about development, and to follow up by contributing via pull requests to the repo.
For questions and feedback, please email me at firstname.lastname@example.org or submit issues to the Bitbucket repo.