Welcome to Cloe

Cloe (pronounced like the name Chloë) is a computational biology tool to infer the clonal structure of heterogeneous tumour samples. It implements a phylogenetic latent feature model which discovers hierarchically related patterns (clonal genotypes) in the samples, and with these describes the observed mutation data.

Requirements

Cloe has been developed with R >= 3.2.1. It has been tested on Linux (Debian stable) and Mac OS X (10.8.5 and later).

To install its R dependencies, run the following line:

install.packages(c("devtools", "R6", "compiler", "digest", "ggplot2", "igraph", "RColorBrewer", "reshape2"))

Optionally, also install gridExtra:

install.packages("gridExtra")

Install

Installing Cloe can be done directly from this repository:

library(devtools)
install_bitbucket("fm361/cloe")

If you have pandoc installed, you can also build the vignette:

install_bitbucket("fm361/cloe", build_vignettes=TRUE)

Ready to go

If the above commands have run successfully, you will be ready to run Cloe. Please refer to the vignette for a tutorial on how to run Cloe. For a quick overview of Cloe's workflow, read on.

Running Cloe consists of four steps:

  1. Create an input object
  2. Run the sampler
  3. Get the best sets of parameters
  4. Select the model

Here is a brief example:

library(cloe)

# 0. Load in the data
reads  <- as.matrix(read.table("reads.txt", header=TRUE, row.names=1))
depths <- as.matrix(read.table("depths.txt", header=TRUE, row.names=1))

# 1. Create an input object
ci <- cloe_input$new(reads, depths)

# 2. Run the sampler
cm4 <- sampler(input=ci, method="cnn", K=4L, iterations=20000L)

# 3. Get the best sets of parameters
cs4 <- summarise(mcmc_object=cm4, burn=0.5, thin=4L, solutions=4L)

# 4. Select the model
# (needs summaries for several values of K; repeat steps 2 and 3 to obtain e.g. cs3 and cs5)
# css <- list(cs3, cs4, cs5)
# top_cs <- select_model(l=css, solutions=6L, plot=TRUE)
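The read.table calls in step 0 expect whitespace-delimited text files with a header row and row names in the first column. As a minimal sketch, assuming (this is not stated above) that rows are mutations, columns are samples, reads holds variant read counts and depths holds total coverage, a toy pair of files could be written and read back as follows. All names and numbers are made up for illustration.

```r
# Toy data: 3 mutations (rows) x 2 samples (columns). Values are illustrative only.
mutations <- c("mut1", "mut2", "mut3")
samples   <- c("sampleA", "sampleB")
reads  <- matrix(c(12L, 0L, 30L, 25L, 8L, 41L), nrow = 3,
                 dimnames = list(mutations, samples))
depths <- matrix(c(100L, 90L, 120L, 110L, 95L, 130L), nrow = 3,
                 dimnames = list(mutations, samples))

# Write tab-delimited files with a header row and row names.
reads_file  <- tempfile(fileext = ".txt")
depths_file <- tempfile(fileext = ".txt")
write.table(reads,  reads_file,  quote = FALSE, sep = "\t")
write.table(depths, depths_file, quote = FALSE, sep = "\t")

# Read them back with the same calls as step 0.
reads_back  <- as.matrix(read.table(reads_file,  header = TRUE, row.names = 1))
depths_back <- as.matrix(read.table(depths_file, header = TRUE, row.names = 1))

# Basic sanity checks before building the input object:
# matching dimensions, and no read count above its depth.
stopifnot(identical(dim(reads_back), dim(depths_back)),
          all(reads_back <= depths_back))
```

Checking that the two matrices have the same shape and that no read count exceeds its depth catches most file mix-ups early.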

In step 2, the sampler runs our MCMC(MC) algorithm using the number of clones K that you specify. If you do not know how many clones are present in the data, you should run the sampler for several likely values, and select "the best model" in step 4. By default Cloe runs 4 parallel tempered chains. You can change this behaviour by specifying how many chains you wish and their temperatures (e.g. chains=2, temperatures=c(1, 0.9)).
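Fitting several values of K, as suggested above, amounts to repeating steps 2 and 3 in a loop. The sketch below uses a hypothetical helper, fit_candidates, which is not part of Cloe. Its default arguments simply reuse the sampler and summarise calls from the example, and either can be swapped for a different configuration.

```r
# Hypothetical helper (not part of Cloe): run steps 2 and 3 once per candidate K
# and return the summaries as a list, ready to be passed to select_model.
fit_candidates <- function(input, Ks,
                           sample_fn = function(input, K) {
                             # Same call as step 2 of the example.
                             cloe::sampler(input = input, method = "cnn",
                                           K = K, iterations = 20000L)
                           },
                           summarise_fn = function(mcmc) {
                             # Same call as step 3 of the example.
                             cloe::summarise(mcmc_object = mcmc, burn = 0.5,
                                             thin = 4L, solutions = 4L)
                           }) {
  lapply(Ks, function(K) summarise_fn(sample_fn(input, K)))
}

# Intended usage, assuming `ci` was created as in step 1:
# css    <- fit_candidates(ci, Ks = 3:5)
# top_cs <- select_model(l = css, solutions = 6L, plot = TRUE)
```

Passing the two steps in as functions keeps the loop independent of any particular sampler settings, so chains or temperatures can be changed in one place.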

The summarise function of step 3 does three things. It discards iterations at the beginning of the chain with the burn option, which takes a proportion of the iterations (e.g. burn=0.5 discards the first half of the chain). It can thin the chain, keeping every i-th iteration, with thin=i. Finally, it returns a number of solutions sorted by decreasing log-posterior probability.
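To make the effect of burn and thin concrete, here is a small base-R sketch of which iteration indices would be kept. The helper kept_iterations is hypothetical, and Cloe's exact indexing (for instance, which iteration is the first one kept) may differ.

```r
# Hypothetical helper (not part of Cloe): indices that survive burn-in and thinning.
kept_iterations <- function(n_iter, burn = 0.5, thin = 1L) {
  stopifnot(burn >= 0, burn < 1, thin >= 1)
  # Discard the first `burn` proportion of the chain...
  first <- floor(burn * n_iter) + 1L
  # ...then keep every `thin`-th iteration from there on.
  seq(from = first, to = n_iter, by = thin)
}

# Settings from the example: 20000 iterations, burn=0.5, thin=4.
kept <- kept_iterations(20000L, burn = 0.5, thin = 4L)
length(kept)  # 2500 iterations remain
range(kept)   # from 10001 to 19997
```

Under these assumptions, the example settings leave 2500 of the 20000 iterations available for choosing solutions.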

Note: you can plot all of Cloe's classes, but the plots are automatically written to disk. This behaviour may change in the future.

Model selection

select_model returns a list of cloe_summary objects sorted by decreasing log-posterior probability. The model selection plots show how well each model fits the data (log-likelihood) and how model complexity affects the fit (log-posterior). When log-posteriors are approximately equal, choose the model with the better fit to the data.
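The rule of thumb above can be written down in a few lines. This is only a sketch on a hypothetical table of scores: the column names, the numbers and the tolerance tol are all assumptions, and it does not operate on the cloe_summary objects that select_model returns.

```r
# Hypothetical helper (not part of Cloe): among models whose log-posterior is
# within `tol` of the best one, pick the model with the highest log-likelihood.
pick_model <- function(scores, tol = 1) {
  best_lp <- max(scores$log_posterior)
  close   <- scores[best_lp - scores$log_posterior <= tol, , drop = FALSE]
  close$model[which.max(close$log_likelihood)]
}

# Made-up scores for three candidate values of K (illustrative only).
scores <- data.frame(
  model          = c("K3", "K4", "K5"),
  log_posterior  = c(-1250.0, -1201.3, -1201.0),
  log_likelihood = c(-1240.0, -1180.2, -1185.7),
  stringsAsFactors = FALSE
)

pick_model(scores)             # "K4": near-equal posterior to K5, better fit
pick_model(scores, tol = 0.1)  # "K5": only K5 is within 0.1 of the best
```

What counts as "approximately equal" is a judgment call, which is why the tolerance is left as an explicit argument here.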

Validation dataset

From version 0.9.8, Cloe's validation dataset (mixtures of single-cell diluted cell lines) is available within Cloe's package.

library(cloe)

# data
reads  <- cloe_val_reads
depths <- cloe_val_depths

# correct clonal structure
correct_genotypes <- cloe_val_Z
correct_fractions <- cloe_val_F

Learn more

For more information, please refer to the HTML vignette and to the R documentation of methods and classes.

Citation

Marass F, Mouliere F, Yuan K, Rosenfeld N, Markowetz F. 2016. A phylogenetic latent feature model for clonal deconvolution. The Annals of Applied Statistics. 10(4):2377-2404.

Contacts

Francesco Marass ( francesco.marass __ bsse.ethz.ch )