Welcome to Cloe
Cloe (pronounced like the name Chloë) is a computational biology tool to infer the clonal structure of heterogeneous tumour samples. It implements a phylogenetic latent feature model which discovers hierarchically related patterns (clonal genotypes) in the samples, and with these describes the observed mutation data.
Cloe has been developed with R >= 3.2.1. It has been tested on Linux (Debian stable) and Mac OS X (10.8.5 and later).
To install its R dependencies, run the following line:
install.packages(c("devtools", "R6", "compiler", "digest", "ggplot2", "igraph", "RColorBrewer", "reshape2"))
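Before installing, it can help to check the R version and see which dependencies are already present. The base R sketch below only reports what is missing and installs nothing; the variable names (`r_ok`, `deps`, `is_installed`, `missing_deps`) are illustrative and not part of Cloe.

```r
# R version that Cloe was developed with, as stated above.
r_ok <- getRversion() >= "3.2.1"

deps <- c("devtools", "R6", "compiler", "digest", "ggplot2",
          "igraph", "RColorBrewer", "reshape2")

# requireNamespace() returns FALSE (rather than failing) when a package is absent.
is_installed <- vapply(deps, requireNamespace, logical(1), quietly = TRUE)
missing_deps <- deps[!is_installed]

if (!r_ok) {
  message("R >= 3.2.1 is expected; found ", as.character(getRversion()))
}
if (length(missing_deps) > 0) {
  message("Missing packages: ", paste(missing_deps, collapse = ", "))
}
```

Any package listed in `missing_deps` can then be passed to `install.packages()`.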
If you don't mind, also get
Installing Cloe can be done directly from this repository:
If you have pandoc installed, you can also build the vignette:
Ready to go
If the above commands have run successfully, you will be ready to run Cloe. Please refer to the vignette for a tutorial on how to run Cloe. For a quick overview of Cloe's workflow, read on.
Running Cloe consists of four steps:
- Create an input object
- Run the sampler
- Get the best sets of parameters
- Select the model
Here is a brief example:
    library(cloe)

    # 0. Load in the data
    reads <- as.matrix(read.table("reads.txt", header=TRUE, row.names=1))
    depths <- as.matrix(read.table("depths.txt", header=TRUE, row.names=1))

    # 1. Create an input object
    ci <- cloe_input$new(reads, depths)

    # 2. Run the sampler
    cm4 <- sampler(input=ci, method="cnn", K=4L, iterations=20000L)

    # 3. Get the best sets of parameters
    cs4 <- summarise(mcmc_object=cm4, burn=0.5, thin=4L, solutions=4L)

    # 4. Select the model
    #
    # css <- list(cs3, cs4, cs5)
    # top_cs <- select_model(l=css, solutions=6L, plot=TRUE)
In step 2, the sampler runs our MCMC(MC) algorithm using the number of clones K that you specify. If you do not know how many clones are present in the data, you should run the sampler for several likely values, and select "the best model" in step 4. By default Cloe runs 4 parallel tempered chains. You can change this behaviour by specifying how many chains you wish and their temperatures (e.g. chains=2, temperatures=c(1, 0.9)).
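When the number of clones is unknown, running the sampler for several values of K can be sketched as a loop. The snippet below is a minimal, untested sketch: it assumes Cloe (version 0.9.8 or later) is installed, reuses only the calls and arguments shown in the brief example, and applies them to the packaged validation data. The candidate values 3, 4 and 5 are arbitrary choices for illustration.

```r
# Candidate numbers of clones (arbitrary values chosen for illustration).
Ks <- c(3L, 4L, 5L)

if (requireNamespace("cloe", quietly = TRUE)) {
  library(cloe)

  # Input object built from the packaged validation data (Cloe >= 0.9.8).
  ci <- cloe_input$new(cloe_val_reads, cloe_val_depths)

  # Steps 2 and 3 for each K: run the sampler, then summarise the chain.
  css <- lapply(Ks, function(k) {
    cm <- sampler(input=ci, method="cnn", K=k, iterations=20000L)
    summarise(mcmc_object=cm, burn=0.5, thin=4L, solutions=4L)
  })

  # Step 4: compare the summaries across the candidate values of K.
  top_cs <- select_model(l=css, solutions=6L, plot=TRUE)
} else {
  message("Cloe is not installed; skipping the example.")
}
```

Each sampler run is independent of the others, so the values of K could also be run in separate R sessions if run time is a concern.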
The summarise function of step 3 discards iterations at the beginning of the chain with the burn option (it takes a proportion of the iterations, e.g. burn=0.5 discards the first half of the chain). It can also thin the chain, taking every i-th iteration with thin=i, and it returns a number of solutions sorted by decreasing log-posterior probability.
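To make the burn and thin options concrete, the base R sketch below shows which iterations would be retained from a 20000-iteration chain with burn=0.5 and thin=4L. It only illustrates the idea and is not Cloe's internal code; in particular, the exact iteration at which thinning starts is an assumption.

```r
# Illustrative only: how a burn-in proportion and a thinning interval
# select iterations from a chain. Not taken from Cloe's source.
iterations <- 20000L
burn <- 0.5
thin <- 4L

# First iteration after the burn-in period (assumed convention).
first_kept <- as.integer(floor(iterations * burn)) + 1L

# Keep every thin-th iteration from there to the end of the chain.
kept <- seq(from = first_kept, to = iterations, by = thin)

length(kept)  # number of retained iterations
head(kept)    # first few retained iteration indices
```

With these settings, half of the chain is discarded and a quarter of the remainder is kept, so 2500 of the 20000 iterations are retained.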
Note: all of Cloe's classes can be plotted, but plots are automatically written to disk. This behaviour may change in the future.
select_model returns a list of cloe_summary objects sorted by decreasing log-posterior probability. The model selection plots show how well each model fits the data (log-likelihood) and how model complexity affects the fit (log-posterior). For approximately equal log-posteriors, choose the model with the better fit to the data.
From version 0.9.8, Cloe's validation dataset (mixtures of single-cell diluted cell lines) is available within Cloe's package.
    library(cloe)

    # data
    reads <- cloe_val_reads
    depths <- cloe_val_depths

    # correct clonal structure
    correct_genotypes <- cloe_val_Z
    correct_fractions <- cloe_val_F
For more information, please refer to the HTML vignette and to the R documentation of methods and classes.
Marass F, Mouliere F, Yuan K, Rosenfeld N, Markowetz F. 2016. A phylogenetic latent feature model for clonal deconvolution. The Annals of Applied Statistics. 10(4):2377-2404.
Francesco Marass ( francesco.marass __ bsse.ethz.ch )