By Mathieu Bourgey, Ph.D.
SCoNEs stands for Somatic Copy Number Estimator.
SCoNEs is a tool dedicated to estimating copy number variation in paired whole-genome sequencing cancer data using a read-depth approach.
The general idea behind SCoNEs is that, in NGS cancer data, each cancer and each individual is unique. Each analysis is therefore specific, and the same detection thresholds and parameters cannot be applied as general settings to a whole set of samples.
The rationale of SCoNEs is that the log R ratio (LRR) between tumor and normal read depth is a signal composed of a mixture of several Gaussians: one for each copy number state, and possibly some for technical and artifactual noise. SCoNEs aims to decompose the overall LRR signal into a set of Gaussian components using the mclust approach. The 2-copy Gaussian is then used to estimate the detection parameters, and these are used to estimate the copy number state using the DNAcopy approach.
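As an illustration of the signal described above, the following Python sketch computes a per-bin log2 ratio between tumor and normal read counts. It is not taken from the SCoNEs R code: the function name `log_ratios`, the scaling by total read count, and the `pseudocount` value are assumptions made for this example.

```python
import math

def log_ratios(tumor_counts, normal_counts, pseudocount=0.5):
    """Per-bin log2(tumor / normal) after scaling each sample by its total reads.

    Assumption: library-size scaling and a small pseudocount; the actual
    normalisation used by SCoNEs is not described in this document.
    """
    if len(tumor_counts) != len(normal_counts):
        raise ValueError("tumor and normal must have the same number of bins")
    tumor_total = sum(tumor_counts)
    normal_total = sum(normal_counts)
    ratios = []
    for t, n in zip(tumor_counts, normal_counts):
        # Scale by total reads so that sequencing depth differences cancel out.
        t_norm = (t + pseudocount) / tumor_total
        n_norm = (n + pseudocount) / normal_total
        ratios.append(math.log2(t_norm / n_norm))
    return ratios

# Hypothetical binned counts: bin 3 looks like a gain, bin 4 like a loss.
tumor = [100, 100, 150, 50]
normal = [100, 100, 100, 100]
lrr = log_ratios(tumor, normal)
```

Bins with unchanged copy number give values near 0, gains give positive values, and losses give negative values. These are the values whose distribution would then be modelled as a Gaussian mixture.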
Another particularity of cancer is the wide variety of copy number states that can be observed across samples. One sample may contain no copy number variation, whereas others contain several copy number states. The number of copy number states, as well as the technical variability, affects the total number of Gaussians mixed in the overall signal. As this number cannot be estimated a priori, SCoNEs estimates the sample parameters under 8 different models: Gaussian mixtures of 1 to 7 components, plus a naive outlier detection approach (Hampel). The Hampel method is sometimes more efficient when too much noise is present in the signal, which tends to make the mclust methods fail. In the end, SCoNEs provides, for each of the 8 models, a set of calls and a graphical representation of the genomic ratio along with the estimated detection thresholds. The best model must be chosen by manual evaluation.
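For readers unfamiliar with the Hampel identifier, here is a minimal Python sketch of the generic technique: values are flagged as outliers when they lie more than k scaled median absolute deviations (MAD) from the median. The cutoff `k=3.0`, the function name `hampel_bounds`, and the sample values are assumptions for this example; how SCoNEs turns such bounds into detection thresholds is not specified in this document.

```python
import statistics

def hampel_bounds(values, k=3.0):
    """Return (lower, upper) bounds of the Hampel identifier.

    Uses median +/- k * 1.4826 * MAD, where 1.4826 makes the MAD a consistent
    estimate of the standard deviation for Gaussian data. k=3.0 is an assumed,
    commonly used cutoff.
    """
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    sigma = 1.4826 * mad
    return med - k * sigma, med + k * sigma

# Hypothetical log ratios: mostly near 0, with one gain-like and one loss-like bin.
lrr = [0.0, 0.02, -0.03, 0.01, -0.01, 0.58, -1.0, 0.03, -0.02]
lower, upper = hampel_bounds(lrr)
outliers = [v for v in lrr if v < lower or v > upper]
```

Because both the median and the MAD are robust statistics, the bounds are driven by the bulk of the bins rather than by the altered ones, which is one reason this kind of approach can hold up on noisy signals.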
The SCoNEs workflow is:
1. Import binned data
2. Mappability correction
3. %GC correction
5. Generate log ratios
6. Bayesian mixture model analysis
7. Detection parameters estimation
8. Copy number call for Tumor vs. Normal, Tumor vs. Tumor mean_coverage and Normal vs. Normal mean_coverage
9. Output genomic ratio and copy number calls
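To give an idea of what a correction step such as the %GC correction can look like, the Python sketch below rescales each bin by the median count of bins with similar GC content. This is a common median-per-stratum approach chosen for illustration; the correction model actually used by SCoNEs is not described here, and the function name `gc_correct` and the input values are hypothetical.

```python
import statistics
from collections import defaultdict

def gc_correct(counts, gc_percent):
    """Rescale counts so every GC stratum has the same median as the whole sample.

    Assumption: bins are grouped by GC content rounded to the nearest percent.
    """
    strata = defaultdict(list)
    for count, gc in zip(counts, gc_percent):
        strata[round(gc)].append(count)
    global_median = statistics.median(counts)
    stratum_median = {key: statistics.median(vals) for key, vals in strata.items()}
    corrected = []
    for count, gc in zip(counts, gc_percent):
        m = stratum_median[round(gc)]
        # Bins from an all-zero stratum cannot be rescaled; mark them as missing.
        corrected.append(count * global_median / m if m > 0 else float("nan"))
    return corrected

# Hypothetical bins where read depth increases with GC content.
counts = [80, 80, 100, 100, 120, 120]
gc_percent = [30, 30, 45, 45, 60, 60]
corrected = gc_correct(counts, gc_percent)
```

In this toy input the GC trend is the only source of variation, so all corrected counts come out equal to the global median.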
SCoNEs can also be applied to non-cancer (unpaired) data, but performance will be reduced.
To install SCoNEs just clone this repository:
git clone email@example.com:mugqic/scones.git
The SCoNEs R script is then located in the scones folder, whereas the companion script for filtering and annotation is located in the scripts folder.
SCoNEs provides many options:
USAGE : DNACRD.2.0.1.R [option] <Value>
    -f : binned read depth count file
    -o : output file
    -c : GC content and mappability file
    -n : 2 copies genomic proportion 0 - 1 (default 0.7)
    -r : remove the extreme percentile distribution for Gaussian modelling 0 - 1 (default 0.05 => use ]0.025 ; 0.975[)
    -g : apply GC correction (0: true 1: False ; default true)
    -m : apply mappability correction (0: true 1: False ; default true)
    -s : apply ratio smoothing step (0: true 1: False ; default true)
    -b : bin size default 30kb
    -d : minimum consecutive bin support default 5
    -a : approach (0: Individual 1: Somatic 2: Somatic+Germline ; default Somatic+Germline)
    -t : threads (default 10)
    -z : mappability bin threshold (default 1 - 1% max)
    -h : this help
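The -r option can be read as symmetric trimming: with the default of 0.05, the lowest 2.5% and highest 2.5% of values are left out before the Gaussian modelling. The Python sketch below illustrates that interpretation only; the function name `trim_extremes` and the way bin counts are rounded are assumptions, not the behaviour of the R script.

```python
def trim_extremes(values, remove=0.05):
    """Drop the lowest and highest remove/2 fraction of values.

    Assumption: the number of values cut on each side is rounded down.
    Returns the remaining values in ascending order.
    """
    if not 0 <= remove < 1:
        raise ValueError("remove must be in [0, 1)")
    ordered = sorted(values)
    n = len(ordered)
    cut = int(n * remove / 2)  # number of values dropped on each side
    return ordered[cut:n - cut]

# Hypothetical input of 40 values: the default setting drops one value per side.
values = list(range(40))
trimmed = trim_extremes(values)
```

Trimming both tails equally keeps the centre of the distribution, where the 2-copy component is expected to dominate, while reducing the influence of extreme bins on the fit.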
Generate CNV graphs and calls