Intersecting interactions from replicates

First of all, thanks very much for CHiCAGO; it's really great! I'm currently using it to analyse Capture Hi-C data, and I wondered if I could ask you and your group for some advice, as I have a few questions about how the software works.

I have noticed that if I run two biological replicates separately through CHiCAGO (e.g. mESC replicate 1 and replicate 2 from the Schoenfelder et al. 2015 paper, which you use in your paper describing CHiCAGO) and then intersect the two resulting lists of significant interactions, I get very low overlap between them. For example:

mESC_rep1: 60,894 significant interactions (score > 5)
mESC_rep2: 78,029 significant interactions (score > 5)
Regions found as significantly interacting in both mESC_rep1 and mESC_rep2: 21,750
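For scale, the overlap can be expressed as the fraction of each replicate's calls that are shared, and as a Jaccard index (calls in both lists divided by calls in either list). A quick Python calculation using only the three counts reported here (Python is used purely for the arithmetic; it is not part of CHiCAGO):

```python
# Counts reported above (CHiCAGO score > 5 in each replicate).
n_rep1 = 60_894
n_rep2 = 78_029
n_both = 21_750

union = n_rep1 + n_rep2 - n_both   # calls present in either replicate
frac_rep1 = n_both / n_rep1        # share of rep1 calls also called in rep2
frac_rep2 = n_both / n_rep2        # share of rep2 calls also called in rep1
jaccard = n_both / union           # shared calls / all calls

print(f"rep1 shared: {frac_rep1:.3f}, rep2 shared: {frac_rep2:.3f}, Jaccard: {jaccard:.3f}")
```

So roughly a third of the rep1 calls and just over a quarter of the rep2 calls are shared.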

Apologies if I missed it in your paper, but is this the kind of overlap you would expect for two biological replicates, or might I have gone wrong somewhere in my analysis? Is there a guideline for what counts as a 'good' correlation between CHiCAGO interactions? I was hoping to use the intersection as the most robustly interacting regions, but would I be better off using the score as an indicator of robustness?

Many thanks for your time, Charlie

  1. Mikhail Spivakov

    This kind of overlap is expected and results from the fact that CHiC data are undersampled, so interactions with scores near the threshold can fall on either side of it in different replicates by chance. See, for example, Figure S4 in the Cairns et al. paper. In the Discussion, we suggest some ways of handling this situation (basically, clustering/PCA based on the actual scores, and tools such as sdef, depending on the purpose), but I would certainly not recommend simply comparing the overlap of thresholded interactions.
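As a rough illustration of comparing replicates on their scores rather than on thresholded calls, here is a minimal Python sketch. It is a deliberately simple stand-in (a Spearman rank correlation of scores over the union of interactions), not the clustering/PCA or sdef approach mentioned above, and not part of CHiCAGO. It assumes each replicate's results have been reduced to a dict mapping a hypothetical (bait, other_end) key to its score, and that an interaction absent from one replicate scores 0. The scores are invented toy numbers, not real CHiCAGO output.

```python
from typing import Dict, Hashable, List


def average_ranks(values: List[float]) -> List[float]:
    """Return 1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for constant input")
    return sxy / (sxx * syy) ** 0.5


def score_spearman(rep1: Dict[Hashable, float], rep2: Dict[Hashable, float],
                   missing: float = 0.0) -> float:
    """Spearman correlation (Pearson on ranks) over the union of interactions.

    Assumption: an interaction missing from one replicate gets `missing`.
    """
    keys = list(set(rep1) | set(rep2))
    x = [rep1.get(k, missing) for k in keys]
    y = [rep2.get(k, missing) for k in keys]
    return pearson(average_ranks(x), average_ranks(y))


# Toy scores keyed by hypothetical (bait, other_end) IDs; invented numbers.
rep1 = {("b1", "o1"): 7.2, ("b1", "o2"): 4.1, ("b2", "o3"): 5.6, ("b2", "o4"): 0.8}
rep2 = {("b1", "o1"): 6.5, ("b1", "o2"): 5.3, ("b2", "o3"): 4.7, ("b2", "o4"): 1.2}

threshold = 5.0
calls1 = {k for k, s in rep1.items() if s > threshold}
calls2 = {k for k, s in rep2.items() if s > threshold}
call_jaccard = len(calls1 & calls2) / len(calls1 | calls2)
rho = score_spearman(rep1, rep2)
print(f"Jaccard of thresholded calls: {call_jaccard:.2f}")  # 0.33
print(f"Spearman of scores: {rho:.2f}")  # 0.80
```

In the toy data, two near-threshold interactions (5.6 vs 4.7, and 4.1 vs 5.3) land on opposite sides of the cutoff in the two replicates, so the thresholded calls overlap poorly while the score rankings still agree closely.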

