Small capture library

Issue #53
Kirsty Jamieson created an issue

Hi,

I have a relatively small Capture Hi-C library (DpnII digest) covering almost 3,000 DpnII fragments, or about 1 Mb. From 155 M reads, using minfraglength 40, maxfraglength 930 and the default bin size, Chicago calls 36K interactions, which seems higher than expected. I've tested downsampling my raw reads and found that the number of interactions never reaches a plateau as the number of raw reads increases. Could you advise whether anyone has used Chicago for relatively small capture libraries, and whether the small capture region affects the background model?

Thanks!

Comments (3)

  1. Mikhail Spivakov

    In general I'm not surprised, as CHiC data are undersampled until you sequence a huge number of reads, particularly with DpnII. Do your peaks look sensible? Are you referring to valid reads or total reads? Are your baited regions dispersed, or have you baited a small number of large regions?

  2. Kirsty Jamieson (reporter)

    I was referring to total reads from the sequencer in the graph I attached previously. I've been using HiCUP to generate the .bam for Chicago. The graph below shows that as I increase the number of reads, the number of unique ditags (counted after filtering for unique, paired reads and de-duplicating) approaches a plateau just below 8 M at 155 M reads. This is the final number of reads in the .bam that is used as input for Chicago. This troubles me a little, since I would expect that as the number of reads in the .bam approaches a plateau, the number of interactions called would also plateau.
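    As a rough illustration of why unique ditags level off, here is a minimal sketch assuming an idealised library of `library_size` equally abundant unique molecules sampled uniformly with replacement (real libraries have uneven abundances, and only a fraction of total sequencer reads become valid pairs). `library_size` and the function name are hypothetical, not HiCUP or Chicago parameters.

```python
import math

def expected_unique(valid_pairs: float, library_size: float) -> float:
    """Expected number of distinct ditags seen after sampling `valid_pairs`
    read pairs uniformly with replacement from `library_size` equally
    abundant unique molecules (an idealised, hypothetical model)."""
    return library_size * (1.0 - math.exp(-valid_pairs / library_size))

# Hypothetical complexity of 8 M unique molecules; not a measured value.
library_size = 8e6
for pairs in (5e6, 10e6, 20e6, 40e6):
    unique = expected_unique(pairs, library_size)
    print(f"{pairs / 1e6:>4.0f} M valid pairs -> {unique / 1e6:.2f} M unique ditags")
```

    Under this model, extra sequencing mostly adds duplicates once the valid pair count exceeds a few times the library size, so the de-duplicated .bam barely grows.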

    I have a combination of baits located around TSSs and covering both lead GWAS variants and variants in linkage disequilibrium (LD) with them. At the TSSs the interactions seem sensible, though perhaps a few more than I would have guessed, but around the variants there appear to be many more peaks than I expected. Because I'm covering both lead and LD variants, this subset of baited regions spans many DpnII fragments; I think my largest covers around 20 fragments.

  3. Mikhail Spivakov

    I see what you mean. Perhaps it's worth looking into this more deeply by checking which peaks are called at higher coverage even though the number of unique reads has reached saturation.
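    One way to do that check is to export calls from two downsampled runs and diff them. The sketch below assumes Chicago's interBed (.ibed) export as a tab-separated file with one header line and columns bait_chr, bait_start, bait_end, bait_name, otherEnd_chr, otherEnd_start, otherEnd_end, otherEnd_name, N_reads, score; that layout and the score cutoff of 5 are assumptions to check against your own files. The function names are hypothetical.

```python
import csv
from collections import Counter
from typing import Dict, Set, Tuple

# One interaction: (bait "chr:start-end", other end "chr:start-end").
Interaction = Tuple[str, str]

def load_ibed(path: str, min_score: float = 5.0) -> Set[Interaction]:
    """Return interactions with score >= min_score from an assumed .ibed layout."""
    calls: Set[Interaction] = set()
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        next(reader, None)  # skip the (assumed) header line
        for row in reader:
            if not row:
                continue  # tolerate blank lines
            if float(row[9]) >= min_score:
                bait = f"{row[0]}:{row[1]}-{row[2]}"
                other = f"{row[4]}:{row[5]}-{row[6]}"
                calls.add((bait, other))
    return calls

def gained_calls(low_cov: Set[Interaction], high_cov: Set[Interaction]) -> Set[Interaction]:
    """Interactions called at the higher coverage but not at the lower one."""
    return high_cov - low_cov

def calls_per_bait(calls: Set[Interaction]) -> Dict[str, int]:
    """Count calls per bait, e.g. to see whether gains cluster at variant baits."""
    return dict(Counter(bait for bait, _ in calls))
```

    For example, `calls_per_bait(gained_calls(load_ibed("low.ibed"), load_ibed("high.ibed")))` (file names hypothetical) would show whether the extra calls concentrate on the long runs of variant baits rather than on the TSS baits.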

    For contiguous runs of baited regions there's a risk that the background may be misestimated. I would try binning them (in both the baitmap and the rmap) and checking whether the profiles change a lot.
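    A minimal sketch of that binning, assuming the usual tab-separated rmap (chr, start, end, fragID) and baitmap (chr, start, end, fragID, annotation) without headers: each run of baited fragments that are adjacent in the rmap on the same chromosome is collapsed into a single fragment, unbaited fragments are left alone, and everything is renumbered. This is one possible binning scheme, not Chicago's own; `merge_bait_runs` is a hypothetical helper. Chicago's other design files (e.g. those produced by makeDesignFiles.py) and the bam2chicago.sh input would presumably need regenerating against the new rmap, and because merged baits can be far longer than single DpnII fragments, length settings such as maxfraglength may need revisiting.

```python
from typing import Dict, List, Tuple

RmapRow = Tuple[str, int, int, int]            # chrom, start, end, frag_id
BaitRow = Tuple[str, int, int, int, str]       # rmap columns + annotation

def merge_bait_runs(rmap: List[RmapRow], baits: Dict[int, str]) -> Tuple[List[RmapRow], List[BaitRow]]:
    """Collapse each run of consecutive baited fragments (adjacent in rmap
    order, same chromosome) into one fragment; renumber all fragments 1..N.
    `baits` maps original fragID -> baitmap annotation."""
    rows = sorted(rmap, key=lambda r: (r[0], r[1]))
    new_rmap: List[RmapRow] = []
    new_baitmap: List[BaitRow] = []
    i = 0
    while i < len(rows):
        chrom, start, end, fid = rows[i]
        if fid not in baits:
            # Unbaited fragment: keep coordinates, assign the next ID.
            new_rmap.append((chrom, start, end, len(new_rmap) + 1))
            i += 1
            continue
        # Extend the run while the next fragment is baited and on the same chromosome.
        annotations = [baits[fid]]
        j = i + 1
        while j < len(rows) and rows[j][0] == chrom and rows[j][3] in baits:
            end = rows[j][2]
            annotations.append(baits[rows[j][3]])
            j += 1
        new_id = len(new_rmap) + 1
        new_rmap.append((chrom, start, end, new_id))
        # Keep each annotation once, in first-seen order.
        new_baitmap.append((chrom, start, end, new_id, ",".join(dict.fromkeys(annotations))))
        i = j
    return new_rmap, new_baitmap

def write_tsv(path: str, rows) -> None:
    """Write rows as a header-less tab-separated file."""
    with open(path, "w") as fh:
        for row in rows:
            fh.write("\t".join(str(x) for x in row) + "\n")
```

    Comparing per-bait profiles from the original and merged designs would then show how much the result depends on how the contiguous baits are handled.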
