Reads filtered from chinput file?

Issue #25 resolved
Former user created an issue

Hi

How is the chinput file generated? After running HiCUP I have over 111 million unique ditags for mESC_rep1 (mapped with bowtie1 to mm10), so I expected that summing the number of reads in the chinput files would give approximately 111 million counts. However, when I sum the number of reads for each fragment I get approximately 63 million counts. Are reads thrown out at the chinput generation stage? 0.001306% of my reads were filtered because of <60% overlap, but have I lost a large number of other reads somewhere, and if so, is there a way I can find out why they have been filtered?

Many thanks for your help.

Comments (2)

  1. Mikhail Spivakov

    Typically, the HiCUP-generated file would contain all detected interactions, whereas bam2chicago.sh only preserves the ones containing - at least on one end - the captured (or 'baited') fragments. We typically get ~75% reads mapping to interactions with the baits, and while the ~55% rate you're observing isn't perfect, it's certainly not unthinkable. It does suggest however that the capture efficiency in the actual experiment has some room for improvement.
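
    To check the numbers discussed here, the following is a minimal Python sketch that sums the read counts in a chinput-style table and computes the fraction of HiCUP ditags retained. It assumes the chinput file is tab-separated, starts with a `#` comment line, and has a read-count column named `N`; those layout details, the helper names `sum_chinput_reads` and `capture_fraction`, and the toy table are illustrative assumptions, so adjust them to match the actual files.

    ```python
    import csv
    import io


    def sum_chinput_reads(handle, count_column="N"):
        """Sum the read-count column of a chinput-style table.

        Assumption: lines starting with '#' are comments, and the first
        remaining line is a tab-separated header containing `count_column`.
        """
        rows = (line for line in handle if not line.startswith("#"))
        reader = csv.DictReader(rows, delimiter="\t")
        return sum(int(row[count_column]) for row in reader)


    def capture_fraction(baited_reads, total_ditags):
        """Fraction of unique ditags that involve at least one baited fragment."""
        return baited_reads / total_ditags


    # Toy input standing in for a real chinput file (made-up values).
    example = io.StringIO(
        "#\tsamplename=demo\n"
        "baitID\totherEndID\tN\totherEndLen\tdistSign\n"
        "1\t2\t3\t500\t1000\n"
        "1\t5\t2\t700\t-4000\n"
    )
    total = sum_chinput_reads(example)
    print(total)

    # Figures quoted in the question: ~63 million of ~111 million ditags.
    print(round(capture_fraction(63e6, 111e6), 3))
    ```

    With the rounded figures from the question this gives a retained fraction of roughly 57%, consistent with the approximate rate mentioned above. For a real file, pass an open file handle instead of the `io.StringIO` object.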