Issue #25
resolved
Hi
How is the chinput file generated? After running HiCUP I have over 111 million unique di-tags for mESC_rep1 (mapped with bowtie1 to mm10), so I expected the read counts in the chinput files to sum to approximately 111 million. However, when I sum the number of reads for each fragment I get approximately 63 million. Are reads thrown out at the chinput generation stage? Only 0.001306% of my reads were filtered because of <60% overlap, but have I lost a large number of other reads somewhere, and if so, is there a way I can find out why they were filtered?
Many thanks for your help.
Comments (2)
- changed status to resolved
Typically, the HiCUP-generated file contains all detected interactions, whereas bam2chicago.sh only preserves those in which at least one end is a captured (or 'baited') fragment. We typically get ~75% of reads mapping to interactions with the baits, and while the ~55% rate you're observing isn't perfect, it's certainly not unthinkable. It does suggest, however, that the capture efficiency in the actual experiment has some room for improvement.
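For anyone wanting to repeat this check, here is a minimal Python sketch of the arithmetic: sum the per-pair read counts in a chinput file and divide by the HiCUP unique di-tag count. It assumes the chinput file is tab-separated, starts with a `#` comment line, and has a header row with a count column named `N`; check these against your own file. The function names and the toy rows are made up for illustration and are not part of HiCUP or CHiCAGO.

```python
import csv
import io


def sum_chinput_reads(handle, count_column="N"):
    """Sum the read counts in a chinput-style table.

    Assumes tab-separated text in which lines starting with '#' are
    comments and the first remaining line is the column header.
    """
    rows = (line for line in handle if not line.startswith("#"))
    reader = csv.DictReader(rows, delimiter="\t")
    return sum(int(row[count_column]) for row in reader)


def retained_fraction(chinput_total, hicup_unique_ditags):
    """Fraction of HiCUP unique di-tags that ended up in the chinput file."""
    return chinput_total / hicup_unique_ditags


# Toy data only: three invented fragment pairs in the assumed layout.
example = io.StringIO(
    "#\tsamplename=toy\n"
    "baitID\totherEndID\tN\totherEndLen\tdistSign\n"
    "101\t205\t3\t1200\t45000\n"
    "101\t310\t1\t800\tNA\n"
    "417\t101\t2\t950\t-120000\n"
)
toy_total = sum_chinput_reads(example)
print("toy total reads:", toy_total)

# The figures quoted in the question: ~63 million of ~111 million di-tags.
print("retained fraction:", round(retained_fraction(63e6, 111e6), 3))
```

For a real file, pass an open handle instead of the toy data, e.g. `with open(path) as fh: total = sum_chinput_reads(fh)`, where `path` is your own chinput file.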