Interpreting coverage histogram pre- and post-purging

Hi, thank you for creating a program that is very easy to install and use!

I have a question about interpreting the coverage histogram before and after running purge_haplotigs. As shown below, the post-purging coverage histogram has exactly the same shape as the pre-purging histogram, just with far fewer contigs, because a few thousand were identified as junk in previous steps. I have tried multiple cutoffs, including -l 5 -m 25 -h 120 and -l 5 -m 45 -h 500, and neither set of parameters changed the shape of the post-purging histogram. The curated.fasta had the same number of BUSCOs as the original assembly, but three fewer duplicated BUSCOs. Any advice would be great, thanks!

Pre-purging:

Post-purging:

Commands used:
minimap2 -ax map-pb -H --secondary=no assembly.fasta pacbio_reads.fq -o assembly.sam -t 16

samtools view --threads 16 -b assembly.sam > assembly.bam

samtools sort --threads 16 assembly.bam -o assembly.sorted.bam

purge_haplotigs hist -b assembly.sorted.bam -g assembly.fasta -t 16

purge_haplotigs cov -i assembly.sorted.bam.gencov -l 5 -m 25 -h 120

purge_haplotigs purge -g assembly.fasta -c coverage_stats.csv -t 16
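
To put numbers on "far fewer contigs", a quick comparison of the input and purged assemblies may help. This is only a minimal sketch: `fasta_summary` is a hypothetical helper (not part of purge_haplotigs), and it assumes the purged output is named curated.fasta and sits next to assembly.fasta.

```shell
# Hypothetical helper: print "<number of contigs><TAB><total bases>" for a FASTA file.
# Usage: fasta_summary assembly.fasta
fasta_summary() {
    awk '
        /^>/ { n++; next }            # header line: count one contig
        { bases += length($0) }       # sequence line: add its length
        END { printf "%d\t%d\n", n, bases }
    ' "$1"
}

# Compare the original and purged assemblies.
# File names are assumptions taken from the commands above; adjust as needed.
for f in assembly.fasta curated.fasta; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(fasta_summary "$f")"
    fi
done
```

If the total bases barely change between the two files, very little haplotig sequence was reassigned, which would be consistent with the small change in duplicated BUSCOs.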
