Hi, thank you for creating a program that is very easy to install and use!
I have a question about interpreting the coverage histogram before and after executing purge_haplotigs
. As shown below, the post-purging coverage histogram is exactly the same curve as the pre-purging histogram, just with much fewer contigs, because a few thousand were identified as junk in previous steps. I have tried multiple cutoffs including -l 5 -m 25 -h 120
and -l 5 -m 45 -h 500
and both sets of parameters did not change the shape of the histogram post-purging. The curated.fasta had the same number of BUSCOs as the original assembly but three fewer duplicated BUSCOs. Any advice would be great thanks!
Pre-purging:
Post-purging:
Commands used:
minimap2 -ax map-pb -H --secondary=no assembly.fasta pacbio_reads.fq -o assembly.sam -t 16
samtools view --threads 16 -b assembly.sam > assembly.bam
samtools sort --threads 16 assembly.bam -o assembly.sorted.bam
purge_haplotigs hist -b assembly.sorted.bam -g assembly.fasta -t 16
purge_haplotigs cov -i assembly.sorted.bam.gencov -l 5 -m 25 -h 120
purge_haplotigs purge -g assembly.fasta -c coverage_stats.csv -t 16
It looks like the coverage peaks are overlapping a bit; you want to capture all the 1/2 coverage contigs. try:
-l 5 -m 60 -h 190
for the cov step, then-a 60
for the purge step. What sort of BUSCO duplication do you have?