purge_haplotigs / Updates

04NOV2021 | v1.1.2

  • Misc minor bugfixes
  • Version is printed to terminal
  • Updated README

20FEB2020 | v1.1.1

  • Mostly misc bugfixes and performance tweaks
  • Pipeline now uses samtools depth for coverage calculation, which is faster and hopefully more reliable at higher thread counts
  • Benchmarks have been updated given the improved runtime of step 1
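
For readers unfamiliar with samtools depth, the sketch below shows one way its output could be summarised into a mean read depth per contig. It is not taken from the pipeline. It assumes the default three-column, tab-separated output for a single BAM file (contig, 1-based position, depth), and the function name and contig names are hypothetical. Unless samtools depth is run with -a, zero-coverage positions may be omitted, so the mean here is over reported positions only.

```python
from collections import defaultdict

def mean_depth_per_contig(depth_lines):
    """Return {contig: mean depth} from lines of `samtools depth` output.

    Assumes three tab-separated columns: contig, position, depth.
    The mean is taken over the positions present in the input.
    """
    totals = defaultdict(int)   # summed depth per contig
    counts = defaultdict(int)   # number of reported positions per contig
    for line in depth_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        contig, _pos, depth = line.split("\t")[:3]
        totals[contig] += int(depth)
        counts[contig] += 1
    return {contig: totals[contig] / counts[contig] for contig in totals}

# Hypothetical example input (contig names and depths are made up)
example = [
    "ctg1\t1\t10",
    "ctg1\t2\t20",
    "ctg1\t3\t30",
    "ctg2\t1\t5",
]
means = mean_depth_per_contig(example)
```

In practice the lines would come from a file or a pipe rather than a list, and the function accepts any iterable of strings.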

08JUN2019 (ish) | v1.1.0

  • Various bugfixes, refactoring and optimisations
  • New features
    • Shorter subcommands (backwards compatible): purge_haplotigs readhist is now purge_haplotigs hist, contigcov is now cov, and ncbiplace is now place
    • Settable max coverage for read-depth histogram
    • Nicer looking histograms
    • Rename contigs using FALCON Unzip naming convention during purge and place scripts
    • New experimental script clip will find and trim contigs with overlapping edges (for use after purge)

03DEC2018 | v1.0.4

  • Minor update, various bugfixes and tweaked settings

29OCT2018 | v1.0.3

  • Hotfix for dotplots: reverse alignments were not being displayed

25OCT2018 | v1.0.2

  • Significant improvements to pipeline performance
  • Reduced RAM usage and better thread usage potential for minimap2
  • Large reduction in IO operations
  • Overpurge-checking: all reassigned contigs are now checked after convergence to ensure they still meet the requirements for reassignment as haplotigs
  • Bugfix for readhist stage and hotfix for repeat annotations

25SEP2018 | v1.0.1

  • Added to Bioconda

17SEP2018

  • Major update: the pipeline now uses Minimap2 in place of blast + lastz. This is orders of magnitude faster and performs similarly well.
  • Installation is available via Anaconda.
  • The readhist stage is now multi-threaded.
  • Dotplots are now optional. Skipping dotplot generation is significantly faster.
  • More fixes for sporadic crashing during high thread turnover; I believe it is properly fixed now but will continue to monitor and test.

12JUN2018

  • Fix for sporadic crashing with high thread turnover
  • Added experimental features to branch 'dev'
    • -repeats: provide repeat annotations (in BED format) to use during analysis. purge.pl will ignore alignments over these regions when pairing contigs. This was included to address possible over-purging of highly repetitive contigs and appears to work well with RepeatModeler/RepeatMasker annotations (but not WindowMasker repeats).
    • -nucmer: use nucmer, delta-filter, show-coords instead of lastz. Slower but has an enriched dotplot to show repetitive alignments in red and a 1-1 chained alignment in black.
    • -wind_min, -wind_nmax: replace -wind_len and -wind_step. purge.pl will scale the size of BED windows to suit the length of each contig, with a minimum window size of -wind_min and a maximum number of windows per contig of -wind_nmax. It will also convert the coverages to log2(read-depth/average read-depth).
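
As a rough illustration of the -wind_min / -wind_nmax behaviour described above, the sketch below picks a window size per contig and converts a depth to log2(read-depth/average read-depth). It is not taken from purge.pl: the function names, the default values, and the floor used to avoid log2(0) are all assumptions made for the example.

```python
import math

def window_size(contig_len, wind_min=5000, wind_nmax=200):
    """Window size giving at most wind_nmax windows, never below wind_min.

    The defaults are hypothetical, not the tool's actual defaults.
    """
    # Smallest size that keeps the number of windows <= wind_nmax
    needed = math.ceil(contig_len / wind_nmax)
    return max(wind_min, needed)

def windows(contig_len, wind_min=5000, wind_nmax=200):
    """BED-style (0-based, half-open) windows tiling a contig."""
    size = window_size(contig_len, wind_min, wind_nmax)
    return [(start, min(start + size, contig_len))
            for start in range(0, contig_len, size)]

def log2_ratio(depth, mean_depth, floor=0.01):
    """log2(read-depth / average read-depth).

    `floor` is an assumed guard so that zero depth does not give log2(0).
    """
    return math.log2(max(depth, floor) / mean_depth)

short_contig = windows(12_000)     # window size stays at wind_min
long_contig = windows(2_000_000)   # window size grows to cap the count
```

With these assumed defaults, a 12 kb contig is split into 5 kb windows (the last one shorter), while a 2 Mb contig gets 10 kb windows so that the count does not exceed 200. A window at twice the average depth has a log2 ratio of 1, and one at half the average has -1.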

25MAR2018

  • Added a new -windowmasker flag to purge_haplotigs purge and ncbiplace. This follows the guidelines HERE for creating blast databases with repetitive sequences masked. This results in much faster blastn hit searches in the initial stages of purge and ncbiplace with minimal impact on the final result.
  • Updated tests and test dataset
  • A number of other small tweaks and fixes (check commit comments)
