Failure to run large RepeatExplorer analysis on a 1TB RAM server
Issue #75
new
Hi all,
I am trying to run a RepeatExplorer comparative analysis for 48 species, each with a genome size of roughly 6 GB. I subsampled reads to 0.01x coverage (200,000 reads per sample) and ran RepeatExplorer locally on a stand-alone server with 1 TB of RAM. However, when it got to removing duplicates from the all-to-all BLAST results, the run failed:
2022-07-23 22:02:37,898 - lib.seqtools - INFO -
removing duplicates from all to all blast results
Shutting down Rserv...Done
Traceback (most recent call last):
File "/usr/local/bin/seqclust", line 816, in <module>
main()
File "/usr/local/bin/seqclust", line 680, in main
hitsort = graphtools.Graph(filename=paths.hitsort_db,
File "/opt/repex_tarean/lib/graphtools.py", line 154, in __init__
self._read_from_hitsort()
File "/opt/repex_tarean/lib/graphtools.py", line 211, in _read_from_hitsort
self.conn.executemany(
sqlite3.OperationalError: database or disk is full
connection to Rserve refused, server is probably already down
My understanding is that this is possibly a restriction on the maximum file/page size in SQLite. Local storage is not the problem (the run reached about 1.2 TB of storage after the all-to-all BLAST, and the machine has 300 TB of free space), so it may be an operational constraint in RepeatExplorer?
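The file/page-size hypothesis can be checked directly with Python's built-in sqlite3 module: SQLite caps a database at page_size * max_page_count bytes. A minimal sketch, using an in-memory database as a stand-in for the actual hitsort database (whose path will differ on your system):

```python
import sqlite3

# Stand-in connection; replace ":memory:" with the path to the hitsort
# database in your RepeatExplorer output directory to inspect the real file.
conn = sqlite3.connect(":memory:")

# page_size: bytes per database page; max_page_count: page limit for this DB.
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
max_page_count = conn.execute("PRAGMA max_page_count").fetchone()[0]

# Largest database file SQLite will allow with these settings, in bytes.
max_db_bytes = page_size * max_page_count
print(f"page_size={page_size}, max_page_count={max_page_count}, "
      f"max_db_bytes={max_db_bytes}")

conn.close()
```

If max_db_bytes comes out well above the database's actual size, the "database or disk is full" error more likely points at SQLite's temporary files landing on a small partition rather than at the main database hitting a size cap.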
All the best, Mason
Hi all,
I am running into the same issue. Neither RAM nor local storage is the cause.
I also tried setting the TEMP directory as suggested in issue #70, but that didn't help.
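One thing worth checking: a temp-directory change only helps if SQLite itself picks it up. On Unix, SQLite consults SQLITE_TMPDIR before TMPDIR when placing its temporary sort/journal files, so exporting both to a directory on the large filesystem before launching the run is worth a try. A sketch (the directory path is hypothetical; use one on your big volume):

```shell
# Create a temp directory on a filesystem with plenty of free space.
# "./sqlite_tmp" is a placeholder; point this at your large data volume.
mkdir -p ./sqlite_tmp

# SQLite checks SQLITE_TMPDIR first, then TMPDIR, then /tmp.
export SQLITE_TMPDIR="$PWD/sqlite_tmp"
export TMPDIR="$SQLITE_TMPDIR"

# Then launch the clustering as before, e.g.:
# seqclust [your usual options]
```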
Did anyone manage to solve the problem?
Thanks in advance,
All the best,
Camille