Failure to run large RepeatExplorer analysis on a 1TB RAM server
Issue #75
new
Hi all,
I am trying to run a RepeatExplorer comparative analysis for 48 species, each with a genome size of roughly 6 GB. I subsampled reads to 0.01x coverage (200,000 reads per sample) and ran RepeatExplorer locally on a stand-alone server with 1 TB of RAM. However, when it got to removing duplicates from the all-to-all BLAST results, the run failed:
2022-07-23 22:02:37,898 - lib.seqtools - INFO -
removing duplicates from all to all blast results
Shutting down Rserv...Done
Traceback (most recent call last):
File "/usr/local/bin/seqclust", line 816, in <module>
main()
File "/usr/local/bin/seqclust", line 680, in main
hitsort = graphtools.Graph(filename=paths.hitsort_db,
File "/opt/repex_tarean/lib/graphtools.py", line 154, in __init__
self._read_from_hitsort()
File "/opt/repex_tarean/lib/graphtools.py", line 211, in _read_from_hitsort
self.conn.executemany(
sqlite3.OperationalError: database or disk is full
connection to Rserve refused, server is probably already down
My understanding is that this is possibly a restriction on the maximum file/page size in SQLite. Local storage is not the problem (the run reached about 1.2 TB of storage after the all-to-all BLAST, and the machine has 300 TB of free space), so it may be an operational constraint in RepeatExplorer?
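The file/page-size hypothesis can be checked directly with Python's built-in sqlite3 module: SQLite caps a database at page_size * max_page_count bytes. A minimal sketch, using an in-memory database as a stand-in for the actual hitsort database (whose path will differ on your system):

```python
import sqlite3

# Stand-in connection; replace ":memory:" with the path to the hitsort
# database in your RepeatExplorer output directory to inspect the real file.
conn = sqlite3.connect(":memory:")

# page_size: bytes per database page; max_page_count: page limit for this DB.
page_size = conn.execute("PRAGMA page_size").fetchone()[0]
max_page_count = conn.execute("PRAGMA max_page_count").fetchone()[0]

# Largest database file SQLite will allow with these settings, in bytes.
max_db_bytes = page_size * max_page_count
print(f"page_size={page_size}, max_page_count={max_page_count}, "
      f"max_db_bytes={max_db_bytes}")

conn.close()
```

If max_db_bytes comes out well above the database's actual size, the "database or disk is full" error more likely points at SQLite's temporary files landing on a small partition rather than at the main database hitting a size cap.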
All the best, Mason
Hi all,
I am running into the same issue. Neither RAM nor local storage is the cause.
I also tried setting the TEMP directory as suggested in issue #70, but that didn't help.
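One thing worth checking: a temp-directory change only helps if SQLite itself picks it up. On Unix, SQLite consults SQLITE_TMPDIR before TMPDIR when placing its temporary sort/journal files, so exporting both to a directory on the large filesystem before launching the run is worth a try. A sketch (the directory path is hypothetical; use one on your big volume):

```shell
# Create a temp directory on a filesystem with plenty of free space.
# "./sqlite_tmp" is a placeholder; point this at your large data volume.
mkdir -p ./sqlite_tmp

# SQLite checks SQLITE_TMPDIR first, then TMPDIR, then /tmp.
export SQLITE_TMPDIR="$PWD/sqlite_tmp"
export TMPDIR="$SQLITE_TMPDIR"

# Then launch the clustering as before, e.g.:
# seqclust [your usual options]
```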
Did anyone manage to solve the problem?
Thanks in advance,
All the best,
Camille