Reduce multiprocessing overhead
- Remove the alive flag as per https://bitbucket.org/javh/presto/issue/6.
- Chunking may help (eg 1,000 sequences at a time).
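As a sketch of the chunking idea (a hypothetical helper, not presto's actual reader), sequences could be batched before being placed on the compute queue, so the per-item queue overhead is paid once per 1,000 records instead of once per record:

```python
from itertools import islice

def read_chunks(records, chunk_size=1000):
    """Yield lists of up to chunk_size records from any iterator.

    Feeding whole chunks onto the multiprocessing queue amortizes the
    per-item pickling/IPC cost that dominates with one-record puts.
    """
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk
```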
Comments (3)
-
Account Deleted (reporter) -
Yeah, the way it's set up is that the total number of processes equals `nproc + 2`, where `nproc` sets the number of worker (computation) processes. The other two processes manage file I/O: one owns reading from the input file and putting data into the compute queue; the other owns writing to the output files by collecting results from the queue that stores the output of the computations. You should be able to basically ignore these 2 feeder/collector processes in your allocation, as they aren't doing much - they just exist to keep the worker processes from mucking up file I/O.
Setting `nproc` higher than 15-20 (depending upon the task) doesn't really reduce runtime due to the overhead. I'd have to dig up the old scalability curves to be sure, but I think distributing load across jobs with `nproc` set to ~8-12 would be quickest.
Though, in this case, by "chunking" I meant changing how the data is fed into the compute queue within a single execution - i.e., loading chunks of sequences from disk into memory instead of feeding them one at a time, to reduce the interprocess communication.
Though, since then, I've realized that the biggest performance issue is the use of Biopython Seq objects to store everything…
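The `nproc + 2` layout described above can be sketched roughly as follows. This is a minimal illustration of the feeder/worker/collector pattern, not presto's actual code; the function names and the squaring "work" are stand-ins:

```python
import multiprocessing as mp

def feeder(in_queue, data, nproc):
    # One process owns reading input and filling the compute queue.
    for item in data:
        in_queue.put(item)
    for _ in range(nproc):          # one sentinel per worker
        in_queue.put(None)

def worker(in_queue, out_queue):
    # nproc of these perform the actual computation.
    while (item := in_queue.get()) is not None:
        out_queue.put(item * item)  # stand-in for the real work
    out_queue.put(None)

def collector(out_queue, result_queue, nproc):
    # One process owns writing output, draining the result queue.
    results, done = [], 0
    while done < nproc:
        item = out_queue.get()
        if item is None:
            done += 1
        else:
            results.append(item)
    result_queue.put(sorted(results))  # workers finish in arbitrary order

def run(data, nproc=4):
    in_q, out_q, res_q = mp.Queue(), mp.Queue(), mp.Queue()
    procs = [mp.Process(target=feeder, args=(in_q, data, nproc)),
             mp.Process(target=collector, args=(out_q, res_q, nproc))]
    procs += [mp.Process(target=worker, args=(in_q, out_q))
              for _ in range(nproc)]  # total processes = nproc + 2
    for p in procs:
        p.start()
    result = res_q.get()
    for p in procs:
        p.join()
    return result
```

The chunking change discussed here would replace the one-item `in_queue.put(item)` in the feeder with puts of whole lists of sequences.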
-
Account Deleted -
Ok, thanks for your reply! Indeed, on our systems we also see no real benefit in using more than 8 cores. By the looks of it, it doesn't seem too complicated to change how data is fed in (I might take a stab at that on my end to see whether that makes it scale beyond 8 cores). Thanks!
-
I'm currently doing the chunking in a wrapper (i.e. using `parallel` or `joblib`), but it seems to need at least 3 processes, of which one is CPU bound. Do you have suggestions as to what would be a good ratio of `nproc` vs. the number of parallel runs (e.g. of MaskPrimers and AssemblePairs)?
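For the ratio question, one back-of-the-envelope rule (my own sketch, based only on the `nproc + 2` count discussed above) is to pick the number of concurrent runs so that runs × (nproc + 2) stays at or below the core count, keeping in mind that the two extra I/O processes per run are mostly idle:

```python
def max_concurrent_runs(cores, nproc, io_procs=2):
    """Largest runs count with runs * (nproc + io_procs) <= cores.

    io_procs is the feeder/collector pair per run; since those two
    are mostly idle, counting workers only (io_procs=0) is also a
    defensible choice.
    """
    per_run = nproc + io_procs
    return max(1, cores // per_run)
```

For example, on a 32-core node with `nproc` set to 8, this budgets 3 concurrent runs (3 × 10 = 30 processes).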