Reduce multiprocessing overhead
- Remove the alive flag as per https://bitbucket.org/javh/presto/issue/6.
- Chunking may help (eg 1,000 sequences at a time).
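As a sketch of the chunking idea (a hypothetical helper, not presto's actual reader), sequences could be batched before being placed on the compute queue, so the per-item queue overhead is paid once per 1,000 records instead of once per record:

```python
from itertools import islice

def read_chunks(records, chunk_size=1000):
    """Yield lists of up to chunk_size records from any iterator.

    Feeding whole chunks onto the multiprocessing queue amortizes the
    per-item pickling/IPC cost that dominates with one-record puts.
    """
    it = iter(records)
    while True:
        chunk = list(islice(it, chunk_size))
        if not chunk:
            return
        yield chunk
```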
Comments (3)
-
Account Deleted (reporter) -
Yeah, the way it's set up is that the total number of processes equals `nproc + 2`, where `nproc` sets the number of worker (computation) processes. The other two processes manage file I/O: one owns reading from the input file and putting data into the compute queue; the other owns writing to the output files by collecting results from the queue that stores the output of the computations. You should be able to basically ignore these 2 feeder/collector processes in your allocation, as they aren't doing much - they just exist to keep the worker processes from mucking up file I/O.
Setting `nproc` higher than 15-20 (depending upon the task) doesn't really reduce runtime due to the overhead. I'd have to dig up the old scalability curves to be sure, but I think distributing load across jobs with `nproc` set to ~8-12 would be quickest.
Though, in this case, by "chunking" I meant changing how the data is fed into the compute queue within a single execution - i.e., loading chunks of sequences from disk into memory instead of feeding them one at a time, to reduce the interprocess communication.
Though, since then, I've realized that the biggest performance issue is the use of Biopython Seq objects to store everything…
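The `nproc + 2` layout described above can be sketched roughly as follows. This is a minimal illustration of the feeder/worker/collector pattern, not presto's actual code; the function names and the squaring "work" are stand-ins:

```python
import multiprocessing as mp

def feeder(in_queue, data, nproc):
    # One process owns reading input and filling the compute queue.
    for item in data:
        in_queue.put(item)
    for _ in range(nproc):          # one sentinel per worker
        in_queue.put(None)

def worker(in_queue, out_queue):
    # nproc of these perform the actual computation.
    while (item := in_queue.get()) is not None:
        out_queue.put(item * item)  # stand-in for the real work
    out_queue.put(None)

def collector(out_queue, result_queue, nproc):
    # One process owns writing output, draining the result queue.
    results, done = [], 0
    while done < nproc:
        item = out_queue.get()
        if item is None:
            done += 1
        else:
            results.append(item)
    result_queue.put(sorted(results))  # workers finish in arbitrary order

def run(data, nproc=4):
    in_q, out_q, res_q = mp.Queue(), mp.Queue(), mp.Queue()
    procs = [mp.Process(target=feeder, args=(in_q, data, nproc)),
             mp.Process(target=collector, args=(out_q, res_q, nproc))]
    procs += [mp.Process(target=worker, args=(in_q, out_q))
              for _ in range(nproc)]  # total processes = nproc + 2
    for p in procs:
        p.start()
    result = res_q.get()
    for p in procs:
        p.join()
    return result
```

The chunking change discussed here would replace the one-item `in_queue.put(item)` in the feeder with puts of whole lists of sequences.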
-
Account Deleted -
Ok, thanks for your reply! Indeed, on our systems we also see no real benefit in using more than 8 cores. By the looks of it, it doesn't seem too complicated to change how data is fed in (I might take a stab at that on my end to see whether that makes it scale beyond 8 cores). Thanks!
-
I'm currently doing the chunking in a wrapper (i.e. using `parallel` or `joblib`), but it seems to need at least 3 processes, of which one is CPU bound. Do you have suggestions as to what would be a good ratio of `nproc` vs. the number of parallel runs (e.g. of MaskPrimers and AssemblePairs)?
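For the ratio question, one back-of-the-envelope rule (my own sketch, based only on the `nproc + 2` count discussed above) is to pick the number of concurrent runs so that runs × (nproc + 2) stays at or below the core count, keeping in mind that the two extra I/O processes per run are mostly idle:

```python
def max_concurrent_runs(cores, nproc, io_procs=2):
    """Largest runs count with runs * (nproc + io_procs) <= cores.

    io_procs is the feeder/collector pair per run; since those two
    are mostly idle, counting workers only (io_procs=0) is also a
    defensible choice.
    """
    per_run = nproc + io_procs
    return max(1, cores // per_run)
```

For example, on a 32-core node with `nproc` set to 8, this budgets 3 concurrent runs (3 × 10 = 30 processes).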