Unexpected scaling in put_flood benchmark

Issue #322 new
Paul Hargrove created an issue

Short version:

On at least one conduit, upcxx-run -n2 put_flood runs for 55 minutes!

Long version:

Currently the "meat" of put_flood.cpp is a loop that injects communication operations non-stop for 0.5s (by default). For the gasnet-nbi case, the (experimental) ucx-conduit can enqueue over 600,000 operations in that interval, independent of transfer size, without attempting to internally complete any of them. By contrast, at least the ibv and aries conduits bound their resource usage, so injection begins draining in-flight operations long before 600,000 could be outstanding at the same time. For that reason, this test has not previously seen any problem with the 0.5s injection burst.

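Because the number of operations injected during the burst does not depend on the transfer size, the volume of data left queued doubles each time the size doubles, and so does the time spent in the Finish phase draining it.
On Dirac this adds up to 55 minutes to complete the test with default settings.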

This unbounded-buffering behavior is one I plan to report to the conduit author as being undesirable. However, I cannot claim that it is an illegal behavior. So, it would be preferable for this test to make allowances for conduits with this sort of behavior.

This is NOT currently a high-priority issue, and I have not assigned it to any milestone.
