Unexpected scaling in put_flood benchmark
Short version:
On at least one conduit, upcxx-run -n2 put_flood runs for 55 minutes!
Long version:
Currently the "meat" of put_flood.cpp is a loop which injects communications non-stop for 0.5s (by default). For the gasnet-nbi case, the (experimental) ucx-conduit is capable of enqueuing over 600,000 operations in that time interval, independent of size, without attempting to internally complete any of them. At least the ibv and aries conduits bound their resource usage such that injection begins draining in-flight operations long before 600,000 could be incomplete at the same time. For that reason, this test has not previously shown any problem with the 0.5s injection burst.
The net result is that each time the size doubles, so does the time spent in Finish.
On Dirac this equates to 55 minutes to complete the test with defaults.
This unbounded-buffering behavior is one I plan to report to the conduit author as being undesirable. However, I cannot claim that it is an illegal behavior. So, it would be preferable for this test to make allowances for conduits with this sort of behavior.
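One possible allowance, sketched here only as an illustration and not as the actual put_flood.cpp code, is to stop injecting when either the time budget expires or a cap on the number of injected operations is reached. Everything named below (FakeUnboundedConduit, flood, max_ops) is hypothetical. The fake conduit just counts enqueued operations to mimic a conduit that buffers without completing anything, and a real test would issue UPC++ puts instead.

```cpp
#include <cassert>
#include <chrono>
#include <cstdint>

// Hypothetical stand-in for a conduit that enqueues every operation
// without completing any of them until it is explicitly drained.
struct FakeUnboundedConduit {
    std::uint64_t in_flight = 0;
    void inject() { ++in_flight; }   // enqueue only, never blocks
    void drain() { in_flight = 0; }  // stands in for the work done in Finish
};

// Inject until the time budget expires OR max_ops operations have been
// injected, whichever comes first. max_ops == 0 means "no cap", which is
// the time-only behavior described in this report.
template <typename Conduit>
std::uint64_t flood(Conduit& c, std::chrono::duration<double> budget,
                    std::uint64_t max_ops) {
    using clock = std::chrono::steady_clock;
    assert(budget.count() >= 0.0 && "time budget must be non-negative");
    const auto deadline =
        clock::now() + std::chrono::duration_cast<clock::duration>(budget);
    std::uint64_t injected = 0;
    while (clock::now() < deadline) {
        if (max_ops != 0 && injected >= max_ops) break;  // cap reached
        c.inject();
        ++injected;
    }
    return injected;
}
```

With a cap in place, the amount of work left for Finish is bounded by max_ops times the payload size, regardless of how much the conduit is willing to buffer. The trade-off is that on a fast conduit the injection phase may end before the 0.5s interval elapses, so the cap would need to be chosen (or made configurable) with that in mind.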
This is NOT currently a high-priority issue, and I have not assigned it to any milestone.