- assigned issue to
- edited description
- changed milestone to 2018.03.31 release
This issue was discussed in the 1/10/18 meeting. We decided to prototype to assess the potential benefit by March, and possibly target it for September.
Based on some performance measurements I collected for issue #108 (source code here):

| Operation | Latency |
| --- | --- |
| `memcpy(4KB)` | 0.121670 us |
| `self.lpc(noop0)` | 0.527286 us |
| `upcxx::rput<double>(self)` | 0.557903 us |
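For context, here is a minimal sketch of how a per-call `memcpy(4KB)` latency like the one above could be collected. This is my own assumption about the methodology, not the actual benchmark code from issue #108, and the `measure_memcpy_us` name is hypothetical (the `asm` barrier is GCC/Clang-specific):

```cpp
#include <chrono>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical micro-benchmark: average wall-clock time of one memcpy of
// `bytes` bytes, in microseconds, over `iters` iterations.
double measure_memcpy_us(std::size_t bytes = 4096, std::size_t iters = 100000) {
  std::vector<char> src(bytes, 1), dst(bytes);
  auto t0 = std::chrono::steady_clock::now();
  for (std::size_t i = 0; i < iters; ++i) {
    std::memcpy(dst.data(), src.data(), bytes);
    asm volatile("" ::: "memory");  // keep the copy from being optimized away
  }
  auto t1 = std::chrono::steady_clock::now();
  return std::chrono::duration<double, std::micro>(t1 - t0).count() / iters;
}
```

The absolute number depends heavily on cache state and compiler flags, so it is only useful for relative comparisons like the table above.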
A memcpy of a full-page, 4-kilobyte payload is about 4x faster than an 8-byte `rput().wait()` or a zero-payload loopback `lpc().wait()`, because it avoids the overheads of the UPC++ progress engine. This seems to support the value of the optimization proposed here when local_team loopback is the expected common case.
One interface idea presented was to spell the "opt-in" syntax using the same rput/rget calls plus an extension to the generalized completion framework, e.g. `operation_cx::as_maybeready_future()` and `source_cx::as_maybeready_future()` (with a better name, possibly `as_eager_future`?). By allowing this only for future signalling, it eliminates the possible need for "progress level: user" on the injection call.
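To illustrate the semantics being proposed, here is a standalone sketch of a "maybe-ready" future: one that may already be completed when the injection call returns (the synchronous loopback fast path), so `wait()` never has to enter the progress engine. This is plain C++ with hypothetical names, not actual UPC++ or the proposed API:

```cpp
#include <functional>
#include <optional>
#include <utility>

// Hypothetical sketch: a future that is either completed eagerly at
// construction (fast path) or deferred to a callback (stand-in for the
// progress-engine path).
template <typename T>
class maybeready_future {
  std::optional<T> value_;       // set immediately on the eager fast path
  std::function<T()> deferred_;  // otherwise produced lazily on wait()
public:
  explicit maybeready_future(T eager) : value_(std::move(eager)) {}
  explicit maybeready_future(std::function<T()> d) : deferred_(std::move(d)) {}
  bool is_ready() const { return value_.has_value(); }
  T wait() {
    if (!value_) value_ = deferred_();  // slow path: would drive progress
    return *value_;
  }
};

// Fast path: a loopback "put" whose future is ready before it returns.
inline maybeready_future<int> loopback_put(int v) {
  return maybeready_future<int>(v);
}
```

The point of the opt-in spelling is that callers who test `is_ready()` immediately after injection can skip the progress engine entirely on the loopback path.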
There might still be motivation to allow the analogous behavior for synchronous LPC notifications, e.g. `rput(val, gptr, operation_cx::as_maybesynchronous_lpc(default_persona(), func))`, which would allow synchronous execution of `func` before the rput call returns when the data movement is performed synchronously (and because the LPC is self-targeted).
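The maybe-synchronous LPC semantics can be sketched in isolation as follows. Again this is a hypothetical stand-in (`fake_put` is not a real API): the callback runs inline only when the transfer completed synchronously and targets the calling persona, and the return value tells the caller which path was taken:

```cpp
#include <functional>

// Hypothetical sketch: run `on_complete` inline when the data movement was
// synchronous (e.g. local_team loopback to the calling persona); otherwise
// it would be queued for the progress engine (queueing omitted here).
inline bool fake_put(bool is_local_loopback, std::function<void()> on_complete) {
  if (is_local_loopback) {
    on_complete();  // executes before the injection call returns
    return true;
  }
  return false;  // callback deferred to a later progress call
}
```

This is exactly the behavior the "maybesynchronous" spelling would opt into: callers must be prepared for `func` to fire before the rput call itself returns.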
The current working group draft now appears in pull request #41.