I may be missing something, but as written, it seems to me that the
Rma_post() <--> Rma_start() synchronization always has to be a full
bidirectional (RPC-like) exchange: start asks the target whether its
post flag is currently set locally, and blocks if it is not yet set.
The target, in turn, has to buffer all such requests until the next
post and then reply to all of them.
Q1: is this correct?
Q2: if yes, is this what we want? The alternative would be
Rma_post(comm, rank), with the user making multiple such calls, which
keeps track of the synchronization arrows and keeps post/start
single-directional.
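
To make the two readings concrete, here is a rough toy model in plain
Python. It makes no real MPI calls, and every class and method name in
it is made up for illustration; it only counts messages under my
reading of each scheme, so take it as a sketch and not as what any
implementation actually does.

```python
# Toy model only: no real MPI calls; every name here is hypothetical.


class GroupPostTarget:
    """Target side of the bidirectional (RPC-like) reading of post/start."""

    def __init__(self):
        self.posted = False   # post flag, known only at the target
        self.pending = []     # origins whose start request is buffered
        self.messages = 0     # messages that cross the network

    def start_request(self, origin):
        """An origin's start asks the target whether the post flag is set."""
        self.messages += 1            # request: origin -> target
        if self.posted:
            self.messages += 1        # reply: target -> origin
            return True               # start may proceed
        self.pending.append(origin)   # buffer the request until the next post
        return False                  # start keeps blocking

    def post(self):
        """Set the flag and reply to every buffered start request."""
        self.posted = True
        released, self.pending = self.pending, []
        self.messages += len(released)  # one reply per buffered request
        return released


class PerRankPostOrigin:
    """Origin side of the one-directional Rma_post(comm, rank) reading."""

    def __init__(self):
        self.arrived = set()  # targets whose post notification has arrived
        self.messages = 0

    def post_from(self, target):
        """The target posts to this rank: one message, no reply needed."""
        self.messages += 1
        self.arrived.add(target)

    def start(self, target):
        """Purely local check; False means start would still be blocking."""
        return target in self.arrived
```

In this model every start costs a request plus a reply under the first
scheme, and the target has to hold state for each early request; under
the second scheme each (target, origin) pair costs a single message and
start never sends anything.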
--Raja