post-start-complete-wait

Marc Snir (snir@watson.ibm.com)
Thu, 20 Mar 1997 17:38:02 -0400

We seem to be rehashing the same stuff endlessly. Not because some new
fact comes to light, but because we forget old discussions.

The text is fairly clear about progress of put/get, and the discussion we
had about this months ago was also fairly clear: in the general case,
put/get requires the same kind of progress engine as send/receive. Namely,
when a process is blocked in an MPI call, it needs to continue progressing
on any (other) enabled communication. So

PROCESS 0             PROCESS 1

post                  post

start                 start

put                   put

complete              complete

wait                  wait

compare this to

PROCESS 0             PROCESS 1

send(...,1,...)       irecv(...,1,...,req)

                      send(...,2,...)

recv(...,2,...)       wait(req)

Each put sends a gigabyte of data, and we do not have shared memory, so we
use a rendez-vous protocol.
A request-to-put goes out when the put is executed. We get to the complete
call. The MPI process blocks. While blocked, it keeps polling for
communications that are enabled/matched, and keeps moving data on the wire.
The implementation has to suck bytes from the wire from the remote put and
send data down the wire for the local put while the process is blocked in
the complete call, thus making progress on both put calls. Eventually, all
data has been sucked, and the completes can execute. A complete-message
goes to the other end, to be matched by the wait. No reason to buffer --
just the usual policy of moving bytes when there is nothing else useful to
do.
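
To make the put/put case concrete, here is a toy simulation in Python. It is
not MPI code and not how any particular implementation works; it is only a
sketch of the polling policy described above. All names (Link, Proc,
run_puts, CHUNK, WIRE_CAPACITY) are invented for illustration, and the model
assumes the wire is the only buffer, so a put is finished only once the
other side has drained it.

```python
CHUNK = 4          # bytes moved per poll (toy value)
WIRE_CAPACITY = 8  # bytes a one-way link can hold in flight (toy value)


class Link:
    """One-way wire with bounded in-flight capacity."""
    def __init__(self):
        self.in_flight = 0


class Proc:
    """A process that has issued one put and is now blocked in complete."""
    def __init__(self, put_bytes):
        self.to_send = put_bytes  # bytes of the local put not yet on the wire
        self.received = 0         # bytes of the remote put drained so far
        self.out = None           # link toward the peer
        self.inc = None           # link from the peer

    def progress(self, drain_incoming):
        """One poll of the progress engine; returns True if any byte moved."""
        moved = False
        # Push data for the local put if the wire has room.
        n = min(CHUNK, WIRE_CAPACITY - self.out.in_flight, self.to_send)
        if n > 0:
            self.out.in_flight += n
            self.to_send -= n
            moved = True
        # Drain data of the remote put, if this implementation does so
        # while blocked.
        if drain_incoming:
            n = min(CHUNK, self.inc.in_flight)
            if n > 0:
                self.inc.in_flight -= n
                self.received += n
                moved = True
        return moved


def run_puts(put_bytes, drain_while_blocked):
    """Both processes sit in complete and poll in turn."""
    p0, p1 = Proc(put_bytes), Proc(put_bytes)
    a, b = Link(), Link()
    p0.out, p1.inc = a, a
    p1.out, p0.inc = b, b
    while True:
        moved = p0.progress(drain_while_blocked)
        moved = p1.progress(drain_while_blocked) or moved
        if all(p.to_send == 0 and p.received == put_bytes for p in (p0, p1)):
            return "completed"
        if not moved:
            return "deadlock"
```

In this model run_puts(64, True) returns "completed", while
run_puts(64, False) returns "deadlock": once both wires are full, nobody is
draining them and neither complete can finish.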

In the second example, a rendez-vous protocol is used for the send of
process 0. If this process was late, then process 1 may already be blocked
in the send call at this point. While blocked in its send call, process 1
sucks data from the wire from the send of process 0, for its irecv.
Eventually all data is sucked, so that the send of process 0 completes, and
the recv is reached.
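
The same scenario can be sketched as a toy simulation, again in Python and
again with invented names (run_send_recv, CHUNK, the (rank, tag) keys). It
assumes a pure rendez-vous protocol where the receiver pulls the data and
a send finishes only when everything has been pulled. The flag
progress_in_send is hypothetical: it switches off polling inside a blocked
send, to show what goes wrong without it.

```python
CHUNK = 4  # bytes moved per poll (toy value)


def run_send_recv(size, progress_in_send):
    remaining = {}  # (dest, tag) -> bytes of a posted send not yet pulled
    posted = set()  # (rank, tag) receives that have been posted
    log = []

    def progress(me):
        # Pull one chunk for every posted receive of `me` with a matching send.
        for (rank, tag) in posted:
            if rank == me and remaining.get((rank, tag), 0) > 0:
                remaining[(rank, tag)] -= min(CHUNK, remaining[(rank, tag)])

    def send(me, dest, tag):
        remaining[(dest, tag)] = size       # request-to-send goes out
        while remaining[(dest, tag)] > 0:   # blocked until all data is pulled
            if progress_in_send:
                progress(me)
            yield
        log.append(f"P{me} send tag {tag} done")

    def wait(me, tag):
        # Blocked until the matching send exists and is fully drained.
        while (me, tag) not in remaining or remaining[(me, tag)] > 0:
            progress(me)
            yield
        log.append(f"P{me} wait tag {tag} done")

    def proc0():
        yield from send(0, 1, 1)
        posted.add((0, 2))                  # recv = post + wait
        yield from wait(0, 2)

    def proc1():
        posted.add((1, 1))                  # irecv, tag 1
        yield from send(1, 0, 2)
        yield from wait(1, 1)

    procs = [proc0(), proc1()]
    for _ in range(10 * size):              # generous bound on polling rounds
        for p in list(procs):
            try:
                next(p)
            except StopIteration:
                procs.remove(p)
        if not procs:
            return True, log
    return False, log                       # nobody finished: stuck
```

With progress_in_send set to True both processes finish, and the first log
entry is the send of process 0 completing while process 1 is still blocked
in its own send. With it set to False neither process ever gets past its
send.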

I don't know if this is "heavy duty", but it certainly does not require
agents, interrupts, or timer kicks any more than send/recv already
does.

Now, I don't think that hacking a slightly different proposal will change
this basic fact of life in MPI: namely, that when a process is blocked in
an MPI call, it has to work toward the completion of any enabled
communication. Even if complete is "collective with wait" (not sure what
this means), or even if complete and wait are merged, one can still come up
with an example where, while blocked in a complete, the process has to suck
data on behalf of a send-receive pair. E.g.,

PROCESS 0             PROCESS 1

start/post(win)       start/post(win)

put                   put

send(1GB)             Irecv(the 1GB, req)
wait/complete(win)    wait/complete(win)
                      wait(req)

While process 1 is blocked in the wait/complete pair (the complete call in
Lyndon's proposal) it has to move the 1GB of the send, so as to allow
process 0 to reach its wait/complete call.
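
A toy model of this mixed case, in the same spirit: Python, not MPI, with
invented names (run_mixed, CHUNK, the "put"/"send" kinds). It assumes that
both puts and the irecv are already issued, that the receiver pulls all
data, and that the merged wait/complete returns only when both sides have
reached it and both puts are drained. The flag cross_progress is
hypothetical: when False, a process blocked in wait/complete polls only for
RMA traffic.

```python
CHUNK = 4  # bytes moved per poll (toy value)


def run_mixed(size, cross_progress):
    # remaining[(kind, dest)] = bytes still to be pulled by rank `dest`
    remaining = {("put", 0): size, ("put", 1): size}  # both puts issued
    at_sync = set()  # ranks that have entered wait/complete

    def progress(me, kinds):
        # Pull one chunk of every enabled transfer of the given kinds.
        for (kind, dest), left in list(remaining.items()):
            if dest == me and kind in kinds and left > 0:
                remaining[(kind, dest)] = left - min(CHUNK, left)

    def sync(me):
        # The merged wait/complete call.
        at_sync.add(me)
        kinds = {"put", "send"} if cross_progress else {"put"}
        while len(at_sync) < 2 or any(
            left > 0 for (k, _), left in remaining.items() if k == "put"
        ):
            progress(me, kinds)
            yield

    def proc0():
        remaining[("send", 1)] = size        # send(1GB), rendez-vous
        while remaining[("send", 1)] > 0:    # blocked in send
            progress(0, {"put", "send"})
            yield
        yield from sync(0)

    def proc1():
        # Irecv already posted; it does not block.
        yield from sync(1)
        while remaining.get(("send", 1), size) > 0:  # wait(req)
            progress(1, {"put", "send"})
            yield

    procs = [proc0(), proc1()]
    for _ in range(10 * size):               # generous bound on polling rounds
        for p in list(procs):
            try:
                next(p)
            except StopIteration:
                procs.remove(p)
        if not procs:
            return True
    return False                             # stuck
```

Here run_mixed(16, True) returns True. With cross_progress False, process 1
drains only the put while sitting in wait/complete, process 0 never gets out
of its send, and the call returns False.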

To sum up, the current design of post/start/complete/wait does not introduce
any new complexities into an MPI implementation that are not already there.
Even if we delete this construct altogether, we still have the problem that
while blocked on an RMA construct we should be willing to make progress on
send-receive calls, and vice-versa. But this is "life as usual" in the MPI
world.