As an implementor, I share Mark's concern that we may be overlooking some
significant consequences here. For brevity, I shall use the following notation:
    PUT       means  MPI_PUT()
    POST      means  MPI_WIN_POST()
    WAIT      means  MPI_WIN_WAIT()
    START     means  MPI_WIN_START()
    COMPLETE  means  MPI_WIN_COMPLETE()
The MPI Forum has mercifully avoided the temptation to mandate support for
MPI_ANY_SOURCE in POST. Therefore, since implementations will have perfect
knowledge of all remote participants when POST is called, they can in principle
just send messages to all of them immediately and return. And since START *is*
allowed to block, an implementation can simply sit there until it receives all
of the required "I've posted" messages from the remote group.
So, it seems reasonable to assert that in the above example, we can always
remain deadlock-free to the point where both 0 and 1 return from START.
Now we get to COMPLETE, and here's where I start to worry. It is clear that
WAIT is allowed to block (it basically must block, actually) until all of the
matching calls to COMPLETE have happened. But what about the reverse? Is
COMPLETE allowed to block until the matching WAIT has happened? If so, then the
above is clearly a deadlock situation. And this concerns me a great deal,
because I had always thought that COMPLETE was collective with WAIT. That was
certainly the design at one point in the past, and I know that I would have
fought hard against any suggestion to the contrary. But looking at the current
text, I see nothing which would allow COMPLETE to block on the matching WAIT.
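To make the worry concrete, here is a toy model in Python (not MPI code, and not
anybody's real implementation). It assumes that "the above example" has
processes 0 and 1 each call POST, START, PUT, COMPLETE, WAIT in that order, each
targeting the other. The blocking rules are my reading of this discussion rather
than the text of the standard, and the names OPS, can_advance and run are made up
for illustration.

```python
# Assumed call sequence for both processes; each targets the other.
OPS = ["POST", "START", "PUT", "COMPLETE", "WAIT"]


def can_advance(op, peer_pc, complete_needs_wait):
    """Return True if a process whose next call is `op` may return from it.

    `peer_pc` is the index of the peer's next call in OPS.
    """
    def peer_returned_from(name):
        return peer_pc > OPS.index(name)

    def peer_reached(name):
        # Approximation: the peer's next call is `name`, or it is already past it.
        return peer_pc >= OPS.index(name)

    if op == "START":
        return peer_returned_from("POST")      # wait for the "I've posted" message
    if op == "COMPLETE":
        # The rule in question: may COMPLETE wait for the matching WAIT?
        return peer_reached("WAIT") if complete_needs_wait else True
    if op == "WAIT":
        return peer_returned_from("COMPLETE")  # WAIT blocks until COMPLETE happened
    return True                                # POST and PUT never block in this model


def run(complete_needs_wait):
    """Step both processes until they finish or neither can make progress."""
    pc = [0, 0]
    while True:
        progressed = False
        for rank in (0, 1):
            if pc[rank] < len(OPS) and can_advance(
                OPS[pc[rank]], pc[1 - rank], complete_needs_wait
            ):
                pc[rank] += 1
                progressed = True
        if pc == [len(OPS), len(OPS)]:
            return "finished", None
        if not progressed:
            return "deadlock", [OPS[i] for i in pc]
```

In this model run(True) reports a deadlock with both processes stuck in
COMPLETE, after both have returned from START, while run(False) runs to
completion.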
What are the consequences of mandating that COMPLETE never block? Consider:
process 0 wants to PUT a mondo-gram to process 1 in the above example. Since
the data is large, we don't want to buffer it; we want to use a rendezvous
protocol for high bandwidth instead. So 0 sends a short 'Please tell me when
you are ready' message to 1. Now... when 0 hits the COMPLETE, it must not
return until it is safe for the application to alter the data in the put
buffer, meaning that all of the original data must either be delivered or
buffered somewhere. But since COMPLETE is not allowed to block, what are we to
do? We have no choice but to buffer the stupid message anyway and return!
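The cost can be put as a rough Python sketch. Everything here is hypothetical:
the EAGER_LIMIT threshold, the function name, and the assumption that small puts
are copied eagerly in any case. It only counts how many bytes an implementation
would have to copy into its own buffers by the time COMPLETE returns.

```python
EAGER_LIMIT = 16 * 1024  # hypothetical eager/rendezvous threshold, in bytes


def bytes_buffered_at_complete(put_sizes, target_ready, complete_may_block):
    """Count the bytes copied internally before COMPLETE can return.

    put_sizes          -- sizes in bytes of the PUTs issued since START
    target_ready       -- True if the target already answered the rendezvous request
    complete_may_block -- True if COMPLETE is allowed to wait for that answer
    """
    copied = 0
    for size in put_sizes:
        if size <= EAGER_LIMIT:
            copied += size  # assumed: small puts are buffered eagerly either way
        elif target_ready or complete_may_block:
            pass            # rendezvous: data moves straight from the user buffer
        else:
            copied += size  # forced copy so that the caller may reuse its buffer
    return copied
```

With a 64 MB put and a target that has not yet answered,
bytes_buffered_at_complete([64 * 1024 * 1024], False, True) is 0, but with
complete_may_block set to False the whole 64 MB gets copied.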
I find this to be a very Bad Thing, and it goes very much counter to the
original philosophy behind separating the gets and puts from the coherence
calls, which was intended to *not* impose heavy-duty progress agents on
implementations. Now it looks like we need one anyway. IMHO, this must be
fixed. So I assert that if we are serious about keeping this section, we must
allow COMPLETE to block and update the text accordingly.
> No, the progress rule in Section 5.6.2 says on page 112
> very clear that the example should not deadlock, otherwise
> the implementation is wrong.
Well, it's one thing for the standard to mandate something, but we also need to
convince ourselves that such things are possible! This is the danger in
standardizing something before we have prototypes, which is why some of us
tried to have this particularly dangerous section of the chapter moved into the
JOD at the last meeting.
(My spider-sense tells me that Lyndon is about to whip out his existence proof.
To which I shall pre-emptively reply: how well would your design extend to a
cluster of machines, with no shared memory? Can you avoid deadlock even then
and still be efficient? If so, please tell me how and I'll be happy.)
-- Eric Salo Silicon Graphics Inc. salo@sgi.com