Re: possible race condition (functional bug)

Rolf Rabenseifner (Rabenseifner@RUS.Uni-Stuttgart.DE)
Mon, 2 Sep 1996 12:09:26 +0200 (DST)

Marc,

-- first answer -- the functional bug in your synchronization model --
==============

I'm afraid you do not see the problem because you are comparing my
example with your intentions, not with the text you actually wrote.
So, once again:

     Origin 1                   Target              Origin 2

 1                                                  COMPLETE
 2                              WAIT
 3                              send to 1
 4   recv
 5                              POST
 6   start(MPI_WEAK or STRONG)
 7   put
 8   complete
 9                              wait
10                              send to 2
11                                                  receive
12                                                  START(MPI_WEAK or STRONG)
13                                                  put
14                              load
15                              post
                                ...                 ...


>> Your text in Draft 08/26/96:
>>
>> MPI_START:
>> The succeeding RMA accesses to this window will be delayed,
>> if necessary, until the target window is available.
>> The target window is available if a call to MPI_RMA_POST
>> has occurred at the target, subsequent to the wait that
>> matched the previous complete call at the origin, for the same
>> window (if there is such a complete call).
>> MPI_WAIT:
>> The call also marks the target as not posted.

In MPI_START, there is no sentence about "marking the target as
posted".

Applied to my example, your rule says:

The succeeding RMA accesses (put, line 13) to this window will
be delayed, if necessary, until the target window is available.
The target window is available if a call to MPI_RMA_POST has
occurred at the target (line 5), subsequent to the wait (line 2)
that matched the previous complete call (line 1) at the origin,
for the same window (if there is such a complete call).

Nothing in your text requires the START to delay the put (line 13)
until after the post on line 15!
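Read literally, the 08/26/96 condition can be checked mechanically. The Python sketch below is only an illustration: it encodes the numbered example as (line, process, operation) tuples, an encoding assumed here and not taken from the draft, and looks for the POST that makes the window "available" to origin 2.

```python
from typing import List, Optional, Tuple

# Hypothetical encoding of the numbered example: (line, process, operation).
# Upper/lower case only marks the sequence that matches the START rule.
TRACE: List[Tuple[int, str, str]] = [
    (1, "origin2", "COMPLETE"), (2, "target", "WAIT"),
    (3, "target", "send to 1"), (4, "origin1", "recv"),
    (5, "target", "POST"), (6, "origin1", "start"),
    (7, "origin1", "put"), (8, "origin1", "complete"),
    (9, "target", "wait"), (10, "target", "send to 2"),
    (11, "origin2", "receive"), (12, "origin2", "START"),
    (13, "origin2", "put"), (14, "target", "load"),
    (15, "target", "post"),
]


def enabling_post(trace: List[Tuple[int, str, str]],
                  matching_wait: int) -> Optional[int]:
    """Return the line of the first target POST after `matching_wait`.

    Per the 08/26/96 wording, this POST makes the window "available" to an
    origin whose previous COMPLETE was matched by the WAIT on that line.
    """
    for line, proc, op in trace:
        if proc == "target" and op.lower() == "post" and line > matching_wait:
            return line
    return None


# Origin 2's previous COMPLETE (line 1) is matched by the WAIT on line 2.
print("put on line 13 is enabled by the POST on line",
      enabling_post(TRACE, matching_wait=2))
```

The POST that satisfies the rule is the one on line 5, which was meant for origin 1, so nothing in the wording holds the put back until line 15.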

You wrote:
> I don't see how the bug could happen. The receive at origin 2 completes after
> the wait at Target. Thus, when process 2 executes its start, the target
> window is not posted, unless Target process executed its second post.

That is exactly the problem: the START rule in your draft says nothing about "posted".

> In any
> case, the put of process 2 can proceed only after the second post call of the
> Target process. In case this is not clear from the text in the current draft,
> then the text needs to be clarified.

Yes, that is why I wrote my mail: your draft does not say what you
want it to say.

> But, an RMA call following a start can access
> a window only when the window is posted. In some cases, the RMA accesses will
> be further delayed until a subsequent post occurs, but in no case can the access
> proceed while the window is not posted.

I see: you want solution I, as written in my mail (quoted below).
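As a sketch of what solution I adds, the following hypothetical Python model tracks two flags for the target window: "marked as posted" (set by POST, cleared by INIT and WAIT) and "available" (a POST has occurred after the WAIT that matched the origin's previous COMPLETE). Only the target-side events of the numbered example are encoded; the function names and the check_posted switch are illustrative assumptions, with check_posted=False reducing the model to the 08/26/96 availability condition alone.

```python
from typing import List, Optional, Tuple

Event = Tuple[int, str]  # (line number in the example, target operation)

# Target-side window events from the numbered example.
TARGET_EVENTS: List[Event] = [
    (2, "wait"), (5, "post"), (9, "wait"), (14, "load"), (15, "post"),
]


def step(posted: bool, available: bool, line: int, op: str,
         matching_wait: Optional[int]) -> Tuple[bool, bool]:
    """Update (posted, available) for one target event."""
    if op == "post":
        posted = True
        if matching_wait is not None and line > matching_wait:
            available = True
    elif op == "wait":
        posted = False  # WAIT marks the window as not posted
    return posted, available


def first_allowed_line(events: List[Event], start_line: int,
                       matching_wait: Optional[int],
                       check_posted: bool = True) -> Optional[int]:
    """Line from which RMA accesses after a START at start_line may proceed.

    check_posted=True models solution I; check_posted=False keeps only the
    "available" condition of Draft 08/26/96.
    """
    posted, available = False, matching_wait is None  # state after INIT

    def ok() -> bool:
        return available and (posted or not check_posted)

    for line, op in events:
        if line < start_line:
            posted, available = step(posted, available, line, op, matching_wait)
    if ok():
        return start_line  # conditions already hold when START is called
    for line, op in events:
        if line > start_line:
            posted, available = step(posted, available, line, op, matching_wait)
            if ok():
                return line
    return None


# Origin 2: START on line 12, previous COMPLETE matched by the WAIT on line 2.
print(first_allowed_line(TARGET_EVENTS, 12, 2, check_posted=False))  # 12
print(first_allowed_line(TARGET_EVENTS, 12, 2, check_posted=True))   # 15
```

With only the availability condition the put may run as soon as START is called; with the posted flag, the WAIT on line 9 clears the flag and the put is held back until the post on line 15.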

But please look at my proposal G. I think you gain nothing with your
proposal and lose something: your post is not like sending a flag,
it is only a sort of lock, and locking is not enough to serialize
the calls issued at the target and at the origin. That is why you
need such ugly rules and the ugly send/recv. The rules in G are
simpler, and the send/recv is not necessary with G.

This discussion is relevant, for example, to load balancing.


>> Post/Start must solve two major problems:
>>
>> (A) loops with several post/start/RMA/complete/wait/local_access
>> cycles must be possible
>> (B) in a scenario with changing communication partners it must be
>> possible to start such cycles after an initiating message
>> from the target to the origin RMA process.
>>
>> Draft 08/16/96:
>>
>> MPI_START:
>> The succeeding RMA accesses to this window will be delayed,
>> if necessary, until the target window is posted by a call to
>> MPI_RMA_POST.
>> MPI_WAIT:
>> ----
>>
>> Criticism: 1) In MPI_WAIT the following sentence must be added:
>> "The call also marks the target as not posted."
>> Then (B) is solved.
>> 2) Does not solve (A)
>>
>> Draft 08/26/96:
>>
>> MPI_START:
>> The succeeding RMA accesses to this window will be delayed,
>> if necessary, until the target window is available.
>> The target window is available if a call to MPI_RMA_POST
>> has occurred at the target, subsequent to the wait that
>> matched the previous complete call at the origin, for the same
>> window (if there is such a complete call).
>> MPI_WAIT:
>> The call also marks the target as not posted.
>>
>> Criticism: 1) MPI_START already defines exactly the meaning of
>> "window is available".
>> The sentence in MPI_WAIT therefore makes no sense.
>> 2) The definition in MPI_START solves (A),
>> but (B) is not solved; see the example below.
>>
>> There are two possible solutions:
>>
>> I) Combining the solutions in both drafts:
>>
>> MPI_START:
>> The succeeding RMA accesses to this window will be delayed,
>> if necessary, until the target window _is marked as posted_
>> and, if MPI_RMA_START is called after an MPI_RMA_COMPLETE
>> for the same window, until the target window _is available_.
>> The target window _is marked as posted_ by a call to
>> MPI_RMA_POST in the target process (MPI_RMA_INIT and MPI_RMA_WAIT
>> mark the window as not posted).
>> The target window _is available_ when MPI_RMA_POST is called
>> at the target process after the MPI_RMA_WAIT that matches the
>> previous MPI_RMA_COMPLETE at the origin.
>>
>> MPI_WAIT:
>> The call also marks the target as not posted.
>>
>> II) State explicitly that (B) is not solved, and delete the sentence
>> in MPI_WAIT, because the START rule is then a pure matching rule
>> and not a state-based rule.
>>
>>
>> Example that the Draft 08/26/96 does not solve (B):
>>
>>
>>   Origin 1                   Target              Origin 2
>>
>>                                                  COMPLETE
>>                              WAIT
>>                              send to 1
>>   recv
>>                              POST
>>   start(MPI_WEAK or STRONG)
>>   put
>>   complete
>>                              wait
>>                              send to 2
>>                                                  receive
>>                              load
>>                              post
>>                                                  START(MPI_WEAK or STRONG)
>>                                                  put
>>                              ...                 ...
>>
>> This example does not work with the draft 08/26/96 because
>> the last start is satisfied by the first post. (Upper case is
>> used for the sequence matching the rule of MPI_RMA_START in
>> the draft 08/26/96.) I.e., the last load and the put run
>> at the same time!
>>
>> With solution II) we must say that after changing communication partners
>> it makes no sense to use MPI_RMA_START(MPI_STRONG) or
>> MPI_RMA_START(MPI_WEAK), because they will normally match earlier
>> POSTs. The example can be solved by rewriting it so that
>> MPI_RMA_START(MPI_NOCHECK) can be used, e.g.:
>>
>>   ...                        ...                 ...
>>                              wait
>>                              load <<<---!!!
>>                              send to 2
>>                                                  receive
>>                              post
>>                                                  START(MPI_NOCHECK)
>>                                                  put
>>                              ...                 ...
>>
>>
>> I also want to remind you that this kind of synchronization is not
>> well suited to virtual shared memory machines like the CRAY T3E,
>> because it cannot be implemented efficiently there.
>>
>> And please remember that in these examples the ugly send/receive
>> synchronization with an empty message is necessary only because
>> your post/start synchronization cannot meet the application's needs.
>>
>> Rolf
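One way to see why the MPI_NOCHECK rewrite is safe while the 08/26/96 trace is not is to ask whether the target's load is ordered before origin 2's put. The Python sketch below uses vector clocks, a standard happens-before technique swapped in for illustration; it is not part of any MPI draft. It assumes that only the send/receive pair orders the two processes: START is modeled as adding no ordering, which holds for MPI_NOCHECK, and under the 08/26/96 rule the enabling POST is the one on line 5, which precedes the load anyway.

```python
from typing import Dict, List, Optional, Tuple

PROCS = ["target", "origin2"]
Msg = Optional[Tuple[str, str]]  # ("send" | "recv", message id) or None
Step = Tuple[str, str, Msg]      # (process, operation, message)

# Lines 9-15 of the numbered example (Draft 08/26/96 semantics).
TRACE_0826: List[Step] = [
    ("target", "wait", None),
    ("target", "send to 2", ("send", "m")),
    ("origin2", "receive", ("recv", "m")),
    ("origin2", "START", None),
    ("origin2", "put", None),
    ("target", "load", None),
    ("target", "post", None),
]

# The MPI_NOCHECK rewrite: the load is moved before "send to 2".
TRACE_NOCHECK: List[Step] = [
    ("target", "wait", None),
    ("target", "load", None),
    ("target", "send to 2", ("send", "m")),
    ("origin2", "receive", ("recv", "m")),
    ("target", "post", None),
    ("origin2", "START(MPI_NOCHECK)", None),
    ("origin2", "put", None),
]


def stamp(trace: List[Step]) -> Dict[str, List[int]]:
    """Give each operation (names assumed unique per trace) a vector clock."""
    clocks = {p: [0] * len(PROCS) for p in PROCS}
    in_flight: Dict[str, List[int]] = {}
    stamps: Dict[str, List[int]] = {}
    for proc, op, msg in trace:
        vc = list(clocks[proc])
        if msg is not None and msg[0] == "recv":
            # A receive inherits everything the matching send had seen.
            vc = [max(a, b) for a, b in zip(vc, in_flight[msg[1]])]
        vc[PROCS.index(proc)] += 1
        clocks[proc] = vc
        if msg is not None and msg[0] == "send":
            in_flight[msg[1]] = vc
        stamps[op] = vc
    return stamps


def happens_before(a: List[int], b: List[int]) -> bool:
    return a != b and all(x <= y for x, y in zip(a, b))


for name, trace in [("08/26/96", TRACE_0826), ("NOCHECK", TRACE_NOCHECK)]:
    s = stamp(trace)
    print(name, "load before put:", happens_before(s["load"], s["put"]))
```

In the first trace the load and the put are concurrent; in the rewrite the load happens before the put. The only difference is whether the load is issued before "send to 2", which is what the "load <<<---!!!" marker points at.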


Rolf Rabenseifner                       (Computer Center)
Rechenzentrum Universitaet Stuttgart    (University of Stuttgart)
Allmandring 30                          Phone: ++49 711 6855530
D-70550 Stuttgart 80                    FAX:   ++49 711 6787626
Germany                                 rabenseifner@rus.uni-stuttgart.de