There are two problems with Marc's proposal now:
1. Efficient implementation of Rma_start(MPI_WEAK):
[Raja:]
> I may be missing something, but it seems to me that as written, the
> Rma_post() <--> Rma_start() synchronization has to always be a full
> bidirectional exchange (RPC-like), with start asking the target if
> the post flag is currently set (locally), and blocking if it's not
> yet set (and the target buffering all these requests until the next
> post, and then replying to them all).
>
> Q1: is this correct?
> Q2: if yes, is this what we want (as opposed to Rma_post(comm, rank)
> and making the user do multiple of them, thus keeping track of the
> synchronization arrows and keeping post/start single-directional)?
I think you're right that Rma_start must _get_ the "is_posted" flag.
Rma_start(MPI_WEAK) can do this:
- in distributed memory, by piggybacking on the next PUT or GET
- in shared memory systems, it is only a memory load
  (since the flag is a single bit, the load is always atomic)
- in virtual shared memory machines, it is possibly a problem!
  It is a load over the network, and waiting is possibly done
  by polling in a loop that loads the flag over the network on
  every iteration.
Eric and Karl, what do you think about T3E, etc.?
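
To make the polling concern concrete, here is a minimal sketch of the shared-memory variant in C11. All names (weak_window, weak_post, weak_start, max_polls) are hypothetical and are not taken from the proposal; the poll budget exists only so that the sketch terminates. It assumes the flag lives in memory that both sides can address directly.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical window state, for illustration only. */
typedef struct {
    atomic_bool is_posted;   /* written by the target, read by the origin */
} weak_window;

/* Target side: make the window available. */
static void weak_post(weak_window *w)
{
    atomic_store_explicit(&w->is_posted, true, memory_order_release);
}

/* Origin side: poll the flag at most max_polls times.
 * Each iteration is one load of the flag; on a virtual shared memory
 * machine that load would cross the network every time.
 * Returns true once the flag is seen set, false if the budget runs out. */
static bool weak_start(weak_window *w, long max_polls)
{
    for (long i = 0; i < max_polls; i++) {
        if (atomic_load_explicit(&w->is_posted, memory_order_acquire))
            return true;
    }
    return false;
}
```

On a true shared memory system the loop body is a cheap local load; the open question is what the same loop costs when every load is a remote access.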
2. Missing functionality for applications with changing communicating
process subsets:
[Marc:]
> a window is available to a "start" call if it is posted, but the same
> post cannot satisfy two starts at the same process. I.e., the post must
> come after the wait that matched any previous complete by the same
> caller. The implementation must enforce the interleaving, either using
> a hand-shake protocol, or using generation counters.
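
One way to read the generation-counter option is sketched below in C. This is only an illustration under stated assumptions: the type and function names (gen_target, gen_origin, gen_post, gen_try_start) are hypothetical, the code is single-threaded, and it models just the rule that one post cannot satisfy two starts by the same process.

```c
#include <stdbool.h>

/* Hypothetical per-window state at the target: incremented by every post. */
typedef struct { unsigned post_gen; } gen_target;

/* Hypothetical per-origin state: generation of the last post it used. */
typedef struct { unsigned seen_gen; } gen_origin;

/* Target side: each post opens a new generation. */
static void gen_post(gen_target *t)
{
    t->post_gen++;
}

/* Origin side: a start matches only a post this origin has not used yet.
 * Returns false if there has been no new post since the last start. */
static bool gen_try_start(gen_origin *o, const gen_target *t)
{
    if (o->seen_gen == t->post_gen)
        return false;
    o->seen_gen = t->post_gen;
    return true;
}
```

With this bookkeeping a second start by the same origin fails until the target posts again, which is the interleaving the quoted text asks the implementation to enforce.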
I think it is now okay for two cases, but not for a third; it is not
clear, though, whether we want to address this third case:
- case 1: 2-party synchronization
- case 2: switching the put-party from rank 0 to 2 by a message from 1 to 2
- case 3: switching the put-party from rank 0 to 2 by a message from 0 to 2
                                                                    ^^^
case 1            case 2                    case 3
rank_0 rank_1     rank_0 rank_1 rank_2      rank_0 rank_1 rank_2
       post                     post                      post
start             start                     start
put               put                       put
complete          complete                  complete
start                    wait               send-  wait
:      wait              send-->                 \------->
:      load              load   recv               load   recv
:      post                     start                     start
return                   post   :                  post   :
put                             return                    return
complete                        put                       put
       wait                     complete                  complete
       load              wait                      wait
                         load                      load
This third case need not be synchronized in that way, i.e. the
"start" in rank_2 can also match with the first "post" in rank_1:
rank_0 rank_1 rank_2
       post
start
put
complete
send--------->recv
              start
       wait   put
       load   complete
       post
       wait
       load
This third case is handled correctly by the alternative proposal
in which "post" has a rank argument and start has a count argument
and the usual matching rule.
Rolf
Rolf Rabenseifner (Computer Center )
Rechenzentrum Universitaet Stuttgart (University of Stuttgart)
Allmandring 30 Phone: ++49 711 6855530
D-70550 Stuttgart 80 FAX: ++49 711 6787626
Germany rabenseifner@rus.uni-stuttgart.de