At what point in time is there a guarantee that the data has been put in
the target memory?
The text says that the put is completed in target memory when a subsequent
call to MPI_LOCK occurs. What does this mean exactly? Consider two examples.
process 0                process 1
lock
put A <- 1
unlock
send ----------->        recv
                         lock
                         get A
                         unlock
Must the get of A return 1? If so, we must make sure that, at the least,
the lock was acquired before the call to unlock returned at process 0. We
can think of an aggressive implementation that only makes sure the lock is
acquired before unlock returns, while the put proceeds in the background.
But, from the user's viewpoint, this has the same effect as if the put
completed before the call to unlock returned (assuming all accesses to the
window are lock protected).
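The reasoning above can be mimicked with two threads standing in for the two processes. This is a sketch, not MPI API: the dict, mutex, and event below are hypothetical stand-ins for the window, the window lock, and the send/recv pair.

```python
import threading

# Illustrative stand-ins (not MPI objects): a dict as the target
# window, a mutex as the window lock, an event as the send/recv pair.
window = {"A": 0}
win_lock = threading.Lock()
msg_sent = threading.Event()
result = {}

def process0():
    with win_lock:                 # lock
        window["A"] = 1            # put A <- 1
    # unlock has returned: under the proposed semantics the put is
    # now complete in target memory
    msg_sent.set()                 # send

def process1():
    msg_sent.wait()                # recv
    with win_lock:                 # lock
        result["A"] = window["A"]  # get A; must observe 1

t0 = threading.Thread(target=process0)
t1 = threading.Thread(target=process1)
t0.start(); t1.start(); t0.join(); t1.join()
assert result["A"] == 1
```

Because the send happens only after the unlock has returned, and the put is complete in target memory by then, the get is guaranteed to see the value 1.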
process 0                process 1
lock                     lock
put A <- 1               get B
unlock                   unlock
lock                     lock
put B <- 1               get A
unlock                   unlock
Must get A return 1 if get B returned 1? These are the normal
serialization semantics: if get B is "after" put B, then clearly get A must
be "after" put A. But then locks must be acquired in the right order.
First, let me make clear that the problem arises only for lock/unlock.
When barrier or start/post/wait/complete are used, there is an operation
issued by the target process that completes the RMA access, so that there
is no ambiguity about semantics. The lock/unlock "shared memory like"
communication paradigm requires further elaboration.
Any shared memory model provides a mechanism for a process that stores data
in shared memory to "know" that the data has been stored. If strong
serialization semantics are used, then the store in memory can be assumed
to coincide with the issuing of the store operation: the outcome is as if
the store occurred atomically (however, the exact ordering of stores issued
by different processes may be nondeterministic). When weak semantics are
used, the two may not coincide; shared memory is updated some time later.
But, in such a case, mechanisms are provided that allow the process to
ascertain that the store went to shared memory. For example, with release
consistency, the release of a lock will indicate that the store has gone to
shared memory. In general, a sync or commit operation is provided to make
sure that the store has gone to shared memory.
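As a toy illustration of that commit point (a sketch, not any real weak-consistency API): stores are buffered locally and become visible to other processes only when the lock is released.

```python
import threading

shared = {"x": 0}          # stands in for shared memory
lock = threading.Lock()

class ReleaseConsistentView:
    """Hypothetical model of release consistency: stores are buffered
    locally; release() flushes them to shared memory, so the release
    of the lock is the point at which the store is known to be done."""
    def __init__(self):
        self.buffer = {}
    def store(self, key, val):
        self.buffer[key] = val      # not yet visible to others
    def acquire(self):
        lock.acquire()
    def release(self):
        shared.update(self.buffer)  # commit buffered stores
        self.buffer.clear()
        lock.release()

view = ReleaseConsistentView()
view.acquire()
view.store("x", 42)
assert shared["x"] == 0    # store not yet in shared memory
view.release()
assert shared["x"] == 42   # release guarantees the store is visible
```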
Our current design of lock/unlock does not provide any such mechanism. I
believe that this is wrong, and makes the current lock/unlock design of
little use. Therefore, I propose to enforce clearer and stronger semantics
for lock/unlock. Namely,
a put is completed in target memory when the ensuing MPI_UNLOCK call
returns at the origin process.
This will disallow some possible optimizations. E.g., in a sequence such as
lock; put; unlock; lock; put; unlock,
one cannot gather both puts and execute them with one interaction with the
target process. Also, this means that an acknowledgement may be needed
back from the target process before the computation may proceed beyond
MPI_UNLOCK. On the other hand, the semantics are clear and unsurprising.
Shared memory implementations do not suffer, and distributed memory
implementations get what they deserve, trying to simulate shared memory on
top of message passing.
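On a message-passing implementation, the acknowledgement round-trip could look like the following sketch (a toy model with queues standing in for the network, not MPI API): the origin's unlock returns only after the target has applied the put and acknowledged it.

```python
import queue
import threading

# Hypothetical message-passing model of the proposed rule: unlock
# blocks until the target acknowledges that the put was applied.
to_target = queue.Queue()
to_origin = queue.Queue()
window = {"A": 0}

def target():
    while True:
        msg = to_target.get()
        if msg is None:          # shutdown sentinel
            break
        key, val = msg
        window[key] = val        # apply the put at the target window
        to_origin.put("ack")     # acknowledge completion

def put(key, val):
    to_target.put((key, val))    # fire off the put message

def unlock():
    to_origin.get()              # block until the target's ack arrives

t = threading.Thread(target=target)
t.start()
put("A", 1)
unlock()                         # after this returns, the put is complete
assert window["A"] == 1
to_target.put(None)
t.join()
```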
With this interpretation, in Example 1, get A at process 1 must return the
value put by process 0 (if no other put occurs in the meantime). In Example
2, serialization semantics apply. If get B returned 1, then get B
occurred (at the target window) after put B, which occurred after the first
unlock of process 0, which occurred after put A completed at the target
window. So get A will also return 1.
Unless I hear strong disagreement, I shall modify the text to reflect these
semantics.