Re: 5.6 Semantics and correctness

Marc Snir (snir@watson.ibm.com)
Wed, 5 Feb 1997 18:46:08 -0400

The following comments refer to section 5.6 Semantics and correctness:

1) Is a put an access, or ONLY an update? The definitions on page 107, line
29 thru 32 appear to imply that a put is NOT an access.

If the above is true, then the rules on page 108 do not appear to disallow
two concurrent puts to the same location in the same window. Is this
intended?

I assume what we really mean is that load, store, get, put and accumulate
are all "accesses", and that store, put and accumulate are all "updates".

****
Added text to clarify
*****
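
To make the two readings in comment 1 concrete, here is a minimal sketch.
The operation names are taken from the comment; the conflict rule itself
(an access paired with an update to the same location) is a simplifying
assumption for illustration and is not the wording of page 108. The
function name conflicts is hypothetical.

```python
# Toy model only: operation names come from comment 1; the conflict rule
# is a simplified assumption, not the text of the rules on page 108.
ACCESSES = {"load", "store", "get", "put", "accumulate"}
UPDATES = {"store", "put", "accumulate"}

# The narrower reading questioned in comment 1: a put is an update only.
NARROW_ACCESSES = ACCESSES - {"put"}


def conflicts(op_a, op_b, accesses=ACCESSES, updates=UPDATES):
    """Return True if two concurrent operations on the same location
    pair an access with an update, in either order."""
    return (op_a in accesses and op_b in updates) or (
        op_b in accesses and op_a in updates
    )


if __name__ == "__main__":
    # Broad reading: two concurrent puts are flagged.
    print(conflicts("put", "put"))
    # Narrow reading: the same pair slips through.
    print(conflicts("put", "put", accesses=NARROW_ACCESSES))
```

Under the broad reading the pair (put, put) is caught because a put counts
as both an access and an update; under the narrow reading neither put is
an access, so this form of rule does not apply to the pair.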

2) We do not define when an update to local memory becomes "visible" in
local memory. I assume the implicit rule is that the update is
"immediately" visible.

****
This is really defined by the semantics of Fortran or MPI message passing. I
don't think I need to expand here.
******

Similarly, we do not define when an update to a window copy becomes visible
in the window copy. I assume the implicit rule is that the update is
visible after the RMA operation is completed.
***
Added text to clarify
*****

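The first assumption above (a local update is immediately visible) is
required to understand rule 1 on page 108 if the access and update are both
local. The second assumption (an update to a window copy is visible once the
RMA operation is completed) is required to understand rule 2 on page 108 if
the access and the update are both RMA.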

3) The advice to users rules (page 109) are insufficient as they do not
address the problem of overlapping windows. A sufficient (but not
necessary) rule would be to avoid overlapping windows.

4) I suggest we use the terms "public copy"/"private copy" instead of
"process memory"/"window copy" throughout the text. This abstracts the
whole definition away from the details of implementation. To be consistent
with the text in 5.6, figure 5.5 should be relabeled with "public
copy"/"private copy" instead of "window copy"/"process memory". Note: a
store may not update "process memory", since the machine may have a write
back cache. On such a machine the private copy is really a combination of
"process memory" and local cache.
****
Process memory is used as a logical term: this is the set of variables
declared in the program -- and is distinct from physical processor memory.
Any better suggestion to say what I mean?
*****

5) In rule 2 (page 108), I assume the conflicting update may either be in
this window, or another overlapping window. The access is precluded until
the update is visible in the window copy of the updating window (we have no
rule about when an update in an overlapping window becomes visible in this
window). Similarly in rule 3, I assume line 35 should read: "...until the
second update becomes visible in the (overlapping) window copy."
****
We have a rule for when an update from one overlapping window becomes visible
in the other: this is derived from 5+6, by implication (modus ponens). I
added a paragraph to explain this. These derived rules tell when the
transfer is complete. It can start, as explained before, as soon as the
update of the first window is started.

Your reading of line 35 is wrong. I meant until the second update becomes
visible in the first window copy. I clarified this.
********

6) Rule 1 on page 107 (line 36) should read: "An RMA operation is completed
at the origin by the ensuing call...". The current text mentions "process
memory" which might be confused with the target private copy. What we
really mean is the origin caller memory.
****
OK
*****

7) Rules 1 thru 3 on page 108 should use the words "should not" instead of
"cannot". The program "can" do whatever it wants, but to avoid being
erroneous, it "should" obey certain rules. The use of the word "cannot"
might imply that the implementation blocks the conflicting call until the
rules are satisfied.
*****
OK
*******

8) Page 110, line 19-20: In MPI-1, the critical semantics defining
non-blocking calls is that they are local, i.e., they return without
waiting for the action of some other processor. The issue of on-going use
of the local buffer is a detail.

We should thus define post, complete, unlock, put, get and accumulate
similarly. Wait, complete, lock and barrier are clearly non-local, and
therefore blocking.

The rules for one-sided communication are actually rather different than for
point-to-point. Consider the following:

MPI_Isend ()
MPI_Recv ()

No further MPI calls

The standard makes it clear that the receive will be completed even if the
send is never completed (non-blocking progress rule, page 44). I.e., the
enabled communication (the send) occurs.

One-sided communication is different. An enabled put may never do anything
useful until the synchronizing operation (unlock, complete or barrier). It is
entirely legal to defer one-sided operations to the complete. In contrast, it
is NOT legal to defer a non-blocking send or recv until the wait.

It would appear that the progress rule for pt2pt is similar to the
post/start, complete/wait calls progress requirements. The progress issue
for put/get/complete is different. Since these are local operations, no
remote activity should be required. In particular, the following must
complete:

Post
No further MPI calls     Start
                         Get
                         Complete

or
Post
No further MPI calls     Start
                         100000 puts
                         Complete

Progress here is local only; it doesn't involve the remote processor (and
is therefore very different than the pt2pt progress requirements).

****
Added text to clarify
*****
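
As an illustration of the deferral that comment 8 describes as legal, here
is a minimal sketch in which every put is only queued at the origin and
nothing reaches the target until complete. The class DeferredWindow and its
methods are hypothetical names for a toy model, not an MPI interface, and
the model says nothing about what an actual implementation does.

```python
class DeferredWindow:
    """Hypothetical origin-side view of a target window; not an MPI API."""

    def __init__(self, size):
        self.target = [0] * size  # stands in for the target's window copy
        self.pending = []         # puts queued locally at the origin
        self.open = False         # True between start() and complete()

    def start(self):
        self.open = True

    def put(self, index, value):
        # Local call: records the request and returns immediately.
        if not self.open:
            raise RuntimeError("put outside a start/complete pair")
        self.pending.append((index, value))

    def complete(self):
        # All deferred puts are applied here, in the order issued.
        for index, value in self.pending:
            self.target[index] = value
        self.pending.clear()
        self.open = False


if __name__ == "__main__":
    win = DeferredWindow(4)
    win.start()
    win.put(1, 7)
    print(win.target)  # the put has not been applied yet
    win.complete()
    print(win.target)  # the put is applied at complete
```

In this model each put costs only a local queue append, which is why a long
run of puts between start and complete needs no activity from the other
side until the synchronizing call.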

****
Thanks for the detailed reading

Marc
*********