1) The example in section 9.1.1 (lines 28 through 36) is not a very good one.
Because there is no synchronization from process 1 to process 2, process 2
could easily assign buff to ccc before the MPI_PUT takes effect. That produces
exactly the same result as the compiler optimization (i.e., ccc does not
contain 777). Process 2's assignment to ccc must be forced to occur after the
PUT, using either another send/recv pair or some other synchronization
mechanism.
2) Section 9.1.2, line 25. This statement is incorrect: non-blocking send and
receive operations can also cause problems with cache incoherence. That is one
of the reasons the standard explicitly disallows access to the send/recv
buffer while a non-blocking operation is pending (e.g., page 40, lines 42 to
48). Our implementation takes advantage of this rule to achieve "zero-copy"
sends and receives on our non-cache-coherent machine.
3) Section 9.1.1, starting with "In C, subroutines...". I am not convinced that
the C programmer will never have problems with compiler optimization of
variables. It seems to me that at least some compilers may make such
aggressive optimizations even when they are not correct under a strict
interpretation of the C language.
I don't think I agree exactly with your three cases. First we need to define
synchronization points that establish the order of accesses; only then can we
start discussing memory consistency.
Hughes Aircraft Co.,