However, the ideas that he presents as possible solutions are "dangerous" :-)
since they restrict and complicate the so-far elegant RMA semantics to a
degree that makes the RMA model unusable for some apps. I think that the cost
of providing a fully portable RMA interface on non-cache-coherent hardware
like the T90 is too high. Karl could always provide a Cray-specific function,
cri_deliver, that is required on his non-cache-coherent hardware to access
the window locally.
> a) Disallow concurrent local and remote RMA access to a window,
> thus requiring use of MPI_GET when spin-waiting.
> or b) Disallow concurrent local and remote RMA access to a window, with
> one exception. Local loads (but not stores) of the window are OK at
> any time, but MPI_DELIVER must be called to ensure that any
> MPI_PUTs delivered since the last sync point are visible to local
> loads on the target.
> or c) Changed the hide/expose routines' semantics. The hide/expose
> operation takes effect at the next synchronization point.
> This option is really the same as "a".
Jarek