>
> Marc,
>
> I have some comments and questions about your 4/8/96 draft of the 1-sided
> chapter. I'm not sure if some of these questions arise because I have
> missed some prior discussion. I will probably send a similar note to the
> 1-sided mailing list, hopefully after getting your comments.
>
> o What is the rationale behind having both an "origin_count" and
> a "target_count" argument in the MPI_PUT/GET/IPUT/IGET functions?
> Is it to allow passing of raw binary data between two different-sized
> types? Is it to allow passing REAL data to a remote COMPLEX array?
> In this case we could imagine making sense of the transfer, but
> this usage seems too obscure to justify an additional
> "target_count" argument.
The count arguments have the same role as the count arguments in a
send/receive pair: they specify how many elements are moved.
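
A minimal sketch of that analogy in plain C, using the REAL-to-COMPLEX case from
the question. It is not MPI code: sketch_put and its parameters are hypothetical
names, element sizes stand in for datatypes, and the rule that the origin data
must fit in the target description is an assumption modeled on receive
semantics. A real implementation would match type signatures, not byte counts.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an RMA put; NOT an MPI function.
 * Each side describes the data as (count, element size), the way a send
 * and its matching receive each carry their own (count, datatype). */
int sketch_put(const void *origin, size_t origin_count, size_t origin_elem_size,
               void *target, size_t target_count, size_t target_elem_size)
{
    assert(origin != NULL && target != NULL);

    size_t origin_bytes = origin_count * origin_elem_size;
    size_t target_bytes = target_count * target_elem_size;

    /* Assumption: like a receive buffer, the target description must be
     * large enough to hold everything the origin describes. */
    if (origin_bytes > target_bytes)
        return -1;

    memcpy(target, origin, origin_bytes);
    return 0;
}
```

With this sketch, four floats at the origin can land in two {re, im} pairs at
the target (origin_count = 4, target_count = 2), while the same put with
target_count = 1 is rejected.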
>
> o I'm told that MPI_FENCE is no longer collective. The wording in
> the document confused me into thinking it is collective.
The current definition of FENCE allows it to be either collective (no
target specified) or point-to-point (specific target specified), at the choice
of the user. We should discuss this further when we meet -- there are several
reasonable alternatives to consider.
>
> o Are you aware that the progress rule constrains the implementation
> on systems like the CRAY T3D and the CRAY T90? These are
> shared memory systems without cache coherency. On CRAY T3D
> systems there is an option to activate optional cache coherency, but
> on CRAY T90 systems there is no such means.
>
> I believe some Convex, NEC, and Fujitsu systems are also not
> cache-coherent and are affected similarly.
>
> There are 2 options for non-coherent SMPs:
>
> 1) A signal handler must be invoked at the target
> task when MPI_PUT is called. The signal handler will
> execute an instruction to make the cache coherent (by
> invalidating it) after the data for the MPI_PUT is delivered.
>
> 2) Another alternative is to add an MPI_DELIVER function:
>
> CALL MPI_DELIVER(comm)
>
> Any task is required to call this function before directly
> accessing (with loads and stores) the data updated by
> an RMA operation. Note that tasks may access the memory
> using subsequent RMA operations without intervening calls
> to MPI_DELIVER.
>
> I think this approach would also provide a convenient
> implementation avenue for a networked implementation
> of 1-sided MPI. The MPI_DELIVER function might serve
> most (all?) of the purposes of an RMA agent, making the
> RMA agent unnecessary. On the other side of the coin,
> cache-coherent SMPs can simply make MPI_DELIVER a no-op.
>
> Note that in the SHMEM library, the SHMEM_USCFLUSH function
> works the way MPI_DELIVER would work. On cache-coherent
> systems it is a no-op, and on T3D and T90 systems it
> invalidates cache.
>
>
If RMA is implemented via DMA or another remote access mechanism that is not
coherent with the local processors' caches, then we really need a cache
flush/invalidate before the get/put starts. Otherwise, the get may return
stale data and the put may be overwritten by cache lines that are written back
later. I don't understand how a deliver call after the RMA transfer suffices.
The sense of the forum was that MPI should not go out of its way to
accommodate such hardware. If the current design stays, it means that windows
must be uncached or, at the least, must be flushed back to memory at any
potential synchronization point -- quite unpleasant, I know. If we want to
accommodate both noncoherent remote DMA and local cached access to a window,
then we need to alternate the state of the window between noncached and
cached. We need two calls: MPI_WINDOW_EXPOSE and MPI_WINDOW_HIDE. RMA accesses
should occur only when the window is exposed, and local accesses should occur
only when the window is hidden (or, maybe, remote accesses are delayed until
the window is exposed).
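
The expose/hide alternation can be sketched as a two-state window in plain C.
This is only an illustration of the proposed rule, not an implementation: all
sketch_* names are hypothetical, no real cache operations are performed, and
rejecting an access made in the wrong state (rather than delaying it) is an
assumption chosen to keep the example short.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

enum sketch_win_state { SKETCH_WIN_HIDDEN, SKETCH_WIN_EXPOSED };

/* Hypothetical window descriptor; not an MPI type. */
struct sketch_window {
    double *base;                 /* memory backing the window */
    size_t len;                   /* number of elements */
    enum sketch_win_state state;  /* who may touch the window right now */
};

void sketch_window_init(struct sketch_window *w, double *base, size_t len)
{
    assert(w != NULL && base != NULL);
    w->base = base;
    w->len = len;
    w->state = SKETCH_WIN_HIDDEN; /* assumption: windows start hidden */
}

/* Stands in for MPI_WINDOW_EXPOSE. On non-coherent hardware this is where
 * cached lines of the window would be written back and invalidated. */
void sketch_window_expose(struct sketch_window *w)
{
    w->state = SKETCH_WIN_EXPOSED;
}

/* Stands in for MPI_WINDOW_HIDE. After this call, local cached access
 * to the window is allowed again. */
void sketch_window_hide(struct sketch_window *w)
{
    w->state = SKETCH_WIN_HIDDEN;
}

/* Remote access is legal only while the window is exposed. */
bool sketch_remote_put(struct sketch_window *w, size_t i, double value)
{
    if (w->state != SKETCH_WIN_EXPOSED || i >= w->len)
        return false;
    w->base[i] = value;
    return true;
}

/* Local load/store access is legal only while the window is hidden. */
bool sketch_local_store(struct sketch_window *w, size_t i, double value)
{
    if (w->state != SKETCH_WIN_HIDDEN || i >= w->len)
        return false;
    w->base[i] = value;
    return true;
}
```

In this sketch a put issued while the window is hidden fails, as does a local
store issued while it is exposed; the parenthetical alternative above would
queue the put until the next expose call.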
> Please comment.
>
> Regards,
>
> Karl Feind
>
> +-----------------------------------+----------------------------------+
> | Karl Feind | E-Mail: kaf@cray.com |
> | Cray Research, Inc. | Phone: 612/683-5673 |
> | 655F Lone Oak Drive | Fax: 612/683-5276 |
> | Eagan, MN 55121 | |
> +-----------------------------------+----------------------------------+
Marc Snir
IBM T.J. Watson Research Center
P.O. Box 218, Yorktown Heights, NY 10598
email: snir@watson.ibm.com
phone: 914-945-3204
fax: 914-945-4425
URL: http://www.research.ibm.com/people/s/snir