The version of chapter four in the document that Steve will assemble tomorrow
is identical to the version circulated two weeks ago, with the following
changes:
1. a few corrections pointed out by Rolf and others
2. a variant of the counter proposal
3. an additional section on register optimization problems, as they apply to
MPI1, as well as MPI2.
The main issues to be debated and/or reopened (beyond the general uneasiness
with new functionality) are:
1. The decision to allow windows in regular memory. This forces some shared
memory implementations to add code to support the (for them) inefficient case
of windows in private memory. It does not seem to slow down put/get in windows
that use dynamically allocated memory (apart from the cost of one additional
test, and the usual issue of code blow-up). It allows distributed memory
implementations and some shared memory implementations to avoid a restriction
that makes no sense for them.
2. The decision to allow put/get buffers that are not word/cache line aligned.
The design of put/get has been modified in order to alleviate the problem this
presents on systems that do have alignment restrictions on their DMA engines
or on their load/store operations. Specifically, overlapping RMA operations
are disallowed. As a result, word (resp. cache line) aligned put/get transfers
can go full blast, with no locking, while puts that are not aligned require a
lock/read/modify/write/unlock sequence, and gets that are not aligned require
a lock/read/unlock sequence (in a shared-memory-like implementation).
3. The decision to use counters to indicate the completion of put/get at the
target. The current proposal has two variants. There have also been
suggestions to allow only 0/1 increments, or to drop counters altogether.
Since this section has not been voted formally, all options are open for a
4. The Window_in, window_out functions, which have not yet been voted in, at
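
To make the two paths in item 2 concrete, here is a rough sketch in C of how
a shared-memory-like implementation might separate aligned from unaligned
transfers. Everything in it is an assumption made for illustration: window_t,
window_put, window_get and WORD_SIZE are hypothetical names, not part of the
proposal, and a C11 atomic_flag spin lock stands in for whatever lock a real
implementation would use. The unlocked fast path is only safe because
overlapping RMA operations are disallowed, so aligned transfers never share a
word with another transfer.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define WORD_SIZE sizeof(uint64_t) /* assumed alignment unit */

/* Hypothetical window descriptor, not an MPI type. Assumes base is
   word aligned and size is a multiple of WORD_SIZE. */
typedef struct {
    unsigned char *base;
    size_t size;
    atomic_flag lock; /* taken only by unaligned transfers */
} window_t;

static void win_lock(window_t *w) {
    while (atomic_flag_test_and_set_explicit(&w->lock, memory_order_acquire)) { }
}

static void win_unlock(window_t *w) {
    atomic_flag_clear_explicit(&w->lock, memory_order_release);
}

static int in_range(const window_t *w, size_t off, size_t len) {
    return off <= w->size && len <= w->size - off;
}

static int word_aligned(size_t off, size_t len) {
    return off % WORD_SIZE == 0 && len % WORD_SIZE == 0;
}

/* Returns 1 for the unlocked fast path, 0 for the locked path,
   -1 if the transfer does not fit in the window. */
int window_put(window_t *w, size_t off, const void *src, size_t len) {
    if (!in_range(w, off, len)) return -1;
    if (word_aligned(off, len)) {
        memcpy(w->base + off, src, len); /* whole words, no lock */
        return 1;
    }
    win_lock(w);
    for (size_t done = 0; done < len; ) {
        size_t pos = off + done;
        size_t in_word = pos % WORD_SIZE;
        size_t word_start = pos - in_word;
        size_t n = WORD_SIZE - in_word;
        if (n > len - done) n = len - done;
        uint64_t word;
        memcpy(&word, w->base + word_start, WORD_SIZE);   /* read */
        memcpy((unsigned char *)&word + in_word,
               (const unsigned char *)src + done, n);     /* modify */
        memcpy(w->base + word_start, &word, WORD_SIZE);   /* write */
        done += n;
    }
    win_unlock(w);
    return 0;
}

/* Same return convention as window_put. */
int window_get(window_t *w, size_t off, void *dst, size_t len) {
    if (!in_range(w, off, len)) return -1;
    if (word_aligned(off, len)) {
        memcpy(dst, w->base + off, len); /* whole words, no lock */
        return 1;
    }
    win_lock(w);
    memcpy(dst, w->base + off, len);     /* lock/read/unlock */
    win_unlock(w);
    return 0;
}
```

In this sketch a put of 8 bytes at offset 8 takes the fast path, while a put
of 5 bytes at offset 6 takes the locked path and updates two words one at a
time, leaving the other bytes of those words unchanged.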
IBM T.J. Watson Research Center
P.O. Box 218, Yorktown Heights, NY 10598