From: Rabenseifner@RUS.Uni-Stuttgart.DE (Rolf Rabenseifner)
>To Marc's WINDOW/LOCK/TRYLOCK/UNLOCK proposal:
>
>Looking at FE applications I believe that your proposal is worse
>than the current official draft:
>
>For applications that should not be executed in strongly synchronized
>steps (i.e. with MPI_BARRIER), I have compared
> - PUT & counters with the current official draft, and
> - PUT & LOCK/UNLOCK with the radical new draft.
Did you compare with my "Extended proposal: pt-2-pt 1-sided"? See
the brief description below, in case it slipped by the first time.
I will be posting a more complete description shortly.
>At the moment all alternative proposals are less efficient because
>they block the application more than necessary
>(because MPI_BARRIER is the only way or because the put must
>be done in a locked manner).
This is not the case with my proposal.
Both MPI_PUT and MPI_GET are non-blocking. The user decides when to wait
for the PUTs or GETs to complete in the originating process and the target
(which might be the same process).
>I understand that, for the vendors, it is more work to examine their
>cache model and work out how MPI_WINDOW_IN/OUT must be implemented
>on their hardware than to implement a simpler but less efficient
>model -- but MPI-2 should allow efficient
>parallel programming.
The problems that I cited earlier with respect to MPI_WINDOW_IN/OUT
were not related to their efficiency or their understandability.
They were related to the lack of a complete and consistent definition.
Until these are corrected -- i.e. until someone can explain what is
being proposed -- I don't think that it is a proposal at all. If someone
can explain them, then I hope that they will write these explanations
down, so that it will become a proposal that I and others may vote upon.
**************
I believe that the answers to your questions are clearly spelled out in the
current text -- this is why I did not answer your previous message, as I
assumed you would find the answers yourself after a more careful reading.
But since you still seem to have questions, I list the answers below.
******************
Some of the specific questions (from memory) are:
(1) Can a counter update in the target before the target executes a
WINDOW_IN or WINDOW_OUT? If so, does that mean that the counter in
the target can be updated before PUT data appears in the user's window?
Or do the counter routines automatically perform a WINDOW_IN or
WINDOW_OUT (and if so, which one)? If the counter cannot update until a
WINDOW_IN is performed, then is there a way for the target to know, using
a counter, when it is time to execute a WINDOW_IN without polling?
*************
The counter is updated when the put has completed at the target. "Completed
at the target" means, conceptually, that the data is in the "public copy" of
the window. The data is then available to get accesses, but becomes
available to the local process's loads only after it executes a WINDOW_IN.
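
The visibility rule just described can be sketched as a toy, single-process
model. This is only an illustration of the conceptual "public copy" idea, not
the draft's API: the Window class and its put, get, load and window_in
methods are hypothetical names chosen for this example.

```python
class Window:
    """Toy model of one target window (hypothetical, for illustration only)."""

    def __init__(self, size):
        self.public = [0] * size    # copy that remote PUT/GET accesses touch
        self.private = [0] * size   # copy that the local process's loads see
        self.counter = 0            # bumped when a put completes at the target

    def put(self, offset, value):
        # A put "completes at the target" once the data is in the public copy.
        self.public[offset] = value
        self.counter += 1

    def get(self, offset):
        # Gets read the public copy, so they see completed puts at once.
        return self.public[offset]

    def load(self, offset):
        # Local loads read the private copy only.
        return self.private[offset]

    def window_in(self):
        # Make everything in the public copy visible to local loads.
        self.private = list(self.public)


w = Window(4)
w.put(2, 99)
counter_after_put = w.counter      # counter already updated
seen_by_get = w.get(2)             # new value, read from the public copy
seen_by_load_before = w.load(2)    # still the old value
w.window_in()
seen_by_load_after = w.load(2)     # new value after WINDOW_IN
```

In this model the counter can change before the local process is able to
load the new data, which is the situation question (1) asks about.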
***************
(2) Is there a way for the target to know when the originator has completed
a FENCE, so that the target can know when to perform a WINDOW_IN (other
than to use a BARRIER between the FENCE on the originator and the
WINDOW_IN on the target)? If not, then is it useful?
********************
There is no direct way for the target to know when the originator has completed
a FENCE. For communication between originator and target, one uses counters or
barriers. The FENCE is needed in cases where the originator wants to guarantee
that some access it has originated has completed before it originates another
access, e.g., that an update it originated has completed before it sets a flag
or releases a lock. In other words, FENCE is used to order successive RMA
accesses by an originator, not to synchronize with the target process.
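
The ordering role of FENCE can be sketched with another toy model. The
assumptions here are mine, not the draft's: the Originator class is
hypothetical, a dict stands in for the target's public window copy, and
in-flight accesses are delivered newest-first purely to show the worst case
that a FENCE rules out.

```python
class Originator:
    """Toy originator whose accesses may complete out of order (hypothetical)."""

    def __init__(self, target_public):
        self.target = target_public   # stands in for the target's public copy
        self.in_flight = []           # accesses issued but not yet completed

    def put(self, key, value):
        # Non-blocking: the access is only queued here.
        self.in_flight.append((key, value))

    def deliver_one(self):
        # Complete the most recently issued access first (worst-case order).
        key, value = self.in_flight.pop()
        self.target[key] = value

    def fence(self):
        # Return only once every access issued so far has completed.
        while self.in_flight:
            self.deliver_one()


# Without a FENCE the flag may become visible before the data.
public = {"data": None, "flag": 0}
o = Originator(public)
o.put("data", 42)
o.put("flag", 1)
o.deliver_one()
unfenced = dict(public)

# With a FENCE the update has completed before the flag is even issued.
public2 = {"data": None, "flag": 0}
o2 = Originator(public2)
o2.put("data", 42)
o2.fence()
o2.put("flag", 1)
fenced_before_flag = dict(public2)
o2.deliver_one()
```

Nothing in this model tells the target that the FENCE has completed; the
target still needs a counter or a barrier for that.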
**************************
(3) Can a FENCE even complete before the target executes a WINDOW_IN or
WINDOW_OUT? If so, then what does it mean for a FENCE to complete?
*********************
Yes, the FENCE can complete before the target has executed WINDOW_IN or
WINDOW_OUT. Again it means, conceptually, that the data is in the "public
window copy". It is visible to other gets (including gets by the target
process), but becomes visible to loads only after a call to WINDOW_IN.
************************
As for Marc's new proposal -- I haven't looked at it closely enough yet. At
first, it appears that the intention (as is usual for locks) is to combine
multiple PUTs or GETs into logically-atomic transactions, but this appears
not to be the case, based upon other words in the chapter and his response
to your question. It appears instead, that acquiring a lock is just a
process's way of informing the system whether it wants to read or write a
window so that the proper coherence operations can be performed.
I believe that my proposal offers a valuable alternative.
Basically, my proposal works as follows:
MPI_PUT and MPI_GET are always non-blocking.
MPI_PUT is matched by an MPI_ACCEPT on the target.
MPI_GET is matched by an MPI_OFFER on the target.
MPI_ACCEPT (and MPI_OFFER) take the number of MPI_PUTs (MPI_GETs) to
match as an argument.
MPI_ACCEPT and MPI_OFFER block until (1) they have matched with the
stated number of PUTs (or GETs) on other processes, and (2) all of
the PUTs (or GETs) preceding the ACCEPT or OFFER on the same processor
have matched (with ACCEPTs or OFFERs in their target processes).
Both MPI_ACCEPT and MPI_OFFER have non-blocking versions (MPI_IACCEPT
and MPI_IOFFER).
There are also collective versions of MPI_ACCEPT and MPI_OFFER (which
do not take a count argument for the number of matches). These are
essentially identical to Marc's proposal for BARRIERs, except that
"BARRIER" is replaced by either "ACCEPT_ALL" or "OFFER_ALL". This means
that a dirty-bit is not required.
For those cases where collective is not desired and the number of PUTs
or GETs is not known by the target, MPI_RMA_PROBE is available for a
process to determine if there is an outstanding PUT or GET waiting for
it (so that it can issue an ACCEPT or OFFER).
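
The matching rules above can be sketched as a toy, single-threaded model.
None of this is real MPI: the Put, Process and Accept classes are
hypothetical, test() stands in for polling a non-blocking MPI_IACCEPT, and
moving the data at match time is an assumption of this sketch. Only the PUT
and ACCEPT side is modelled.

```python
from collections import deque


class Put:
    """One outstanding PUT (hypothetical record for this sketch)."""

    def __init__(self, key, value):
        self.key, self.value, self.matched = key, value, False


class Process:
    def __init__(self):
        self.window = {}          # data that has been accepted
        self.incoming = deque()   # PUTs aimed at this process, not yet matched
        self.issued = []          # PUTs this process has originated

    def put(self, target, key, value):
        # Non-blocking: queue the PUT at the target and return at once.
        p = Put(key, value)
        target.incoming.append(p)
        self.issued.append(p)

    def rma_probe(self):
        # True if some PUT is waiting for an ACCEPT from this process.
        return len(self.incoming) > 0

    def iaccept(self, count):
        return Accept(self, count)


class Accept:
    def __init__(self, proc, count):
        self.proc = proc
        self.remaining = count
        # PUTs issued by this process before the ACCEPT and still unmatched.
        self.preceding = [p for p in proc.issued if not p.matched]

    def test(self):
        # Condition (1): match the stated number of incoming PUTs.
        while self.remaining > 0 and self.proc.incoming:
            p = self.proc.incoming.popleft()
            self.proc.window[p.key] = p.value
            p.matched = True
            self.remaining -= 1
        # Condition (2): all of this process's preceding PUTs have matched.
        return self.remaining == 0 and all(p.matched for p in self.preceding)


a, b = Process(), Process()
a.put(b, "x", 1)
b.put(a, "y", 2)
ra, rb = a.iaccept(1), b.iaccept(1)
first = ra.test()    # a matched b's PUT, but a's own PUT is still unmatched
second = rb.test()   # b matches a's PUT; b's own PUT was matched by ra
third = ra.test()    # both conditions now hold for a
```

In the exchange at the end, the first poll of a's ACCEPT cannot complete
because a's own PUT has not yet been matched by b, which is condition (2).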
As I say, I will be posting a complete semantics soon -- I hope.
-Dave
****************
I hope you will soon complete a proposal detailed enough to discuss, including
a description of its implementation on all the various systems we are targeting.
****************
===============================================================================
David C. DiNucci | MRJ, Inc., Rsrch Scntst |USMail: NASA Ames Rsrch Ctr
dinucci@nas.nasa.gov| NAS (Num. Aerospace Sim.)| M/S T27A-2
(415)604-4430 | Parallel Tools Group | Moffett Field, CA 94035