(The examples can be found in my last mail.)
Now his proposal is clear to me, and I think one can make the
following abstraction, which simplifies the comparison:
E (Dave's proposal) can be derived from B (Marc's proposal) by the
following mapping:
E's start of IOFFER with non-zero count --> B's POST
E's first start of PUT or GET after last OFFER --> B's START(WEAK)
E's completion of IOFFER after prev. local PUT/GET --> B's COMPLETE
E's completion of IOFFER with non-zero count --> B's WAIT
In addition, E combines B's POST with a non-collective RMA_INIT
and B's WAIT with the destruction of the window.
E also has "tag"s (A, B, C and D do not have the "tag" argument).
Therefore we can state that E allows all necessary cache operations.
It never induces unnecessary cache operations, because they can be
eliminated by utilizing OFFER's "event" argument.
Disadvantage compared to B:
- it makes an additional synchronization in example 2, because
E does not distinguish between START(WEAK) and START(NO_CHECK).
This should not be accepted, because example 2 should be
programmable with maximum efficiency.
Advantage compared to B:
- like D, it has only a two-routine interface instead of B's and C's
four-routine interface.
All proposals B-E have nonblocking PUT and GET; there is no difference.
The big difference between B & E and C & D is the way they handle
the synchronization before the RMA, signaling that the window can
be used for RMA:
B & E: target process: POST to all and any
       origin process: START when 'rank'ed target has posted
       functional disadvantage: does not handle example III
C & D: target process: POST to 'rank'ed origin processes
       origin process: START when it has received the post event
                       from 'count' targets
       functional disadvantage: cannot handle applications in which
                       the target does not know which origin
                       processes will PUT/GET to its window.
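The two handshake styles can be illustrated with a small sketch. It is only a toy model under stated assumptions: the class and method names (PostToAny, PostToRanks, post, may_start) are invented for this illustration and do not appear in any of the proposals, and all communication is reduced to updating local Python data structures.

```python
class PostToAny:
    """B & E style: a target's POST is visible to all and any origin."""

    def __init__(self):
        self.posted = set()          # ranks of targets that have posted

    def post(self, target):
        self.posted.add(target)

    def may_start(self, origin, target):
        # The origin names the target by rank and may start once it has posted.
        return target in self.posted


class PostToRanks:
    """C & D style: a target's POST is sent to explicitly ranked origins."""

    def __init__(self):
        self.events = {}             # origin rank -> post events received

    def post(self, target, origin_ranks):
        for origin in origin_ranks:
            self.events[origin] = self.events.get(origin, 0) + 1

    def may_start(self, origin, count):
        # The origin may start once it has received `count` post events.
        return self.events.get(origin, 0) >= count


be = PostToAny()
be.post(target=0)
print(be.may_start(origin=3, target=0))   # origin 3 was never named by target 0

cd = PostToRanks()
cd.post(target=0, origin_ranks=[1])
print(cd.may_start(origin=1, count=1))    # origin 1 was named by the target
print(cd.may_start(origin=3, count=1))    # origin 3 was not named
```

In the second model the target has to list its origins in advance, which is exactly the functional disadvantage noted for C & D.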
Implementation possibilities:
> (1) The operation on the target just sets or clears a local flag,
> and other processes performing PUT or GET check that flag.
This is efficient on shared memory systems.
(2) The target sends the information that it is OK to perform PUT
or GET to those processes that will issue a PUT or GET to its
window, and the origin processes check this flag locally
before performing PUT or GET.
This is efficient on virtual shared memory systems.
> (3) The target sends (or broadcasts) the information that it is
> (not) OK to perform PUT or GET to all of the processes that
> might try to PUT or GET, and they check the flag locally
> before performing PUT or GET.
This is _not_ efficient on any system, because "all of the
processes that _might_ try to PUT or GET" is an unknown
subset of the processes. It can be used only for small numbers
of processes in the communicator or in combination with
(1) or (4).
> (4) Each PUT or GET is actually processed by the target (as a
> message or page fault), which the target holds or acts upon
> based upon the value of the local flag.
This is efficient on distributed memory systems.
B&E can choose (1), (3) or (4), i.e. they can _not_ be implemented
efficiently on virtual shared memory systems.
C&D can choose (1), (2) or (4), i.e. they can be implemented
efficiently on all platforms.
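The coverage argument can be checked mechanically. The following sketch only restates the lists above as Python sets; the names EFFICIENT_ON, CHOICES and missing_platforms are made up for this illustration.

```python
# Platform on which each implementation possibility is described as efficient.
# Possibility (3) is described as not efficient on any system.
EFFICIENT_ON = {
    1: "shared memory",
    2: "virtual shared memory",
    3: None,
    4: "distributed memory",
}

# Implementation possibilities each family of proposals can choose from.
CHOICES = {
    "B&E": {1, 3, 4},
    "C&D": {1, 2, 4},
}

ALL_PLATFORMS = {p for p in EFFICIENT_ON.values() if p is not None}


def missing_platforms(family):
    """Platforms for which the family has no efficient possibility."""
    covered = {EFFICIENT_ON[s] for s in CHOICES[family]} - {None}
    return ALL_PLATFORMS - covered


for family in sorted(CHOICES):
    print(family, sorted(missing_platforms(family)))
```

For B&E the only platform left uncovered is virtual shared memory, while C&D leaves none uncovered, which matches the statement above.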
Because of these functional deficiencies and efficiency problems, it
might be good to realize a combination of B&E and C&D - the following
new proposal F:
POST (comm, count, rank, store_flag)
  count       integer         number of origin processes to which a
                              post event should be sent.
                              count == 0 means that the synchronization
                              is done with every origin process that
                              issues a START operation.
                              If possible, count > 0 should be used,
                              because on some platforms count == 0 may
                              be less efficient.
  rank        integer[count]  ranks of the origin processes to which a
                              post event should be sent. A rank of
                              MPI_ANY_SOURCE defines that the post event
                              will be sent to each process in the
                              communicator except the process issuing
                              this POST operation; it may be combined with
                              the caller's own rank in an additional rank
                              item.
                              In combination with count == 1, the rank
                              MPI_PROC_NULL defines that no synchronization
                              is done.
  store_flag                  0 = no local stores have been done into the
                                  window since the last POST.
                              1 = local stores have been done into the
                                  window since the last POST.
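As a rough illustration of how the count and rank arguments of POST could be interpreted, here is a minimal sketch. It is not part of the proposal: the function name post_destinations is hypothetical, the two constants are placeholder values rather than the values of any real MPI implementation, and the handling of MPI_PROC_NULL for count > 1 (silently ignored) is an assumption, since the text only defines it for count == 1.

```python
MPI_ANY_SOURCE = -1   # placeholder value, for this sketch only
MPI_PROC_NULL = -2    # placeholder value, for this sketch only


def post_destinations(me, comm_size, count, rank):
    """Return the origin ranks that receive a post event.

    None means count == 0: the target synchronizes with every origin
    that issues a START, so no destination list exists in advance.
    """
    if count == 0:
        return None
    if count == 1 and rank[0] == MPI_PROC_NULL:
        return set()                      # no synchronization is done
    dests = set()
    for r in rank[:count]:
        if r == MPI_ANY_SOURCE:
            # every process in the communicator except the caller
            dests |= set(range(comm_size)) - {me}
        elif r != MPI_PROC_NULL:          # assumption: ignore otherwise
            dests.add(r)
    return dests


# Target 2 in a communicator of 4 processes:
print(post_destinations(2, 4, 2, [0, 3]))
print(post_destinations(2, 4, 1, [MPI_ANY_SOURCE]))
print(post_destinations(2, 4, 2, [MPI_ANY_SOURCE, 2]))
```

The last call shows the combination of MPI_ANY_SOURCE with the caller's own rank in an additional rank item, which adds the posting process itself to the destinations.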
START (comm, count, rank_flag, rank)
  count       integer         number of target processes from which a
                              post event should be received.
                              count == 0 has the same meaning as
                              count == 1, rank_flag == 1,
                              rank == MPI_PROC_NULL.
  rank_flag   integer         0 = the rank array is unused. START accepts
                                  only post events with dedicated
                                  destinations, i.e. count was non-zero in
                                  the POST operation on the target.
                              1 = the rank array is used and has count
                                  items. The count argument in the POST
                                  operations on the targets may or may not
                                  be zero.
  rank        integer[*]      ranks of the target processes from which
                              count post events are expected.
                              MPI_ANY_SOURCE defines that a post event
                              will be expected from each process in the
                              communicator except the process issuing
                              this START operation; it may be combined with
                              the caller's own rank in an additional rank
                              item.
                              In combination with count == 1, the rank
                              MPI_PROC_NULL defines that no synchronization
                              is done.
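The START arguments can be read in the same way. The sketch below is only one possible interpretation: normalize_start and start_targets are hypothetical helper names, the constants are placeholder values, and ignoring MPI_PROC_NULL entries when count > 1 is an assumption that the text does not cover.

```python
MPI_ANY_SOURCE = -1   # placeholder value, for this sketch only
MPI_PROC_NULL = -2    # placeholder value, for this sketch only


def normalize_start(count, rank_flag, rank):
    """count == 0 means count == 1, rank_flag == 1, rank == MPI_PROC_NULL."""
    if count == 0:
        return 1, 1, [MPI_PROC_NULL]
    return count, rank_flag, list(rank)


def start_targets(me, comm_size, count, rank_flag, rank):
    """Return the target ranks from which post events are expected.

    None means rank_flag == 0: the rank array is unused and the origin
    waits for `count` post events that targets addressed to it explicitly.
    """
    count, rank_flag, rank = normalize_start(count, rank_flag, rank)
    if rank_flag == 0:
        return None
    if count == 1 and rank[0] == MPI_PROC_NULL:
        return set()                      # no synchronization is done
    targets = set()
    for r in rank[:count]:
        if r == MPI_ANY_SOURCE:
            # every process in the communicator except the caller
            targets |= set(range(comm_size)) - {me}
        elif r != MPI_PROC_NULL:          # assumption: ignore otherwise
            targets.add(r)
    return targets


# Origin 0 in a communicator of 4 processes:
print(start_targets(0, 4, 0, 0, []))
print(start_targets(0, 4, 2, 1, [1, 3]))
print(start_targets(0, 4, 1, 1, [MPI_ANY_SOURCE]))
```

With rank_flag == 0 the origin cannot match a POST that used count == 0, because such a POST has no dedicated destinations; only a START that names its targets can synchronize with it.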
COMPLETE and WAIT are as in proposal B (Marc's).
I have dropped START(MPI_STRONG) because it can be replaced
by a (buffered) message and a START(MPI_PROC_NULL or MPI_NOCHECK).
It makes no sense to have the same extensive functionality for the
synchronization after PUT/GET. Therefore it is not a good idea to
form, with this proposal, the union of operations as in
proposals D and E.
F therefore has almost the same size as Marc's proposal (B),
but better functionality and better efficiency.
And I do not see how to achieve F's functionality and efficiency
with unions of operations analogous to D or E.
Marc, Raja and Dave, can you agree? (Eric wrote to me that he has no
time to look at this question at the moment.)
Or is there still an important argument that is not
mentioned in my review of A-D and Dave's review of E?
Rolf
PS:
> I do not know if we will reach agreement before the next meeting, but
> maybe reviews like yours will help.
Thanks. I still hope that I will not have to put the review in LaTeX
for the next Chicago meeting.
> I'll fill in the rest of your table in a following message.
I did not find it in my folder. If you change your mind about
A-F, it is possibly not needed.
Rolf Rabenseifner (Computer Center )
Rechenzentrum Universitaet Stuttgart (University of Stuttgart)
Allmandring 30 Phone: ++49 711 6855530
D-70550 Stuttgart 80 FAX: ++49 711 6787626
Germany rabenseifner@rus.uni-stuttgart.de