A proposal to change the direction of 1-sided comm

David C. DiNucci (dinucci@nas.nasa.gov)
Tue, 25 Jun 1996 12:44:51 -0700

After looking at the semantics of PUT and GET this weekend, I came to a
conclusion similar to one that Eric Salo expressed on Friday, i.e. that
there were some problems with the portable use, implementation, and even
definition of these operations. However, there were some cases where they
did appear to be useful and well defined, i.e. when they were used in
conjunction with MPI_BARRIERs. Specifically, in order for a programmer to
ensure that a PUT from process A to process B would actually be read in
process B, both processes would need to execute a barrier between the PUT and
the read (or resort to less savory alternatives, such as performing some
other collective communication or sending a message from A to B).
Similarly, in order for a programmer to ensure that a GET originating in
process A would read a value stored in B, both processes would need to
execute a barrier between the store and the GET. In fact, in the GET case,
the GET would not actually be satisfied until both processes participated in
*another* barrier. Eric Salo helped to review the work and made some useful
suggestions, especially w.r.t. GET.
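To make the ordering rule above concrete, here is a small sequential model in
C. It is not MPI: the two "processes" are rows of a static array, and the
names sim_put, sim_get and sim_barrier are hypothetical. Transfers are queued
and carried out only inside sim_barrier, which models the point at which a
portable program is entitled to rely on them. A real implementation might
move the data earlier, but the program could not depend on that.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model, not MPI: each simulated process owns a small window.
   PUT and GET requests are queued and only performed when sim_barrier()
   runs, i.e. the barrier is the only point where they are guaranteed. */
#define NPROCS 2
#define WINSIZE 4
#define MAXREQ 16

static int window[NPROCS][WINSIZE];

enum req_kind { REQ_PUT, REQ_GET };

struct req {
    enum req_kind kind;
    int target;   /* rank whose window is accessed */
    int disp;     /* index within that window */
    int value;    /* value to store (PUT only) */
    int *result;  /* where to deliver the value (GET only) */
};

static struct req pending[MAXREQ];
static int npending = 0;

/* Queue a PUT; nothing is stored yet. */
static void sim_put(int target, int disp, int value) {
    assert(npending < MAXREQ && target >= 0 && target < NPROCS);
    assert(disp >= 0 && disp < WINSIZE);
    pending[npending++] = (struct req){ REQ_PUT, target, disp, value, NULL };
}

/* Queue a GET; *result is not written until the next barrier. */
static void sim_get(int target, int disp, int *result) {
    assert(npending < MAXREQ && target >= 0 && target < NPROCS);
    assert(disp >= 0 && disp < WINSIZE);
    pending[npending++] = (struct req){ REQ_GET, target, disp, 0, result };
}

/* Models a barrier entered by all processes: every queued transfer
   is completed here, then the queue is emptied. */
static void sim_barrier(void) {
    for (int i = 0; i < npending; i++) {
        struct req *r = &pending[i];
        if (r->kind == REQ_PUT)
            window[r->target][r->disp] = r->value;
        else
            *r->result = window[r->target][r->disp];
    }
    npending = 0;
}
```

In this model a PUT into rank 1's window becomes visible after one barrier,
while a GET issued after the barrier that follows B's store delivers its
value only at the next barrier, which is the two-barrier pattern described
for GET.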

When used in this way, many of the constructs in the 1-sided chapter appear
to be unnecessary. Specifically, WINDOW_IN, WINDOW_OUT, and FENCE no longer
seem to serve any useful function, since they can be integrated with the
BARRIER. In addition, window counters lose much of their utility, since the
user can depend on their being updated only during the collective
operation.

Since a collective operation (i.e. BARRIER) seems to be required for every PUT
and GET anyway, it was natural to wonder whether GET and PUT could just be
made into collective operations. This would be a major change in direction.
The only routines from the current chapter to survive would be MPI_GET and
MPI_PUT (perhaps with new names), and possibly some form of MPI_IACCUMULATE and
MPI_RMW. I assert that this approach would be just as portable and efficient
as PUT and GET with barriers (if not more so), and that it should be much
easier to understand and implement.

I am now proposing just that -- i.e. making PUT and GET collective.

The only syntactical change I am suggesting for MPI_PUT is the addition of a
single argument -- OUT recvbuf -- which would be significant only in the
root (i.e. target_rank) process. This would be necessary because the removal
of MPI_RMA_MALLOC would also remove the association of the communicator with
a buffer. Semantically, the execution of an MPI_PUT would remain almost
identical, except that it would not complete until all of the other processes
had also executed MPI_PUTs. A non-blocking version would also be required,
to allow greater efficiency on message-passing systems by giving the PUT
messages time to reach their destination before any process blocks. In this
non-blocking version, all processes could start the operation and continue
with other work, but none could finish the operation until all had started.
On shared-memory systems, it would still be possible to update the buffer in
the root (i.e. target_rank) process directly, as soon as the root has
initiated the operation and thereby identified the buffer (recvbuf). This
allows the root process to keep other processes from reading or writing the
buffer until it is ready for them to do so.
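The following C sketch models the non-blocking collective PUT on a
shared-memory system, as described above. Everything in it (iput_start,
iput_test, struct iput_state, NPROCS, ROOT) is a hypothetical illustration
rather than a proposed binding, and all simulated processes share one
address space. A process's data is copied into the root's buffer only once
the root has started and so identified recvbuf, and the operation is
reported complete only after every process has started it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical model of the proposed non-blocking collective PUT on a
   shared-memory system; names and fields are illustrative only. */
#define NPROCS 3
#define ROOT 0

struct iput_state {
    bool started[NPROCS];
    int nstarted;
    int *recvbuf;              /* known only once the root has started */
    const int *src[NPROCS];    /* origin data of each process */
    int count[NPROCS];
    int disp[NPROCS];          /* displacement in the root's recvbuf */
    bool copied[NPROCS];
};

/* Store rank p's data directly into recvbuf, if that is possible yet. */
static void try_copy(struct iput_state *s, int p) {
    if (s->recvbuf != NULL && s->started[p] && !s->copied[p]) {
        if (s->count[p] > 0)
            memcpy(s->recvbuf + s->disp[p], s->src[p],
                   (size_t)s->count[p] * sizeof(int));
        s->copied[p] = true;
    }
}

/* Start the operation for rank p; returns immediately. */
static void iput_start(struct iput_state *s, int p, const int *src,
                       int count, int disp, int *recvbuf) {
    assert(p >= 0 && p < NPROCS && !s->started[p]);
    s->started[p] = true;
    s->nstarted++;
    s->src[p] = src;
    s->count[p] = count;
    s->disp[p] = disp;
    if (p == ROOT)
        s->recvbuf = recvbuf;  /* recvbuf is significant only at the root */
    /* Once the root has identified recvbuf, pending data can be stored. */
    for (int q = 0; q < NPROCS; q++)
        try_copy(s, q);
}

/* No rank may finish until every rank has started the operation. */
static bool iput_test(const struct iput_state *s) {
    return s->nstarted == NPROCS;
}
```

Because the copy is triggered by the root's start, the root decides when
other processes may write into its buffer, which is the property claimed
above for shared-memory systems.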

A similar syntactical change is proposed for MPI_GET, adding one argument --
IN recvbuf -- which would be significant only in the root process. The
semantic changes would also be similar -- i.e. it would become a collective
call, and a non-blocking version would also be available.
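A corresponding sketch of the blocking collective GET, again simulated in
one address space with hypothetical names (collective_get, struct get_args).
Each process describes what it wants from the root's buffer. The root's
buffer is the IN argument (named recvbuf above) and is only read. A count of
zero lets a process take no data.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch of the proposed collective GET: each array entry
   holds one process's arguments. */
struct get_args {
    int *result_addr;   /* where this process wants the data */
    int origin_count;   /* number of ints to fetch; 0 takes nothing */
    int target_disp;    /* offset into the root's buffer */
};

/* recvbuf is the root's IN buffer; it is read but never modified.
   Returns only after every process's request has been satisfied. */
static void collective_get(const struct get_args *args, int nprocs,
                           const int *recvbuf, int recvsize) {
    for (int p = 0; p < nprocs; p++) {
        const struct get_args *a = &args[p];
        if (a->origin_count == 0)
            continue;
        assert(a->target_disp >= 0 &&
               a->target_disp + a->origin_count <= recvsize);
        memcpy(a->result_addr, recvbuf + a->target_disp,
               (size_t)a->origin_count * sizeof *recvbuf);
    }
}
```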

These collective operations would resemble MPI_GATHER and MPI_SCATTER, except
that (1) each non-root process would specify where in the root its data
should originate or end up, instead of the location being determined by rank
order, and (2) it would not be necessary to fill or distribute the entire
data structure in the root. In fact, I suggest that it be possible for
processes to "opt out" of the collective operation by supplying an
origin_count of zero.
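Both differences from MPI_GATHER show up in a small simulation of the
collective PUT. The names collective_put and struct put_args are
hypothetical, and all processes are modeled in one address space: each
process names its own displacement in the root's buffer instead of being
placed by rank, slots that nobody writes are left untouched, and a process
opts out by passing an origin_count of zero.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-process arguments of the proposed collective PUT. */
struct put_args {
    const int *origin_addr;   /* data supplied by this process */
    int origin_count;         /* number of ints; 0 means "opt out" */
    int target_disp;          /* where the data lands in recvbuf */
};

/* Models a blocking collective PUT: returns only after every process's
   contribution has been stored in the root's recvbuf. Unlike a gather,
   placement comes from target_disp, not from the process's rank, and
   recvbuf need not be filled completely. */
static void collective_put(const struct put_args *args, int nprocs,
                           int *recvbuf, int recvsize) {
    for (int p = 0; p < nprocs; p++) {
        const struct put_args *a = &args[p];
        if (a->origin_count == 0)
            continue;   /* this process opted out */
        assert(a->target_disp >= 0 &&
               a->target_disp + a->origin_count <= recvsize);
        memcpy(recvbuf + a->target_disp, a->origin_addr,
               (size_t)a->origin_count * sizeof *recvbuf);
    }
}
```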

It is conceivable that collective operations like these could be included
in MPI without discarding the rest of the 1-sided chapter. However, I
believe that it will be difficult to identify functionality which is
portably available in the current 1-sided chapter that is not available
with these collective operations.

It could conceivably be more difficult to use these collective operations
than the existing non-collective operations with BARRIERs in some cases,
because the collective operations require that all GETs or PUTs be performed
in one operation, rather than issuing separate GETs and PUTs, followed by
a BARRIER (which effectively means "now go do them"). If this is a problem,
I think it could be remedied by offering other similar routines with different
argument structures, e.g. lists of locations rather than complex derived
types, or even individual calls to collect arguments, with a final call that
uses the collection (like XtSetArg in the X Toolkit).
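One possible shape for the argument-collection alternative, sketched in C
with hypothetical names (put_list_init, put_list_add, put_list_exec). Each
process records its PUTs piece by piece, much as XtSetArg fills in an
argument list before a widget call, and nothing moves until the final call,
which stands in for the single collective operation that all processes would
enter together.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical argument-collection interface for the collective PUT. */
#define MAXPIECES 8

struct put_piece {
    const int *src;   /* origin data */
    int count;        /* number of ints */
    int disp;         /* displacement in the root's recvbuf */
};

struct put_list {
    struct put_piece piece[MAXPIECES];
    int n;
};

static void put_list_init(struct put_list *l) { l->n = 0; }

/* Like XtSetArg: only records the arguments, transfers nothing. */
static void put_list_add(struct put_list *l, const int *src, int count,
                         int disp) {
    assert(l->n < MAXPIECES);
    l->piece[l->n++] = (struct put_piece){ src, count, disp };
}

/* The final call that uses the collections: lists[p] holds rank p's
   pieces. Every piece is stored in recvbuf, then each list is cleared
   so it can be reused for the next operation. */
static void put_list_exec(struct put_list *lists, int nprocs,
                          int *recvbuf, int recvsize) {
    for (int p = 0; p < nprocs; p++) {
        for (int i = 0; i < lists[p].n; i++) {
            const struct put_piece *pc = &lists[p].piece[i];
            assert(pc->disp >= 0 && pc->disp + pc->count <= recvsize);
            if (pc->count > 0)
                memcpy(recvbuf + pc->disp, pc->src,
                       (size_t)pc->count * sizeof *recvbuf);
        }
        lists[p].n = 0;
    }
}
```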

Feedback?

-Dave

PS. I can post the messages that led to the above conclusions if it
appears that they would be constructive.
===============================================================================
David C. DiNucci | MRJ, Inc., Rsrch Scntst |USMail: NASA Ames Rsrch Ctr
dinucci@nas.nasa.gov| NAS (Num. Aerospace Sim.)| M/S T27A-2
(415)604-4430 | Parallel Tools Group | Moffett Field, CA 94035