Comments on current draft

Lewins, Lloyd J (llewins@msmail4.hac.com)
21 Nov 1995 15:44:30 -0800

The following are some comments on the latest draft of the 1-sided
chapter:

1) I would prefer to replace the name RMC with something like
"Shared". I find such a name more intuitively descriptive, and
easier to read.

2) I think that we should define a new object for shared objects, and
not overload the existing communicator construct. I see (almost)
no conceptual similarity between the comm and the shared objects.

Such a new object will ease implementation because the existing
Communicator object will not have to be extended. I cannot see why a
regular comm should be burdened with extra baggage for the shared
case. Also, all functions operating on a shared comm would require
extra argument checking to verify that the comm was a "shared" comm.

By defining a new object, we sidestep questions like: what happens
when you split, duplicate, etc. a shared comm? We also sidestep
questions like the ordering between shared accesses and
point-to-point messages. This leads to the one disadvantage of my
proposal, which is that MPI_Barrier cannot be overloaded to assure
a global fence.

MPI_RMC_INIT would thus become MPI_SHARED_CREATE.
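
A minimal sketch of the trade-off in 2), in plain C rather than MPI.
All type and function names below (overloaded_comm, shared_obj,
overloaded_put, shared_put, the SH_* codes) are hypothetical and are
not taken from the draft; the "window" is simply an int array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; none of this is MPI API. */
enum { SH_OK = 0, SH_ERR_NOT_SHARED = -1, SH_ERR_RANGE = -2 };

/* Option A: one overloaded handle. Regular comms carry the window
 * fields as unused baggage, and every shared call must check the kind. */
typedef enum { KIND_REGULAR, KIND_SHARED } comm_kind;

typedef struct {
    comm_kind kind;
    int *window;        /* only meaningful when kind == KIND_SHARED */
    size_t window_len;
} overloaded_comm;

int overloaded_put(overloaded_comm *c, size_t offset, int value) {
    assert(c != NULL);
    if (c->kind != KIND_SHARED) return SH_ERR_NOT_SHARED; /* run-time check */
    if (offset >= c->window_len) return SH_ERR_RANGE;
    c->window[offset] = value;
    return SH_OK;
}

/* Option B: a distinct type. A regular communicator handle would be a
 * different type, so passing one here is diagnosed by the compiler and
 * no "is this comm shared?" check is needed at run time. */
typedef struct {
    int *window;
    size_t window_len;
} shared_obj;

int shared_put(shared_obj *s, size_t offset, int value) {
    assert(s != NULL);
    if (offset >= s->window_len) return SH_ERR_RANGE;
    s->window[offset] = value;
    return SH_OK;
}
```

With option A, handing a regular comm to overloaded_put can only be
rejected at run time (SH_ERR_NOT_SHARED); with option B that error
class does not exist.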

3) I fail to see the reason why arbitrary datatypes were dropped
from the creation of the shared/remote object. While it is true that
the displacements (because they are in bytes) are specific to the
target machine, I don't see the problem. The remote node must
specify a conformant type, with displacements which are appropriate
to the remote machine. Current proposals for sending/receiving types
as ASCII will achieve exactly this. By conformant, I mean the
definition we already use for send/recv, i.e., equivalent type
signatures. The real problem is for the simple data types, for
which displacements are not available. In these cases, the
implementation will have to compute appropriate scaling factors.

4) With regard to the discussion on page 4, line 38: yes, it should
be acceptable that an RMC window maps a remote process into the
address space. I would note that currently MPI specifies nothing
about memory protection. In fact, one could implement MPI on a
pure physically addressed machine, in which all memory is
globally visible.

5) In our case, the Accumulate and RMW operations will require that
the implementation of the shared object includes a hidden
binary semaphore or lock to guarantee mutual exclusion. Thus, these
operations need to be clearly defined as atomic with respect to
other calls to the same operation, but not necessarily atomic with
respect to direct memory accesses by the target processor.

Corollary: the target processor must be allowed to call Accumulate
and RMW on its own shared object.
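
A minimal sketch of the hidden-lock arrangement described in 5),
assuming a C11 atomic_flag stands in for the binary semaphore. The
names shared_cell, shared_accumulate, shared_rmw_add and
target_direct_store are hypothetical, and the operation is fixed to
addition for brevity.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

/* Hypothetical shared object: one exposed value plus a hidden lock. */
typedef struct {
    atomic_flag lock;   /* binary lock, never visible to the user */
    long value;         /* the exposed memory location */
} shared_cell;

void shared_cell_init(shared_cell *c, long initial) {
    assert(c != NULL);
    atomic_flag_clear(&c->lock);
    c->value = initial;
}

static void cell_lock(shared_cell *c) {
    while (atomic_flag_test_and_set_explicit(&c->lock, memory_order_acquire)) {
        /* spin until the current holder releases the lock */
    }
}

static void cell_unlock(shared_cell *c) {
    atomic_flag_clear_explicit(&c->lock, memory_order_release);
}

/* Accumulate: atomic with respect to other calls that take the lock. */
void shared_accumulate(shared_cell *c, long operand) {
    cell_lock(c);
    c->value += operand;
    cell_unlock(c);
}

/* RMW: returns the old value and stores old + operand, under the lock. */
long shared_rmw_add(shared_cell *c, long operand) {
    cell_lock(c);
    long old = c->value;
    c->value = old + operand;
    cell_unlock(c);
    return old;
}

/* A direct store by the target bypasses the lock, so it is NOT atomic
 * with respect to the two operations above. */
void target_direct_store(shared_cell *c, long v) {
    c->value = v;
}
```

In this sketch the target gets the same atomicity as remote callers
only by going through shared_accumulate or shared_rmw_add on its own
cell, which is the corollary above.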

6) Page 17, line 3, I think you mean "Only the associative functions
can be used in RMW calls."

7) I find very little use for a non-blocking version of Accumulate and
RMW. In general, the operations within Accumulate and RMW need to
be kept as short as possible, because they tie up a single critical
resource. This seems to be contradictory to the need for non-blocking
operations.

8) Again, I prefer the creation of a separate counter object rather
than overloading the (poor) request object. Once more, this eliminates
problems like adding baggage to every request object, and handling
errors like passing a regular request to MPI_Set_counter_threshold.
I would define a blocking and non-blocking version of a function
called (say) MPI_Counter_threshold(). This function would take a
new threshold value as a parameter, and the non-blocking version
could return a request on which to wait.

9) One might add an implicit fence to the RMW operations. Rationale:
this handles the common case in which an RMW operation is used
to lock the data structure during a sequence of remote get/puts.

10) We need to do a very clear job of defining the ordering semantics.
We also need to define what to do about "false" sharing. For
example, what happens on a word-oriented machine when two
processes simultaneously write to bytes which are contained
in one word? If we don't define this, some implementation will
make an optimized choice which breaks code.
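
To make the "false" sharing hazard in 10) concrete, here is a small
deterministic simulation in C. It assumes a machine that can only
load and store whole 32-bit words, so a byte write becomes a
load/merge/store sequence. The function names are hypothetical, and
the overlap of the two writers is modelled by ordering the steps by
hand rather than by real concurrency.

```c
#include <stdint.h>

/* Replace byte number idx (0 = least significant) of a 32-bit word. */
uint32_t merge_byte(uint32_t word, unsigned idx, uint8_t byte) {
    uint32_t shift = 8u * idx;
    uint32_t mask = UINT32_C(0xFF) << shift;
    return (word & ~mask) | ((uint32_t)byte << shift);
}

/* Two writers update different bytes of the same word, and their
 * word-level load/merge/store sequences overlap. */
uint32_t interleaved_byte_writes(uint32_t memory_word) {
    uint32_t a = memory_word;       /* writer A loads the word      */
    uint32_t b = memory_word;       /* writer B loads the same word */
    a = merge_byte(a, 0, 0xAA);     /* A merges its byte            */
    b = merge_byte(b, 1, 0xBB);     /* B merges its byte            */
    memory_word = a;                /* A stores the whole word      */
    memory_word = b;                /* B stores, wiping out A's byte */
    return memory_word;
}

/* The same two writes, with the sequences kept from overlapping. */
uint32_t serialized_byte_writes(uint32_t memory_word) {
    memory_word = merge_byte(memory_word, 0, 0xAA);
    memory_word = merge_byte(memory_word, 1, 0xBB);
    return memory_word;
}
```

Starting from a zero word, the interleaved version yields 0x0000BB00
(writer A's byte is lost) while the serialized version yields
0x0000BBAA, even though the two writers never touch the same byte.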

11) Is there a general need for the comm argument to
MPI_Request_handler? If the handler doesn't know the comm by
context, then why not use the extra state?

12) There appears to always be a "race" condition when cancelling
requests for which a handler is posted. Before calling cancel,
how do I know that the request is still valid, and not a dangling
reference? I suppose one could use an auxiliary flag, set by the
handler when it fires. Then one could lock out the handler before
testing the flag. One still needs a guarantee that the locking of a
handler prevents the implementation from deleting the request.
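
A minimal sketch of the auxiliary-flag scheme in 12), assuming a
user-owned record sits beside the real request. Everything here
(tracked_request, handler_fire, try_cancel) is hypothetical; setting
the cancelled field merely stands in for the actual cancel call, and
a C11 atomic_flag stands in for whatever mechanism locks out the
handler.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical user-side bookkeeping kept alongside a request. */
typedef struct {
    atomic_flag handler_lock;  /* held while the handler is locked out */
    bool fired;                /* auxiliary flag, set by the handler   */
    bool cancelled;            /* stands in for the real cancel call   */
} tracked_request;

void tracked_request_init(tracked_request *r) {
    assert(r != NULL);
    atomic_flag_clear(&r->handler_lock);
    r->fired = false;
    r->cancelled = false;
}

static void req_lock(tracked_request *r) {
    while (atomic_flag_test_and_set_explicit(&r->handler_lock,
                                             memory_order_acquire)) {
        /* spin until the other side releases the lock */
    }
}

static void req_unlock(tracked_request *r) {
    atomic_flag_clear_explicit(&r->handler_lock, memory_order_release);
}

/* What the handler would do when the request completes. */
void handler_fire(tracked_request *r) {
    req_lock(r);
    if (!r->cancelled) {
        r->fired = true;
    }
    req_unlock(r);
}

/* Lock out the handler, test the flag, and cancel only if the handler
 * has not fired. Returns true if the cancel was issued. */
bool try_cancel(tracked_request *r) {
    req_lock(r);
    bool can_cancel = !r->fired;
    if (can_cancel) {
        r->cancelled = true;
    }
    req_unlock(r);
    return can_cancel;
}
```

The sketch works only because the tracked_request record is owned by
the user and outlives both calls. It does not remove the need for the
guarantee asked for above, namely that the implementation will not
delete the underlying request while the handler is locked out.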

Lloyd Lewins
Hughes Aircraft Co.
llewins@msmail4.hac.com