>
>
> Here is one possible implementation for MPI_PUT() on a NOW (SMPs and/or
> uniprocessors) that works quite nicely if we limit ourselves to the heap: At
> initialization time, one agent is forked off for the application. When
> MPI_SHMALLOC() is called, all process in the communicator allocate some shared
> buffers that are also mapped in by the agent. Whenever a MPI process within the
> host wants to perform a put within that same host, it simply copies the data
> into the appropriate shared addresses. When it wants to perform a put to a
> process on a different host, it sends the data off to the remote agent on that
> host. And when a put request arrives at the local agent from a remote MPI
> process, the data is simply copied into the shared buffer by the agent exactly
> as if it had come from a local MPI process. MPI_GET() would work similarly.
> This is a very clean, simple, and efficient implementation that requires
> nothing special from the kernel and is extremely portable.
>
> I challenge anyone to come up with a similarly strong implementation that can
> handle arbitrary addresses.
>
>
This implementation is not portable to an SP2 (or CM5, or ...) without
significant loss of performance, since currently only one process per node
can use fast communication. Besides, such an implementation is likely to be
even slower than one that sits atop an hrecv callback interface, since it
forces additional context switching. It also uses an expensive resource: a
port. I don't see that it is any simpler or faster than an hrecv put/get
server, which can be implemented on any MPI system that supports a
receive-and-call or hrecv interface:

The remote agent is a posted hrecv that waits for messages on a special
communicator. A get is implemented as "send request to RMC agent", followed by
a receive; a put is implemented as "send request to RMC agent", followed by a
send. The RMC agent posts the matching send or receive, and then posts itself
again.

I would claim this is "a similarly strong implementation".
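
To make the protocol concrete, here is a toy simulation of it in Python. It
is only a sketch under loud assumptions: threads and queues stand in for MPI
processes and messages, the agent thread stands in for the posted hrecv, and
every name (Process, rmc_requests, put_data, get_reply) is hypothetical and
not part of any MPI interface.

```python
import queue
import threading


class Process:
    """One simulated MPI process (hypothetical stand-in, not an MPI API)."""

    def __init__(self, rank, nprocs, size):
        self.rank = rank
        self.memory = [0] * size              # ordinary local memory
        self.rmc_requests = queue.Queue()     # the "special communicator"
        # Per-source channels mimic MPI matching by (source, tag).
        self.put_data = [queue.Queue() for _ in range(nprocs)]
        self.get_reply = [queue.Queue() for _ in range(nprocs)]
        threading.Thread(target=self._rmc_agent, daemon=True).start()

    def _rmc_agent(self):
        # Plays the role of the posted hrecv: serve one request, then
        # "post itself again" by looping.
        while True:
            op, origin, offset, count = self.rmc_requests.get()
            if op == "put":
                # Matching receive for the data the origin sends next.
                data = self.put_data[origin.rank].get()
                self.memory[offset:offset + count] = data
            else:
                # Matching send for the receive the origin has posted.
                chunk = self.memory[offset:offset + count]
                origin.get_reply[self.rank].put(chunk)

    def put(self, target, offset, values):
        # "send request to RMC agent", followed by a send
        target.rmc_requests.put(("put", self, offset, len(values)))
        target.put_data[self.rank].put(list(values))

    def get(self, target, offset, count):
        # "send request to RMC agent", followed by a receive
        target.rmc_requests.put(("get", self, offset, count))
        return self.get_reply[target.rank].get()
```

In this sketch a put returns as soon as the data has been handed off, so the
caller learns that it has landed only through a later get to the same target,
which the agent serves in request order.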

The only rationale I can find for restricting put/get to "shareable" heap
memory is clusters of SMPs, where one might want only one communication
agent per SMP node that services put/get requests for several local
processes. Then this agent must be a separate (user?) process. But in such a
situation one will also want one communication agent that serves send/receive
requests on behalf of several local processes, and this agent has to have
access to all the local processes' memory in order to avoid superfluous
memory-to-memory copy operations.

Also, for clusters of SMPs, one will need a dual implementation where
some (intranode) put/get calls are executed directly by the caller, and some
(internode) calls result in a request being shipped to a remote agent. But
then it is no more difficult to select the put/get method according to the
location of the target process than it is to select it according to the type
of target window. Therefore, I don't see that my proposal makes life harder
for implementers, with the exception of those implementers who are only
interested in MPI inside an SMP, but not across SMP nodes.
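
The selection by target location could look roughly like the following
Python sketch. Everything in it is an assumption made for illustration: the
Proc record, its node field, and the send_to_agent callback are hypothetical,
and a plain list assignment stands in for a shared-memory copy.

```python
from dataclasses import dataclass, field


@dataclass
class Proc:
    """Hypothetical process descriptor: rank, SMP node id, local memory."""
    rank: int
    node: int
    memory: list = field(default_factory=lambda: [0] * 8)


def put(origin, target, offset, values, send_to_agent):
    """Pick the put method from the location of the target process."""
    if origin.node == target.node:
        # Intranode: the caller performs the copy itself.
        target.memory[offset:offset + len(values)] = values
        return "direct"
    # Internode: ship a request to the agent on the remote node.
    send_to_agent(target.rank, offset, list(values))
    return "agent"
```

The test is on the target's node, not on how the target window was
allocated, which is the point made above.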

I would like to hear the opinions of the various MPI implementers (MPICH,
LAM, etc.) on this.

One more remark:
What we are debating here is whether to restrict a put/get window to be in
memory that is dynamically allocated by an MPI_RMC_MALLOC type call. A
separate debate, which is, I think, orthogonal, is how many bells and whistles
we attach to put/get: remote requests, remote datatypes, different addressing
modes, etc. As far as I can see, these additional functions are equally easy
or hard to support in both types of implementation.

-------------------
Marc Snir
IBM T.J. Watson Research Center
P.O. Box 218, Yorktown Heights, NY 10598
email: snir@watson.ibm.com
phone: 914-945-3204
fax: 914-945-4425