> The only rationale I can find for restricting put/get to heap "shareable"
> memory is for clusters of SMPs, where one might want only one communication
> agent per SMP node that services put/get requests for several local
> processes. Then, this agent must be a separate (user?) process. But, in such a
> situation, one will also want one communication agent that serves send/receive
> requests on behalf of several local processes, and this agent has to have
> access to all the local processes' memory, in order to avoid superfluous
> memory-to-memory copy operations.
It is not obvious that the same agent has to service put/get requests as well as
send/receive. Implementation of send/receive does not require any hidden agent;
therefore, the argument that the one-sided communication agent has to have
access to all the local processes' memory because of send/receive is not valid.
> Also, for clusters of SMPs, one will need to have a dual implementation where
> some (intranode) put/get calls are executed directly by the caller, and some
> (internode) calls result in a request being shipped to a remote agent. But,
> then, it is no more difficult to select the put/get method according to the
> location of the target process than it is to select it according to the type of
> target window. Therefore, I don't see that my proposal makes life harder for
> implementers, with the exception of those who are only
> interested in MPI inside an SMP, but not across SMP nodes.
This scheme is currently used in the Global Arrays (GA) library to implement
put/get on networks of SMPs and workstations. However, we use heavy-weight
processes and System V shared memory for safety and efficiency reasons. Processes
running on an SMP cluster require the privacy of their own contexts (therefore
they cannot be implemented as lightweight threads sharing a single context to
execute put/gets directly). Also, a single agent per SMP machine (called the
"data server" in GA) is implemented as a process that attaches to the shared
memory region when it is allocated. If you want the hidden agent to access all
of the process memory, you end up with a separate agent thread for _every_ user
process. With the current thread packages (coarse-grain preemptive scheduling,
or no preemptive scheduling at all), we end up with either pretty high latency
for put/get requests or low machine utilization (agents spinning while waiting
for requests).
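
For concreteness, here is a minimal sketch (in C) of the dual put path described
above: direct intranode access versus shipping an internode request to the data
server. All names in it (put_request, node_of, my_node, shm_addr_of,
ship_to_data_server) are hypothetical, invented purely for illustration; this is
not the actual GA code:

  /* Dispatch a put by target location: intranode targets are written
   * directly through the System V segment the caller attached at window
   * creation; internode targets are shipped to the remote node's single
   * data-server process, which owns an attachment to the target segment. */
  #include <string.h>

  typedef struct {
      int   target;   /* rank of the target process            */
      long  offset;   /* displacement into the target's window */
      long  nbytes;   /* number of bytes to transfer           */
      void *src;      /* local source buffer                   */
  } put_request;

  extern int   my_node;                 /* SMP node this process runs on */
  extern int   node_of(int rank);       /* SMP node hosting a given rank */
  extern void *shm_addr_of(int rank);   /* base of rank's shared segment */
  extern void  ship_to_data_server(int node, const put_request *r);

  void do_put(const put_request *r)
  {
      if (node_of(r->target) == my_node) {
          /* Intranode: the caller executes the put itself. */
          memcpy((char *) shm_addr_of(r->target) + r->offset,
                 r->src, r->nbytes);
      } else {
          /* Internode: hand the request to the data server. */
          ship_to_data_server(node_of(r->target), r);
      }
  }

Note that the same two-way test would work equally well keyed off the type of
the target window rather than the location of the target process, which is
exactly the point made in the quoted text above.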
These and other mechanisms for implementing put/get are discussed in a paper
presented at the latest MPI Developers Conference, "Beyond Message Passing:
A Case for One-Sided Communication in MPI", which can be accessed at: