> The simplified model is not just for shared memory machines (where
> double buffering offers a solution as you point out), it also helps
> implementations that target generic clusters of workstations (e.g.
> MPICH & LAM). Maybe we should also hear from the PD implementors on
> this issue.
Perhaps the easiest solution for clusters of workstations would be to spawn
(via MPI-2 dynamic process creation) an additional data-server ("hidden agent")
process on each machine during the first MPI_RMA_INIT call. Shared memory is
then used for communication between application and server processes when they
reside on the same machine, and MPI message passing covers the remaining cases.
MPI-1 and MPI-2 are powerful enough to provide a clean infrastructure for this
implementation.
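The spawning step could be sketched roughly as follows (the agent executable
name "rma_agent" and the communicator variable are of course just illustrative;
getting exactly one agent per machine would need implementation-specific info
hints):

    /* inside the first MPI_RMA_INIT call */
    MPI_Comm agent_comm;
    MPI_Comm_spawn("rma_agent", MPI_ARGV_NULL, 1,
                   MPI_INFO_NULL, 0, MPI_COMM_SELF,
                   &agent_comm, MPI_ERRCODES_IGNORE);
    /* agent_comm then serves as the hidden communicator
       between the application process and its local agent */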
The following reflects the client side:
    if (target process "rank" is on the same machine)
        use direct shmem access
    else
        MPI_Send(data, ..., data_server(rank), tag, MPI_COMM_HIDDEN)
The server process sits in an infinite loop, receiving request messages
from application processes (coming in over the network) and executing
the put/get code as needed.
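The agent's main loop might then look roughly like this (the request
structure, the REQ_TYPE datatype, and the put/get handlers are all
implementation-defined and only sketched here):

    for (;;) {
        MPI_Recv(&req, 1, REQ_TYPE, MPI_ANY_SOURCE,
                 MPI_ANY_TAG, MPI_COMM_HIDDEN, &status);
        switch (req.op) {
        case RMA_PUT:
            /* copy the incoming data into the target memory */
            break;
        case RMA_GET:
            /* send the requested data back to the origin */
            break;
        }
    }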
In this scheme the hidden agent might end up wasting CPU cycles while waiting
for request messages, but some optimizations should be possible.