PUT on an unreliable interconnect

Rolf Rabenseifner (Rabenseifner@RUS.Uni-Stuttgart.DE)
Tue, 23 Apr 1996 17:39:38 +0100 (DST)

Besides the discussion of cache incoherency, I would like to
discuss in Chicago the issue of PUT on ATM-based clusters.

-- RELIABILITY PROBLEM --

The paper
Todd Mummert, Corey Kosak, Peter Steenkiste and Allan Fisher.
Fine Grain Parallel Communication on General Purpose LANs.
School of Computer Science, Carnegie Mellon University,
Pittsburgh, PA 15213.
http://www.cs.cmu.edu/afs/cs/project/iwarp/archive/nectar-papers/96ics.p
describes "a simple adapter for ATM networks that supports efficient
remote memory writes, sometimes referred to as PUT operations".

The main difference from distributed and shared memory multiprocessors
is the lack of a reliable interconnect.
In return, such clusters may give us better
price/performance in the future.

Now the problem:
At a synchronization point after some PUTs have been issued,
the application must collectively hand control to
the message passing system, because the system has to
retransmit lost or corrupted data.
For best performance, the user data on the node
executing the PUT must not be overwritten until that moment,
so that a retransmission can read it directly.
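
To make the constraint concrete, here is a minimal single-process
sketch in C. It is only a simulation under stated assumptions: the
names put_log, sim_put and sim_sync are hypothetical and not part of
any MPI proposal, the "network" is a memcpy that can be told to drop
the data, and loss detection is taken for granted. The point it shows
is that the log keeps only a pointer to the user buffer, so the
retransmit at the synchronization point is correct only if the
application has left that buffer untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_PENDING 16

/* One logged PUT: only a pointer to the user's data is kept, not a copy. */
typedef struct {
    const char *src;  /* user buffer on the node executing the PUT */
    char *dst;        /* target memory, simulated as local memory */
    size_t len;
    int arrived;      /* 1 if the first transmission got through */
} pending_put;

typedef struct {
    pending_put log[MAX_PENDING];
    int count;
} put_log;

/* Hypothetical PUT; 'lost' simulates the network dropping the data. */
static int sim_put(put_log *pl, char *dst, const char *src,
                   size_t len, int lost)
{
    assert(pl != NULL);
    if (pl->count >= MAX_PENDING)
        return -1;
    if (!lost)
        memcpy(dst, src, len);
    pl->log[pl->count].src = src;
    pl->log[pl->count].dst = dst;
    pl->log[pl->count].len = len;
    pl->log[pl->count].arrived = !lost;
    pl->count++;
    return 0;
}

/* Hypothetical synchronization step: retransmit every lost PUT by
   reading the user buffer again. Returns the number of retransmits. */
static int sim_sync(put_log *pl)
{
    int resent = 0;
    int i;
    assert(pl != NULL);
    for (i = 0; i < pl->count; i++) {
        if (!pl->log[i].arrived) {
            memcpy(pl->log[i].dst, pl->log[i].src, pl->log[i].len);
            pl->log[i].arrived = 1;
            resent++;
        }
    }
    pl->count = 0;  /* all PUTs of this phase are now complete */
    return resent;
}
```

If the application overwrote src between sim_put and sim_sync, the
retransmit would deliver the new contents instead of the data that
was originally PUT.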

A possible solution consists of several changes:

A) At MPI_RMA_INIT the application must decide which
synchronization model it wants to use;

B) We additionally allow the following synchronization models:
B1) The application synchronizes all processes with one
of the already defined methods and additionally
calls MPI_DELIVER(newcomm) in all processes
in newcomm. MPI is allowed to implement
MPI_DELIVER as a local routine (or an empty macro on
cache coherent systems with reliable interconnect)
or as a collective routine (on systems without reliable
interconnect).
B2) Same as B1), but a sending process does not modify
the data used in MPI_PUT until it has called
MPI_DELIVER.
(I.e. the runtime need not save the data for a
potential retransmit on unreliable interconnects.)

C) We recommend using the counters for synchronization
in conjunction with B2), because this combination can be
implemented efficiently on all possible systems.
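
The difference between B1) and B2) can be sketched in C as follows.
This is a single-process simulation, not an implementation of the
proposal: rma_sim, rma_sim_put and rma_sim_deliver are hypothetical
names, rma_sim_deliver only stands in for what MPI_DELIVER might do
on an unreliable interconnect, and the use of a target-side arrival
counter to detect a lost PUT is an assumption made for illustration.
Under B1) the runtime has to keep its own copy of the data, because
the user may change the buffer; under B2) a pointer is enough.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef enum { MODEL_B1, MODEL_B2 } sync_model;

typedef struct {
    sync_model model;
    const char *resend_src; /* where a retransmit reads from */
    char *saved;            /* B1 only: private copy held by the runtime */
    char *dst;              /* target memory, simulated locally */
    size_t len;
    int expected;           /* PUTs issued by the origin */
    int arrived;            /* target-side counter of PUTs that arrived */
} rma_sim;

static void rma_sim_init(rma_sim *r, sync_model model)
{
    r->model = model;
    r->resend_src = NULL;
    r->saved = NULL;
    r->dst = NULL;
    r->len = 0;
    r->expected = 0;
    r->arrived = 0;
}

/* Hypothetical PUT; 'lost' simulates the network dropping the data. */
static int rma_sim_put(rma_sim *r, char *dst, const char *src,
                       size_t len, int lost)
{
    assert(r->expected == 0); /* this sketch tracks one PUT per phase */
    if (r->model == MODEL_B1) {
        /* B1: the user may modify src, so the runtime saves a copy. */
        r->saved = (char *)malloc(len);
        if (r->saved == NULL)
            return -1;
        memcpy(r->saved, src, len);
        r->resend_src = r->saved;
    } else {
        /* B2: the user promises not to touch src, so no copy is made. */
        r->resend_src = src;
    }
    r->dst = dst;
    r->len = len;
    r->expected = 1;
    if (!lost) {
        memcpy(dst, src, len);
        r->arrived++;
    }
    return 0;
}

/* Stand-in for MPI_DELIVER: if the counter shows a missing PUT,
   retransmit it. Returns the number of retransmitted PUTs. */
static int rma_sim_deliver(rma_sim *r)
{
    int resent = 0;
    if (r->arrived < r->expected) {
        memcpy(r->dst, r->resend_src, r->len);
        r->arrived++;
        resent = 1;
    }
    free(r->saved);
    rma_sim_init(r, r->model);
    return resent;
}
```

In this sketch the B2) path does no allocation and no extra copy per
PUT, which is the saving the proposal is after; on a reliable
interconnect rma_sim_deliver would have nothing to do at all.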

I described this problem at the end of a mail on the caching topic,
but it is a topic of its own.

I think the MPI forum should address this type of hardware.
I think this proposal is a chance to define an interface that is
optimal on all hardware platforms for many applications,
because many applications can be programmed according to C).

Comments?

Rolf

PS: Please give me a paper copy of your comments at the meeting,
because I will not be able to read mail until the meeting.


Rolf Rabenseifner (Computer Center )
Rechenzentrum Universitaet Stuttgart (University of Stuttgart)
Allmandring 30 Phone: ++49 711 6855530
D-70550 Stuttgart 80 FAX: ++49 711 6787626
Germany rabenseifner@rus.uni-stuttgart.de