RE: feedback on a new MPI idea

Arkady Kanevsky (arkady@mail11.mitre.org)
Wed, 28 Feb 96 10:34:32 -0500

Ron,

Under your proposal the sender still needs to receive
the message that the receiver is ready and has posted the appropriate
receive, in order to avoid the memory copy.
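
To make concrete what the proposed receive would save, here is a toy
model in C of a receiver's unexpected-message queue. It is only a
sketch: the names unexpected_queue, enqueue_unexpected and post_recv
are hypothetical, matching is on the tag alone, and neither the
handshake nor the real MPICH data structures are modeled.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_UNEXPECTED 64

/* Toy model of one receiver's unexpected-message queue (tags only). */
typedef struct {
    int tags[MAX_UNEXPECTED];
    size_t count;        /* messages currently queued */
    size_t comparisons;  /* tag comparisons done while posting receives */
} unexpected_queue;

/* Record a message that arrived before a matching receive was posted. */
static void enqueue_unexpected(unexpected_queue *q, int tag)
{
    assert(q->count < MAX_UNEXPECTED);
    q->tags[q->count++] = tag;
}

/* Post a receive for `tag`.
 * search != 0: ordinary receive, scans the unexpected queue first.
 * search == 0: hypothetical "no-search" receive, skips the scan.
 * Returns 1 if an unexpected message was matched and removed, else 0. */
static int post_recv(unexpected_queue *q, int tag, int search)
{
    if (!search)
        return 0;  /* caller asserts that nothing can be waiting */

    for (size_t i = 0; i < q->count; i++) {
        q->comparisons++;
        if (q->tags[i] == tag) {
            /* Remove entry i, keeping arrival order of the rest. */
            for (size_t j = i + 1; j < q->count; j++)
                q->tags[j - 1] = q->tags[j];
            q->count--;
            return 1;
        }
    }
    return 0;
}
```

With search set to 0 the cost of posting does not depend on the queue
length, but a matching message already in the queue is silently left
there, which is the programmer-error case in the proposal.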

I cannot speak for general MPI users ...
but I will speak (with Tony's permission) about one of the real-time
extensions. It is based upon the time-driven approach.
Both the sender and receiver agree a priori (it is written in the code)
when they will send/receive messages (the same is true for collective
operations). The format of the send is slightly different than now.
We just add the time when the send starts. The receiver guarantees that
by that time the receive is posted (actually the receive is not really
needed, the MPI server provider just needs to know the address where to
put an incoming message). The receiver application knows at what time
it is free to access the received message memory and when this memory
is under MPI server control (the same is true for the sender).
This removes the need for any handshaking between
sender and receiver and between application and MPI server, thus allowing
higher application-to-application throughput and lower latencies.
The main advantage for real-time applications is that this approach allows
the application to "schedule" the communication medium to ensure timely
message delivery and to reduce message collisions (delays) on a backplane.
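
A minimal sketch in C of the bookkeeping this implies, assuming each
agreed transfer is described by a start time and a completion time.
Only the start time is part of the send as described above; the end
time, the tick units, and the names transfer_slot, buffer_owner and
schedule_has_collision are assumptions made for illustration, not part
of any MPI interface.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { OWNER_APPLICATION, OWNER_MPI_SERVER } owner;

/* One agreed transfer; both sides know these values in advance. */
typedef struct {
    long start;  /* time at which the send starts (hypothetical ticks) */
    long end;    /* assumed time by which delivery is complete */
} transfer_slot;

/* Who may touch the message buffer at time `now`?
 * Inside [start, end) it belongs to the MPI server, otherwise to
 * the application. No message is exchanged to decide this. */
static owner buffer_owner(const transfer_slot *s, long now)
{
    assert(s->start < s->end);
    return (now >= s->start && now < s->end) ? OWNER_MPI_SERVER
                                             : OWNER_APPLICATION;
}

/* Returns 1 if any two slots overlap in time on the shared medium,
 * i.e. the schedule would let two messages collide; 0 otherwise. */
static int schedule_has_collision(const transfer_slot *slots, size_t n)
{
    for (size_t i = 0; i < n; i++)
        for (size_t j = i + 1; j < n; j++)
            if (slots[i].start < slots[j].end &&
                slots[j].start < slots[i].end)
                return 1;
    return 0;
}
```

Both checks use only the agreed times, so they can be done when the
schedule is written rather than at run time.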

This is just a sketch...
The approach may not be for general use but for a specific niche...

Arkady

>
>Greetings:
>
>I wanted some feedback on an idea that I might
>propose to MPI-2.
>
>I was telling an application developer at SNL
>that the easiest way to avoid memcpy's and to
>get lowest latencies and highest bw (for large
>msgs under Puma on the Paragon), is to set up a
>protocol where he pre-posts all the recv's and
>then sends an "okay to send" msg to the sender.
>The sender waits for the "okay to send" msg, and
>then blasts away with ready sends.
>
>It occurred to me that if MPI_Rsend() is used with
>the idea that there are posted recv's on the other
>side, then the recv's that get posted on the other
>side are done so with the idea that the send's
>haven't been done yet. Therefore, there would be
>no need to search an unexpected msgs list looking
>for something that hasn't been sent. In both of
>the MPICH devices that I've done, a good portion
>of the time spent posting a recv is in looking
>through the unexpected queue. It seems like it
>would be good to have a recv function that has
>the semantics of: post this recv and don't
>bother looking for matches with unexpected msgs,
>because I know there aren't any here. This
>type of recv could still be matched with any
>type of send (sync,ready, etc.), and if there
>are matching unexpected msgs, blame it on
>programmer error (like ready sends w/o posted
>recvs).
>
>I can think of several reasons why the time to
>search an unexpected queue would be nominal, but
>I can also think of several reasons why it might
>not be.
>
>The protocol described above may also be used
>when the time comes to implement MPI-2's get/iget
>on top of point-to-point operations (post a recv
>and then send a request message).
>
>I don't know if this was ever discussed for MPI-1,
>but I can't think of a good reason not to have such
>a function.
>
>Any thoughts?
>
>-Ron
>
>
>