
Re: Generalized Request Progress




On Sep 4, 2007, at 5:06 AM, Hubert Ritzdorf wrote:

Hi,

Marc Snir wrote:
The problem occurs with nonblocking sends and receives as well. IMHO, one cannot implement MPI without a progress engine: threads, interrupts, or an offload engine. Of course, most implementations are noncompliant on that account.

To be more specific, my understanding of the MPI standard is that in the following example:

MPI_Init(...);
...
MPI_Comm_dup(MPI_COMM_WORLD, &comm);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
switch(myrank) {
case 0: {
foo(); /* do some long computation */
MPI_Send(a, large_number, MPI_REAL, 1, tag, MPI_COMM_WORLD);
MPI_Send(b, 1, MPI_REAL, 2, tag, comm);
break; /* keep process 0 from falling through into case 1 */
}
case 1: {
MPI_Irecv(a, large_number, MPI_REAL, MPI_ANY_SOURCE, tag, MPI_COMM_WORLD, &req); /* req: an MPI_Request assumed declared above */
while(1) ;
}
case 2 : {
MPI_Recv(b, 1, MPI_REAL, 0, tag, comm, &status); /* status: an MPI_Status assumed declared above */
MPI_Abort(MPI_COMM_WORLD, 666);
}
} /* end switch */
MPI_Finalize();


The program should abort, and should not be stuck in an infinite loop: the receive on process 1 should make progress even though no MPI call occurs on that process after the receive. Therefore, the first send on process 0 should complete, the second send should start, and the receive on process 2 should complete.
I didn't attend the MPI standardization committee, but I don't agree with this view.
The MPI 1.1 standard says in Section 3.7, "Nonblocking communication":


Similarly, a nonblocking receive start call initiates the receive operation, but doesn't complete it.
The call will return before a message is stored into the receive buffer.
A separate receive complete (!!) call is needed to complete the receive operation and verify that the data
has been received into the receive buffer.


Thus, the first blocking send may wait until the "separate receive complete call" is issued. That call is never
executed, since MPI process 1 hangs forever in the while loop, so message number 2 is never sent.
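
The two readings discussed so far can be contrasted in a small toy model. This is only a sketch and not MPI code: the names toy_transfer, toy_progress, toy_send_done and toy_run are hypothetical, and the model assumes a single rendezvous transfer whose receive has already been posted. With no asynchronous engine, data moves only inside library calls on the receiver, which never happen while it spins; with an engine, the transfer advances on every step regardless.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not MPI: one rendezvous transfer from a sender to a receiver. */
typedef struct {
    size_t total;      /* bytes the sender wants to deliver */
    size_t moved;      /* bytes copied into the receive buffer so far */
    bool recv_posted;  /* the receiver has started a nonblocking receive */
} toy_transfer;

/* Hypothetical progress step: move up to `chunk` bytes if a receive is posted. */
void toy_progress(toy_transfer *t, size_t chunk) {
    assert(t != NULL && chunk > 0);
    if (!t->recv_posted) return;
    size_t left = t->total - t->moved;
    t->moved += (left < chunk) ? left : chunk;
}

/* In this model the blocking send returns once every byte has been moved. */
bool toy_send_done(const toy_transfer *t) {
    return t->moved == t->total;
}

/*
 * Simulate `steps` time steps after the receive is posted, while the receiver
 * spins in `while(1) ;` and makes no further library calls.
 *   async_engine == false: progress happens only inside receiver library
 *                          calls, so nothing moves.
 *   async_engine == true:  a thread, interrupt, or offload engine drives
 *                          progress on every step.
 * Returns true if the blocking send would have returned.
 */
bool toy_run(bool async_engine, size_t total, size_t chunk, int steps) {
    toy_transfer t = { total, 0, true };
    for (int i = 0; i < steps; i++) {
        if (async_engine) toy_progress(&t, chunk);
        /* receiver: busy loop, no library call, so no polling-driven progress */
    }
    return toy_send_done(&t);
}
```

Under these assumptions, toy_run(false, ...) never reports the send as done for a nonzero message size, however many steps are simulated, which is the hang described above; toy_run(true, ...) reports it done once enough steps have elapsed.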



But the same Standard also says (in Section 3.5, p.31, paragraph labeled "Progress"):

"If a pair of matching send and receives have been initiated
on two processes, then at least one of those two operations will complete,
independently of other actions in the system: the send operation will
complete, unless the receive is satisfied by another message, and completes;
the receive operation will complete unless the message sent is consumed by
another matching receive that was posted at the same destination process."


That is pretty unambiguous and supports Marc Snir's interpretation.
However, it does seem to contradict the statement "...a separate receive
complete call is needed to complete the receive operation..." I have
always felt this language was ambiguous and maybe self-contradictory.
The essence of the ambiguity, to my mind, is that the word "complete"
is overloaded in the MPI standard. It is never really defined but seems
to be used in two different ways. As an adjective, it is sometimes used
to mean that the data has been completely copied out of the send buffer (for
a send request) or into the receive buffer (for a recv request), and sometimes
to mean that AND that the actions guaranteed to have taken
place when MPI_Wait returns (such as deallocation of the request
object) have taken place. I would use "complete" for the first notion
and maybe "finalized" for the second. So a request can be complete,
but not finalized. MPI_Wait will block until the request has completed,
and then finalize the request. A separate receive finalization call
is needed to finalize the receive request operation. Etc. This would
resolve the ambiguity (I think).
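
The proposed distinction can be written down as a tiny state machine. This is a sketch of the terminology only, not of any MPI implementation: toy_request and the req_* functions are hypothetical names, and having the wait-style call drive the transfer itself is a simplification for a single-threaded model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model, not MPI: the two notions separated in the paragraph above. */
typedef enum {
    REQ_ACTIVE,    /* started; data not yet fully transferred */
    REQ_COMPLETE,  /* data fully copied out of or into the user buffer */
    REQ_FINALIZED  /* wait-style call has returned; request released */
} req_state;

typedef struct {
    req_state state;
    size_t remaining;  /* bytes still to transfer */
} toy_request;

/* Start a request for `nbytes`; an empty transfer is complete at once. */
toy_request req_start(size_t nbytes) {
    toy_request r = { nbytes == 0 ? REQ_COMPLETE : REQ_ACTIVE, nbytes };
    return r;
}

/* Transfer up to `chunk` bytes; an ACTIVE request becomes COMPLETE at zero. */
void req_progress(toy_request *r, size_t chunk) {
    assert(r != NULL);
    if (r->state != REQ_ACTIVE) return;
    r->remaining -= (r->remaining < chunk) ? r->remaining : chunk;
    if (r->remaining == 0) r->state = REQ_COMPLETE;
}

/* Test-style call: finalizes only a request that is already complete. */
bool req_test(toy_request *r) {
    assert(r != NULL);
    if (r->state == REQ_COMPLETE) r->state = REQ_FINALIZED;
    return r->state == REQ_FINALIZED;
}

/* Wait-style call: block (here: drive progress) until complete, then finalize. */
void req_wait(toy_request *r, size_t chunk) {
    assert(r != NULL && chunk > 0);
    while (r->state == REQ_ACTIVE) req_progress(r, chunk);
    if (r->state == REQ_COMPLETE) r->state = REQ_FINALIZED;
}
```

In this vocabulary a request can sit in REQ_COMPLETE indefinitely without any further call from the user; only the move to REQ_FINALIZED requires the separate call.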


-Steve




In this view, there is no difference between generalized requests and send or receive requests; it is just that it is much easier to run into problems with progress of generalized requests. The standard does not need any fixing, but many implementations do.
Thus, the behaviour of MPI implementations that wait for the "separate receive complete call"
(or other MPI communication calls such as MPI_Iprobe or MPI_Test)
in order to complete the first send operation conforms to the MPI standard, and those implementations don't need a fix.


Hubert Ritzdorf
IT Research Labs
NEC Europe


-------------------------------------------------------------------
Stephen F. Siegel, Assistant Professor

  address: Department of Computer & Information Sciences
           103 Smith Hall
           University of Delaware
           Newark DE 19716
   office: 432 Smith Hall
      web: http://www.cis.udel.edu/~siegel
    email: siegel@xxxxxxxxxxxx
      tel: +1 302 831 0083