Re: Generalized Request Progress
Hi,
Marc Snir wrote:
The problem occurs with nonblocking send-receives as well. IMHO, one
cannot implement MPI without a progress engine, whether threads,
interrupts, or an offload engine. Of course, most implementations are
noncompliant on that account.
To be more specific, my understanding of the MPI standard is that in
the following example:
MPI_Init(...);
...
MPI_Comm_dup(MPI_COMM_WORLD, &comm);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
switch (myrank) {
case 0: {
    foo(); /* do some long computation */
    MPI_Send(a, large_number, MPI_REAL, 1, tag, MPI_COMM_WORLD);
    MPI_Send(b, 1, MPI_REAL, 2, tag, comm);
    break;
}
case 1: {
    MPI_Irecv(a, large_number, MPI_REAL, MPI_ANY_SOURCE, tag,
              MPI_COMM_WORLD, &request);
    while (1)
        ; /* spin forever: no further MPI calls on this process */
    break;
}
case 2: {
    MPI_Recv(b, 1, MPI_REAL, 0, tag, comm, &status);
    MPI_Abort(MPI_COMM_WORLD, 666);
    break;
}
}
MPI_Finalize();
The program should abort, and should not be stuck in an infinite loop:
the receive on process 1 should make progress even though no MPI call
occurs on that process after the receive. Therefore, the first send on
process 0 should complete, the second send should start, and the
receive on process 2 should complete.
I did not attend the MPI standardization committee meetings, but I do
not agree with this view.
The MPI 1.1 standard says in Section 3.7, "Nonblocking communication":
Similarly, a nonblocking receive start call initiates the receive
operation, but doesn't complete it. The call will return before a
message is stored into the receive buffer. A separate receive
complete (!!) call is needed to complete the receive operation and
verify that the data has been received into the receive buffer.
Thus, the first blocking send may wait until the "separate receive
complete call" is issued. That call is never executed, because MPI
process 1 loops forever in the while statement, so the second message
is never sent.
In this view, there is no difference between generalized requests and
send or receive requests; it is just much easier to run into progress
problems with generalized requests. The standard does not need any
fixing, but many implementations do.
Thus, MPI implementations that wait for the "separate receive complete
call" (or for other MPI communication calls such as MPI_Iprobe or
MPI_Test) before completing the first send operation conform to the
MPI standard and do not need a fix.
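The two readings can be made concrete with a small toy model in C. It is not MPI code, and every name in it (`rendezvous`, `receiver_progress`, `sender_try_complete`, `simulate_send`) is hypothetical. It assumes the large first message travels with a rendezvous protocol, as implementations commonly use for large messages: the sender announces the message, and the data can only move after the receiving side has matched the announcement and replied. The `async_agent` flag stands for a progress engine of the kind Marc describes (thread, interrupt or offload engine); without it, the receiver's side advances only while the application is inside a library call.

```c
#include <stdbool.h>

/* Hypothetical toy model (not MPI): state of one rendezvous transfer. */
typedef struct {
    bool rts_posted; /* sender has announced the message (request-to-send) */
    bool cts_sent;   /* receiver matched it and replied (clear-to-send) */
    bool data_moved; /* payload has been transferred */
} rendezvous;

/* Sender side: announce the message. */
static void send_start(rendezvous *r) { r->rts_posted = true; }

/* Receiver side: one pass of the progress engine. Under the "weak"
   reading this runs only inside a library call on the receiving process
   (e.g. a test or wait on the posted receive). */
static void receiver_progress(rendezvous *r) {
    if (r->rts_posted && !r->cts_sent)
        r->cts_sent = true;
}

/* Sender side: the blocking send can only finish after the reply. */
static bool sender_try_complete(rendezvous *r) {
    if (r->cts_sent)
        r->data_moved = true;
    return r->data_moved;
}

/* Simulate the blocking send on process 0.
   async_agent:    receiver progress runs every step (thread/interrupt/offload).
   receiver_calls: number of library calls process 1 makes after posting
                   its receive; 0 models the `while (1) ;` loop.
   max_spins:      bound standing in for "blocks forever".
   Returns true if the send completes within the bound. */
static bool simulate_send(bool async_agent, int receiver_calls, int max_spins) {
    rendezvous r = {false, false, false};
    send_start(&r);
    for (int i = 0; i < max_spins; i++) {
        if (async_agent || i < receiver_calls)
            receiver_progress(&r);
        if (sender_try_complete(&r))
            return true;
    }
    return false;
}
```

In this model `simulate_send(false, 0, 1000)` returns false: the receiver spins without calling into the library, so the send never finishes, which matches the reading above. Either a single receiver-side call (`simulate_send(false, 1, 1000)`) or an asynchronous agent (`simulate_send(true, 0, 1000)`) lets it complete, which is the difference the two views argue over.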
Hubert Ritzdorf
IT Research Labs
NEC Europe