The receive of process 2 must complete.
I think that the standard is pretty unambiguous and not
contradictory: "complete" is applied separately to the send
operation and to the receive operation. In some cases, the
standard requires that a send completes, even though the matching
receive may never complete; and vice-versa, with ISend. (I also
happen to remember the "original intent" of the guy that wrote the
paragraph on progress.)
I agree that (a) this definition may slow down MPI on some systems
(although the damage can be minimized: the odd cases where it is
not sufficient to kick the progress engine only when MPI calls
occur are very rare, so the progress engine can be throttled
down), and (b) it takes some work to implement an
efficient progress engine. These are the two reasons we did not
push the case originally and allowed implementors to ignore the
issue. We basically agreed not to clarify the implications of
the standard, but the standard is pretty unambiguous. By now, any
smart NIC should provide enough hooks to support a correct
progress engine.
As for the comment of Rolf, in my example, each process calls
MPI_Finalize before it exits.
Marc Snir
Interim Director, Illinois Informatics Initiative
University of Illinois at Urbana Champaign
Library and Information Systems Building, MC-493
501 E. Daniel St, Room 123
Champaign, IL 61820-6211
Tel: (217) 244 6568 Fax: (217) 244 3302
www.cs.uiuc.edu/homes/snir
On Sep 4, 2007, at 10:56 AM, Stephen Siegel wrote:
On Sep 4, 2007, at 5:06 AM, Hubert Ritzdorf wrote:
Hi,
Marc Snir wrote:
The problem occurs with nonblocking send-receives as well.
IMHO, one cannot implement MPI without a progress engine,
either threads or interrupts or offload engine. Of course, most
implementations are noncompliant on that account.
To be more specific, my understanding of the MPI standard is
that in the following example:
MPI_Init(...);
...
MPI_Comm_dup(MPI_COMM_WORLD, &comm);
MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
switch(myrank) {
case 0: {
    foo(); /* do some long computation */
    MPI_Send(a, large_number, MPI_REAL, 1, tag, MPI_COMM_WORLD);
    MPI_Send(b, 1, MPI_REAL, 2, tag, comm);
    break;
}
case 1: {
    MPI_Irecv(a, large_number, MPI_REAL, MPI_ANY_SOURCE, tag,
              MPI_COMM_WORLD, &request);
    while(1) ; /* no further MPI calls on this process */
}
case 2: {
    MPI_Recv(b, 1, MPI_REAL, 0, tag, comm, &status);
    MPI_Abort(MPI_COMM_WORLD, 666);
    break;
}
}
MPI_Finalize();
The program should abort, and should not be stuck in an
infinite loop: the receive on process 1 should make progress
even though no MPI call occurs on that process after the
receive. Therefore, the first send on process 0 should
complete, the second send should start, and the receive on
process 2 should complete.
I wasn't on the MPI standardization committee, but I don't
agree with this view.
The MPI 1.1 standard says in Section 3.7, "Nonblocking
communication":
Similarly, a nonblocking receive start call initiates the
receive operation, but doesn't complete it.
The call will return before a message is stored into the receive
buffer.
A separate receive complete (!!) call is needed to complete the
receive operation and verify that the data
has been received into the receive buffer.
Thus, the first blocking send may wait until the "separate receive
complete call" is issued. That call is never executed, since
MPI process 1 hangs forever in the while loop, so
message number 2 is never sent.
But the same Standard also says (in Section 3.5, p.31, paragraph
labeled "Progress"):
"If a pair of matching send and receives have been initiated
on two processes, then at least one of those two operations will
complete, independently of other actions in the system: the send
operation will complete, unless the receive is satisfied by another
message, and completes; the receive operation will complete unless
the message sent is consumed by another matching receive that was
posted at the same destination process."
That is pretty unambiguous and supports Marc Snir's interpretation.
However, it does seem to contradict the statement "...a separate
receive complete call is needed to complete the receive
operation..." I have always felt this language was ambiguous and
maybe self-contradictory.
The essence of the ambiguity, to my mind, is that the word
"complete" is overloaded in the MPI standard. It is never really
defined but seems to be used in two different ways. As an
adjective, it is sometimes used to mean that the data has been
completely copied out of the send buffer (for a send request) or
into the receive buffer (for a recv request), and sometimes to mean
that AND that the actions guaranteed to have taken place when
MPI_Wait returns (such as deallocation of the request object) have
taken place. I would use "complete" for the first notion and maybe
"finalized" for the second. So a request can be complete, but not
finalized. MPI_Wait will block until the request has completed,
and then finalize the request. A separate receive finalization call
is needed to finalize the receive request operation. Etc. This
would resolve the ambiguity (I think).
-Steve
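To make the proposed complete/finalized split concrete, here is a minimal sketch of a request as a three-state machine. It is a toy model, not MPI: every `toy_` name is hypothetical, and the byte counter merely stands in for whatever the library does to move data.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical toy model; none of these names come from MPI. */
typedef enum { REQ_ACTIVE, REQ_COMPLETE, REQ_FINALIZED } toy_state;

typedef struct {
    toy_state state;
    size_t bytes_left;  /* data still to be copied to or from the user buffer */
} toy_request;

/* Start a transfer of n bytes (plays the role of a nonblocking start call). */
static toy_request toy_start(size_t n) {
    toy_request r = { n == 0 ? REQ_COMPLETE : REQ_ACTIVE, n };
    return r;
}

/* One step of a progress engine: copies up to `chunk` bytes. It can take the
 * request to COMPLETE without any wait or test call from the user. */
static void toy_progress(toy_request *r, size_t chunk) {
    assert(r != NULL);
    if (r->state != REQ_ACTIVE) return;
    r->bytes_left = (chunk >= r->bytes_left) ? 0 : r->bytes_left - chunk;
    if (r->bytes_left == 0) r->state = REQ_COMPLETE;
}

/* Nonblocking finalization attempt (plays the role of a test call):
 * finalizes a COMPLETE request and reports whether it is now finalized. */
static bool toy_try_finalize(toy_request *r) {
    assert(r != NULL);
    if (r->state == REQ_COMPLETE) r->state = REQ_FINALIZED;
    return r->state == REQ_FINALIZED;
}
```

In this model only `toy_progress` moves a request from active to complete, and only `toy_try_finalize` moves it from complete to finalized, so "complete" never depends on the user issuing the finalization call.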
In this view, there is no difference between generalized
requests and send or receive requests; it is just that it is
much easier to run into problems with progress of generalized
requests. The standard does not need any fixing, but many
implementations do.
Thus, the behaviour of MPI implementations that wait for
the "separate receive complete call"
(or other MPI communication calls such as MPI_Iprobe or MPI_Test)
in order to complete the first send operation conforms to the
MPI standard and doesn't need a fix.
Hubert Ritzdorf
IT Research Labs
NEC Europe
-------------------------------------------------------------------
Stephen F. Siegel, Assistant Professor
address: Department of Computer & Information Sciences
103 Smith Hall
University of Delaware
Newark DE 19716
office: 432 Smith Hall
web: http://www.cis.udel.edu/~siegel
email: siegel@xxxxxxxxxxxx
tel: +1 302 831 0083