Re: Semantics of MPI Cancel?

Marc Snir (snir@watson.ibm.com)
Sun, 9 Mar 1997 22:58:24 -0400

Tony complains he got a blank message -- so here I try again

What is the correct behaviour of the following program?

Process 0                 Process 1
-------------             --------------
MPI_Init ()               MPI_Init ()
MPI_Isend (...,1,...)
MPI_Cancel ()
MPI_Wait ()
MPI_Test_cancelled ()
MPI_Barrier ()            MPI_Barrier ()
                          MPI_Probe ()

Must test cancelled return true? Must probe return false?

I believe test cancelled may return either true or false. If it returns
true, probe returns false. If it returns false, probe returns true.

****
Since there is no receive to match the send, the cancel must succeed, and
test_cancelled must return true. It clearly will not do to tell the sender
that the send was cancelled, if it might be received at the other end. It
will not do either to refuse to cancel, if there might be no matching
receive, as in this example. This would negate the usefulness of cancel
if, at some arbitrary time after the send was posted, the send becomes
uncancellable, even if no matching receive will ever be posted.

I would prefer to say that probe must return false in this case. But this
probably points to something we missed in the spec: if a user probes for
an incoming message, then it is erroneous not to receive it. We should add
this as a clarification to MPI-1. The reason is that we do not want to
cancel a message after it has been probed, so it must be received.

So, I propose the following clarification: "A message that has been
matched by a call to MPI_PROBE must be received by a subsequent call to
MPI_RECV or MPI_IRECV."
****
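
To make the proposed rule concrete, here is a minimal sketch in plain C. It
is a toy model, not MPI code and not how any implementation works: the type
toy_msg and the functions toy_cancel, toy_probe and toy_recv are hypothetical
names invented for illustration. It tracks one message and assumes that a
cancel can succeed only while the message is still unmatched at the receiver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy model (hypothetical, not MPI): lifecycle of a single sent message. */
typedef enum {
    MSG_IN_FLIGHT,  /* sent, not yet matched at the receiver */
    MSG_PROBED,     /* matched by a probe; it must now be received */
    MSG_RECEIVED,   /* delivered to a receive */
    MSG_CANCELLED   /* withdrawn by the sender */
} msg_state;

typedef struct {
    msg_state state;
} toy_msg;

/* Sender side: the cancel succeeds only while nothing has matched the message. */
bool toy_cancel(toy_msg *m)
{
    assert(m != NULL);
    if (m->state == MSG_IN_FLIGHT) {
        m->state = MSG_CANCELLED;
        return true;   /* what test_cancelled would then report */
    }
    return false;
}

/* Receiver side: a probe sees the message only if it is still deliverable. */
bool toy_probe(toy_msg *m)
{
    assert(m != NULL);
    if (m->state == MSG_IN_FLIGHT) {
        m->state = MSG_PROBED;
    }
    return m->state == MSG_PROBED;
}

/* Receiver side: a receive gets the message unless it was cancelled or already taken. */
bool toy_recv(toy_msg *m)
{
    assert(m != NULL);
    if (m->state == MSG_IN_FLIGHT || m->state == MSG_PROBED) {
        m->state = MSG_RECEIVED;
        return true;
    }
    return false;
}
```

In this model, calling toy_cancel on a fresh message returns true and a later
toy_probe returns false. If toy_probe runs first, toy_cancel returns false and
toy_recv still delivers the message, so a probed message is never lost.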

Consider an implementation which uses an eager send protocol for short
messages. The Isend may be complete by the time the cancel call occurs. I
don't think the intention is that cancel is actually "unsend" in this case.

***
The Isend is complete at the sender side and, yes, the cancel may require a
communication with the receiving side to "undo" the send. This much has
been clear since cancel was defined by the MPI forum. Indeed, this is the
precise reason why many implementors were against the cancel function in
MPI-1.
****
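
The communication needed to "undo" an eager send can be pictured with another
toy sketch in plain C. Nothing here is taken from a real MPI implementation:
the unexpected-message queue, its fixed capacity, and every toy_ name are
assumptions made for illustration. The receiver holds eagerly delivered
messages until a receive matches them, and answers a cancel request with an
ack (true) only if the message is still waiting in that queue.

```c
#include <stdbool.h>
#include <stddef.h>

#define TOY_QUEUE_CAP 8  /* arbitrary capacity for the sketch */

/* Receiver-side queue of eagerly delivered messages not yet matched by a receive. */
typedef struct {
    int ids[TOY_QUEUE_CAP];
    size_t count;
} toy_unexpected_queue;

/* Remove the entry at index i, preserving the order of the rest. */
void toy_remove_at(toy_unexpected_queue *q, size_t i)
{
    for (size_t j = i + 1; j < q->count; j++) {
        q->ids[j - 1] = q->ids[j];
    }
    q->count--;
}

/* Eager arrival: the data reaches the receiver before any receive is posted. */
bool toy_eager_arrive(toy_unexpected_queue *q, int id)
{
    if (q->count == TOY_QUEUE_CAP) {
        return false;
    }
    q->ids[q->count++] = id;
    return true;
}

/* A receive consumes the oldest unmatched message; returns -1 if there is none. */
int toy_match_receive(toy_unexpected_queue *q)
{
    if (q->count == 0) {
        return -1;
    }
    int id = q->ids[0];
    toy_remove_at(q, 0);
    return id;
}

/* Cancel request from the sender: true (ack) if the message was still
 * unmatched and has been removed, false (nack) if it is no longer queued. */
bool toy_handle_cancel_request(toy_unexpected_queue *q, int id)
{
    for (size_t i = 0; i < q->count; i++) {
        if (q->ids[i] == id) {
            toy_remove_at(q, i);
            return true;
        }
    }
    return false;
}
```

The sender would report the send as cancelled only after an ack, which is why
the cancel costs a round trip to the receiving side in this picture.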

Rationale:

The standard says that cancel "marks for cancellation a PENDING,
nonblocking communication operation." (my emphasis). In the above (eager)
case, the send is no longer pending, it is complete (but not yet
completed). The meat of the definition of cancel is then to define that a
subsequent wait is local, not that a probe after the cancel is guaranteed
to return false.

****
The send is pending until it has completed at both ends.
***

I would argue that if we wanted a way to "unsend", we would have this
functionality for blocking sends as well.

***
You have no "request" to represent the operation, in the blocking case, so
that you cannot have a cancel. Otherwise, yes, it makes sense for blocking
sends, too.
******

A related issue, consider:

Process 0                 Process 1
-------------             --------------
MPI_Init ()               MPI_Init ()
MPI_Isend (...,1,...)
                          MPI_Probe ()
MPI_Cancel ()
MPI_Wait ()
MPI_Test_cancelled ()
MPI_Barrier ()            MPI_Barrier ()
                          MPI_Recv ()

Assuming the probe returns true, should the cancel go ahead or not? If it
does, then the standard is violated with regards to the recv (a probe
showed a message exists, but the receive doesn't get it). If it doesn't,
the standard is violated with regards to the cancel (the wait is not
local). Catch 22.

***
See clarification above.
*******

I believe that the send should be canceled, and that we need to add
clarifying text to probe (the recv gets the message, assuming no other recv
has grabbed it AND the send was not successfully canceled).

Comments?

Lloyd Lewins
Hughes Aircraft Co.,
llewins@msmail4.hac.com

****
Marc
*******