Request Handler semantics

Carter Edwards <carter@ticam.utexas.edu>
Tue, 30 Apr 1996 16:24:08 -0500 (CDT)

Some concerns regarding the semantics of Communication Handlers
as described in the DRAFT for MPI-2 One Sided Communications,
dated April 8, 1996.

------------------------------------------------------------------------
*** What are the Semantics of Communication Handlers? ***

In several places the current draft states that
handlers are invoked upon completion of a communication.
However, the existence of the MPI_REQUEST_COMPLETE
procedure, to be called from a handler, implies that
handlers are invoked before a communication request
completes, and that the handler is responsible for
(or merely has the option of?) completing the request.
At the least, this is inconsistent terminology.
However, I am more concerned that the semantics may be inconsistent.

I propose that a consistent set of handler semantics must be
defined before an interface for handlers can be adequately specified.

A syntax-independent discussion of Request Handler semantics follows.

------------------------------------------------------------------------

*** What is the relationship between Requests and Request Handlers? ***

We agree that Request Handlers (extra_state,routine)
are to be somehow associated with Requests.

What is this relationship? The DRAFT implies, via the MPI_POST_HANDLER
procedure, that a Request either has or does not have a Handler
associated with it, as the MPI user chooses.
However, the DRAFT also implies, via the MPI_POST_ANY_HANDLER and
MPI_POST_SOME_HANDLER procedures, that a given Request MAY have a
handler associated with it, depending upon the order in which
a given list of Requests completes.

I propose that a given Request either has or does not have
exactly one Handler associated with it (no "might have"),
and that the existence of this association be deterministic.
Thus an MPI user could at any time query a Request for the
existence of an associated Handler.

I propose that a given Handler (extra_state,routine) may
be associated with any number of Requests. However,
the MPI run-time environment need only maintain the
Request-to-Handler relationships and not the reverse.
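A minimal C sketch of this one-way association, assuming a Handler is nothing more than the pair (extra_state, routine). The type and function names (handler_t, request_t, request_attach_handler, request_has_handler) are hypothetical illustrations, not proposed MPI-2 names. Each Request stores at most one Handler reference; the Handler stores nothing about its Requests.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration; not part of any MPI draft. */
typedef struct request request_t;
typedef void (*handler_routine_t)(request_t *req, void *extra_state);

/* A Handler is the pair (extra_state, routine). It keeps no list of the
   Requests it is attached to: the relationship is one-way. */
typedef struct {
    void *extra_state;
    handler_routine_t routine;
} handler_t;

/* A Request references zero or one Handler (NULL means "no Handler"). */
struct request {
    int id;
    const handler_t *handler;
};

/* Attach h to req, replacing any Handler already attached. */
void request_attach_handler(request_t *req, const handler_t *h)
{
    assert(req != NULL);
    req->handler = h;
}

/* Deterministic query: does req have an associated Handler right now? */
int request_has_handler(const request_t *req)
{
    assert(req != NULL);
    return req->handler != NULL;
}
```

Because the association lives only in the Request, sharing one Handler among many Requests costs the run-time environment a single pointer per Request.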

*** Recall the Communication Request Semantics ***

In MPI-1 a Request has three states which are exposed
to the MPI user:
1) NULL (or undefined)
2) Inactive
3) Active

A Request is "created" in either the Inactive state
or the Active state. A request which is created in
the Inactive state is known as a "Persistent Communication Request".
A request which is created in the Active state is
known as an "[Ordinary] Communication Request".

Transition from the Inactive to Active state is accomplished
via a call to MPI_START or MPI_STARTALL for the Persistent Request.

Successful transition from the Active state to the Inactive state
occurs when the following TWO conditions are met:
1) the data transfer associated with the particular Request is
finished, what is meant by "finished" varies according to the
type of send involved, AND
2) a call is made to one of the MPI_WAIT or MPI_TEST routines
for the Request.
This is known as "Request Completion".

Request Cancellation, where condition #1 above is replaced by
a call to MPI_CANCEL, results in a cancelled transition from
the Active to the Inactive state.

An [Ordinary] Request automatically transitions from the
Inactive state to the NULL state during Request Completion
or Request Cancellation.

These simple state-machine semantics provide both consistency
for the MPI user and flexibility for the MPI run-time environment.
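The three-state model above can be sketched in C as follows. This is a simplified model, not how any MPI library is implemented: the names are hypothetical, req_transfer_finished stands in for the transport satisfying condition #1, req_test stands in for MPI_TEST (condition #2), and MPI_CANCEL is modeled, as in the text, as replacing condition #1.

```c
#include <assert.h>
#include <stdbool.h>

/* The three user-visible MPI-1 Request states (hypothetical names). */
typedef enum { REQ_NULL, REQ_INACTIVE, REQ_ACTIVE } req_state_t;

typedef struct {
    req_state_t state;
    bool persistent;
    bool finished;   /* condition #1 holds (transfer finished or cancelled) */
    bool cancelled;
} req_t;

/* Persistent Requests are created Inactive, ordinary ones Active. */
req_t req_create(bool persistent)
{
    req_t r = { persistent ? REQ_INACTIVE : REQ_ACTIVE, persistent, false, false };
    return r;
}

/* MPI_START: Inactive -> Active, for persistent Requests only. */
void req_start(req_t *r)
{
    assert(r->persistent && r->state == REQ_INACTIVE);
    r->state = REQ_ACTIVE;
    r->finished = false;
    r->cancelled = false;
}

/* The (simulated) transport reports that the data transfer is finished. */
void req_transfer_finished(req_t *r) { r->finished = true; }

/* MPI_CANCEL replaces condition #1. */
void req_cancel(req_t *r)
{
    r->finished = true;
    r->cancelled = true;
}

/* MPI_TEST supplies condition #2; it completes the Request only if
   condition #1 already holds. As in MPI-1, testing a NULL or Inactive
   Request returns true immediately. */
bool req_test(req_t *r)
{
    if (r->state != REQ_ACTIVE)
        return true;
    if (!r->finished)
        return false;
    r->state = r->persistent ? REQ_INACTIVE : REQ_NULL;
    return true;
}
```

Nothing in this model moves a Request out of the Active state without a call to req_test, which is exactly the property Reason #2 below objects to.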

*** Request Semantics MUST BE Extended for Handlers ***

I assert that the clean, consistent, and implementation-flexible
semantics of MPI-1 point-to-point Communication Requests are
inadequate for MPI-2 Handlers. The semantics must be extended.

Reason #1:
Recall the MPI-1 semantics for point-to-point communication
(Section 3.5): "Messages are non-overtaking." In the specification
this applies to messages which have the same source and destination
and are matched by the same receive. Unfortunately, at least one
MPI implementation (mpich) also orders messages which are not matched
by the same receive. This is understandable, as it is not reasonable
to expect an MPI run-time environment to buffer an unlimited number
of received messages while waiting for a particular receive
to be satisfied. However, such a reasonable implementation can lead
(and has led) to deadlocks when an application uses the
one-sided communication Handler paradigm.
(The example is non-trivial.)

Reason #2:
The MPI-1 Request semantics make no allowance for a Request
in the Active state to Complete unless an explicit call to
an MPI_WAIT or MPI_TEST procedure is made.
If Handler invocation can only be *guaranteed* to occur
when a call to MPI_WAIT or MPI_TEST is made, then the concept of
a Communication Request Handler degenerates into some nice
"syntactic sugar" and does not provide an MPI user with any
additional functionality. Thus, unless the Request semantics
are extended, the MPI-2 extension for Communication Request
Handlers is not worth the effort.

*** What Should be the Extension to Request Semantics? ***

An "extension" of the Request semantics implies that any
new semantics be a superset of the existing MPI-1 semantics.

I proposed in "A Consistent Extension of the Message Passing
Interface (MPI) for Nonblocking Communication Handlers",
TICAM Report 96-11, February 1996, that the
Communication Request semantics be extended to support
"self-completing" requests via Request Handlers.

"Self-completion" means that, upon satisfaction of
Request Completion condition #1:
"the data transfer associated with the particular Request is
finished, what is meant by "finished" varies according to the
type of send involved,"
Request Completion condition #2 is satisfied through
the invocation of a Request Handler.

I propose that the Request semantics be extended to include
a fourth state:
1) NULL (or undefined)
2) Inactive
3) Active
4) Finished or Ready-for-Completion

An MPI-1 Request (without a Handler) is said to be
"Finished" or "Ready-for-Completion" if a call to
one of the MPI_TEST procedures on the Request would
Complete the request.

Under this semantic extension:
1) MPI_TEST routines transition a Request from the
"Finished" state to the Inactive state
(and then NULL state for ordinary Requests).

2) MPI_WAIT routines wait for the Request to enter
the "Finished" state and then perform the
same state transition as MPI_TEST.

3) MPI_CANCEL forces the Request to transition
from the Active state to the "Finished" state
and annotates the Request accordingly for
a subsequent call to MPI_TEST_CANCELLED.

4) MPI_START or MPI_STARTALL may *not* be called
for a Persistent Request which is in the
"Finished" state.
Such a call generates an MPI error.
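The four rules above can be sketched as a small C state machine. The names (req_mark_finished, req_test, req_wait, req_start, the RC_* codes) are hypothetical, and the progress callback passed to req_wait is an assumption standing in for whatever engine an implementation uses to advance communication.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { REQ_NULL, REQ_INACTIVE, REQ_ACTIVE, REQ_FINISHED } req_state_t;
enum { RC_OK = 0, RC_ERR_START_FINISHED = 1, RC_ERR_BAD_STATE = 2 };

typedef struct {
    req_state_t state;
    bool persistent;
    bool cancelled;
} req_t;

/* The transport moves an Active Request to Finished when the data
   transfer is finished. */
void req_mark_finished(req_t *r)
{
    if (r->state == REQ_ACTIVE)
        r->state = REQ_FINISHED;
}

/* Rule 3: MPI_CANCEL forces Active -> Finished and annotates the Request
   for a later MPI_TEST_CANCELLED. */
void req_cancel(req_t *r)
{
    if (r->state == REQ_ACTIVE) {
        r->state = REQ_FINISHED;
        r->cancelled = true;
    }
}

/* Rule 1: MPI_TEST moves Finished -> Inactive (then NULL for ordinary
   Requests). An Active Request is left alone; NULL/Inactive return true. */
bool req_test(req_t *r)
{
    if (r->state != REQ_FINISHED)
        return r->state != REQ_ACTIVE;
    r->state = r->persistent ? REQ_INACTIVE : REQ_NULL;
    return true;
}

/* Rule 2: MPI_WAIT waits for Finished, then does what MPI_TEST does. */
void req_wait(req_t *r, void (*progress)(req_t *))
{
    assert(progress != NULL);
    while (r->state == REQ_ACTIVE)
        progress(r);
    (void)req_test(r);
}

/* Rule 4: MPI_START on a Finished persistent Request is an error. */
int req_start(req_t *r)
{
    if (!r->persistent)
        return RC_ERR_BAD_STATE;
    if (r->state == REQ_FINISHED)
        return RC_ERR_START_FINISHED;
    if (r->state != REQ_INACTIVE)
        return RC_ERR_BAD_STATE;
    r->state = REQ_ACTIVE;
    r->cancelled = false;
    return RC_OK;
}
```

The key difference from the three-state model is that "finished" is now a visible state rather than a hidden flag, so any agent (including a Handler) can observe it.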

*** What are the Request Handler Semantics? ***

Handler Invocation:

When a Request Handler is associated with a Request
the Request Handler will be invoked by an MPI implementation
for the Request
1) when the Request is in the "Finished" state and
2) at the "earliest opportunity".
The "earliest opportunity" caveat provides an MPI-2 implementation
with flexibility (e.g. thread, interrupt, polling) in invoking
Request Handlers.
Each MPI-2 implementation should clearly state what constitutes
the "earliest opportunity". However, the MPI-2 specification should
state a minimum set of requirements, for example:
"Request Handlers shall be invoked for all Requests which
have associated Handlers and are in the Finished state
during any blocking send/recv or update to a Request."
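One way to meet such a minimum requirement is a progress hook that every blocking send/recv and every Request update calls before doing its own work. The C sketch below assumes a flat table of Requests and hypothetical names (progress_invoke_handlers, count_handler); thread- or interrupt-driven implementations could invoke Handlers sooner. If a Handler leaves its Request in the Finished state it is detached, so it runs at most once per message, following the policy proposed under Request Completion below.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { REQ_NULL, REQ_INACTIVE, REQ_ACTIVE, REQ_FINISHED } req_state_t;
typedef struct req req_t;
typedef struct {
    void *extra_state;
    void (*routine)(req_t *req, void *extra_state);
} handler_t;
struct req {
    req_state_t state;
    const handler_t *handler;
};

/* Hypothetical progress hook, called at every "earliest opportunity".
   Invokes the Handler of every Finished Request that has one and returns
   how many Handlers ran. */
size_t progress_invoke_handlers(req_t *reqs, size_t n)
{
    size_t invoked = 0;
    for (size_t i = 0; i < n; ++i) {
        req_t *r = &reqs[i];
        if (r->state != REQ_FINISHED || r->handler == NULL)
            continue;
        const handler_t *h = r->handler;
        assert(h->routine != NULL);
        h->routine(r, h->extra_state);
        if (r->state == REQ_FINISHED)
            r->handler = NULL;   /* not completed by the Handler: detach */
        ++invoked;
    }
    return invoked;
}

/* Example Handler: counts its invocations in extra_state (an int). */
void count_handler(req_t *req, void *extra_state)
{
    (void)req;
    ++*(int *)extra_state;
}
```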

Request Completion and Request Handlers:

If a Request is left in the "Finished" state following the
invocation of an associated Handler then that Handler
could be repeatedly invoked for the same message.
To prevent this "infinite Handler invocation loop" one
of the following must occur:
1) A fifth state of "Handler Invoked" must be added
to the Communication Request.
2) A Request is Completed, or taken out of the
"Finished" state, when the handler is invoked.
3) The Handler is "detached" from the Request and
a subsequent call to an MPI_TEST or MPI_WAIT procedure
completes the Request.
I propose a combination of case #2 and case #3,
thus avoiding the complexity of yet another Request state.
A Request Handler may, during its invocation,
complete the Request with a call to MPI_TEST or MPI_WAIT.
If the Request has not been completed when the
Handler returns then the Handler is automatically
detached from the Request, thus enforcing an
"execute once per message" policy.
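The combined policy (case #2 plus case #3) can be sketched in C as follows, assuming a single thread of control so the Handler runs to completion inside invoke_handler. All names are hypothetical; req_test stands in for MPI_TEST. In this sketch a persistent Request that its Handler completes keeps the Handler attached for its next message; that reading is an assumption, since the text only specifies detachment for the not-completed case.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { REQ_NULL, REQ_INACTIVE, REQ_ACTIVE, REQ_FINISHED } req_state_t;
typedef struct req req_t;
typedef struct {
    void *extra_state;
    void (*routine)(req_t *req, void *extra_state);
} handler_t;
struct req {
    req_state_t state;
    bool persistent;
    const handler_t *handler;
};

/* Stand-in for MPI_TEST: completes a Finished Request. */
bool req_test(req_t *r)
{
    if (r->state != REQ_FINISHED)
        return r->state != REQ_ACTIVE;
    r->state = r->persistent ? REQ_INACTIVE : REQ_NULL;
    return true;
}

/* Invoke the attached Handler for a Finished Request. The Handler may
   complete the Request itself (case #2); if it has not done so when it
   returns, the Handler is detached (case #3) so it cannot run again for
   this message, and a later MPI_TEST/MPI_WAIT completes the Request. */
void invoke_handler(req_t *r)
{
    assert(r->state == REQ_FINISHED && r->handler != NULL);
    const handler_t *h = r->handler;
    h->routine(r, h->extra_state);
    if (r->state == REQ_FINISHED)
        r->handler = NULL;
}

/* Example Handlers: one completes its Request, one only inspects it. */
void completing_handler(req_t *r, void *extra) { (void)extra; (void)req_test(r); }
void passive_handler(req_t *r, void *extra) { (void)r; (void)extra; }
```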

Priority:

*If* a Request is in the "Finished" state, the Request has an
associated Handler, the Handler has not been invoked, and one
of the MPI_TEST or MPI_WAIT procedures is called for the Request
*then* that call to an MPI_TEST or MPI_WAIT procedure
will block until either
1) the Request is completed by the Handler (via MPI_TEST or MPI_WAIT) or
2) the Handler returns and is automatically detached from the Request.
Under case #1 the MPI_STATUS *is not* output to the called
MPI_TEST or MPI_WAIT procedure. Under case #2 the MPI_STATUS
*is* output to the called MPI_TEST or MPI_WAIT procedure.

Note that if the Handler performs a blocking communication
before Completing its own Request, then an external call to
an MPI_TEST procedure on that Request would block.
This does not violate MPI-1 semantics as the MPI user has
blocked the "thread-of-control", not the MPI implementation.
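A single-threaded C sketch of the priority rule, with hypothetical names throughout. In a threaded implementation the caller would block while the Handler runs elsewhere; here the calling thread simply runs the Handler itself before deciding whether to output a status. The status_t fields and the status_written flag are assumptions used to make "status is / is not output" observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { REQ_NULL, REQ_INACTIVE, REQ_ACTIVE, REQ_FINISHED } req_state_t;
typedef struct { int source; int tag; } status_t;   /* hypothetical */
typedef struct req req_t;
typedef struct {
    void *extra_state;
    void (*routine)(req_t *req, void *extra_state);
} handler_t;
struct req {
    req_state_t state;
    bool persistent;
    const handler_t *handler;
    status_t status;          /* filled in by the transport */
};

static void complete(req_t *r)
{
    r->state = r->persistent ? REQ_INACTIVE : REQ_NULL;
}

/* MPI_TEST-like call honoring Handler priority. Returns the completion
   flag; *status_written reports whether *status_out was filled in. */
bool test_with_priority(req_t *r, status_t *status_out, bool *status_written)
{
    assert(status_out != NULL && status_written != NULL);
    *status_written = false;
    if (r->state == REQ_FINISHED && r->handler != NULL) {
        const handler_t *h = r->handler;
        h->routine(r, h->extra_state);
        if (r->state != REQ_FINISHED)
            return true;          /* case #1: Handler completed it, no status */
        r->handler = NULL;        /* case #2: Handler returned and is detached */
    }
    if (r->state != REQ_FINISHED)
        return r->state != REQ_ACTIVE;
    *status_out = r->status;
    *status_written = true;
    complete(r);
    return true;
}

/* Example Handlers: one completes its own Request (standing in for a call
   to MPI_TEST from inside the Handler), one leaves it Finished. */
void completing_handler(req_t *r, void *extra) { (void)extra; complete(r); }
void passive_handler(req_t *r, void *extra) { (void)r; (void)extra; }
```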

Attaching / Detaching Request Handlers:

A Request has either no Handler (null Handler) or one Handler
attached to it. Attaching a Handler to a Request which already
has an attached Handler replaces the existing Handler with the
new Handler. Detaching a Handler is accomplished by "attaching"
the MPI_NULL_HANDLER routine to the Request.

The action of attaching a Handler to a Request (via some procedure)
will depend upon the state of the Request.
1) NULL Request - an error is generated
2) Inactive Request - the Request is updated to reference the Handler
3) Active Request - the Request is updated to reference the Handler
4) "Finished" Request
- the current (non null) Handler is invoked and then
- the Request is updated to reference the new Handler.
If the Request is still in the "Finished" state,
-- the new (non null) Handler is immediately invoked, and
-- in a threaded environment the procedure which
attached the Handler to the Request may return
without waiting for that invocation to finish.
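The attach rules can be sketched in C as below, assuming a single thread of control so that "immediately invoked" means "invoked before attach_handler returns"; in a threaded environment the call could return first, as the text allows. A NULL pointer plays the role of MPI_NULL_HANDLER, and each invocation follows the once-per-message policy above. All names and return codes are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { REQ_NULL, REQ_INACTIVE, REQ_ACTIVE, REQ_FINISHED } req_state_t;
enum { RC_OK = 0, RC_ERR_NULL_REQUEST = 1 };
typedef struct req req_t;
typedef struct {
    void *extra_state;
    void (*routine)(req_t *req, void *extra_state);
} handler_t;
struct req {
    req_state_t state;
    bool persistent;
    const handler_t *handler;   /* NULL plays the role of MPI_NULL_HANDLER */
};

/* Invoke the attached Handler once; detach it if the Request is still
   Finished when it returns. */
static void run_once(req_t *r)
{
    const handler_t *h = r->handler;
    assert(h != NULL && h->routine != NULL);
    h->routine(r, h->extra_state);
    if (r->state == REQ_FINISHED && r->handler == h)
        r->handler = NULL;
}

int attach_handler(req_t *r, const handler_t *h)
{
    switch (r->state) {
    case REQ_NULL:
        return RC_ERR_NULL_REQUEST;          /* case 1 */
    case REQ_INACTIVE:
    case REQ_ACTIVE:
        r->handler = h;                      /* cases 2 and 3 */
        return RC_OK;
    case REQ_FINISHED:                       /* case 4 */
        if (r->handler != NULL)
            run_once(r);                     /* current Handler runs first */
        r->handler = h;
        if (r->state == REQ_FINISHED && h != NULL)
            run_once(r);                     /* then the new one, at once */
        return RC_OK;
    }
    return RC_OK;
}

/* Example Handler: counts invocations in extra_state (an int). */
void counting_handler(req_t *r, void *extra) { (void)r; ++*(int *)extra; }
```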

Disabling Request Handlers ( ~ Locking):

An MPI user may wish to temporarily disable the Request Handler
functionality within an MPI implementation.
Disabling Request Handlers prevents Request Handler invocation;
it does not block the execution of a Handler which has already
been invoked but has not yet returned.
A Request which has an attached disabled Handler *cannot* complete
until either the Handler is detached or the Handler functionality
is enabled.
A call to an MPI_TEST function for such a Request will return
without completing the Request.
A call to an MPI_WAIT function for such a Request should return
a "guaranteed deadlock" error.
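A single-threaded C sketch of the disable switch and its effect on MPI_TEST and MPI_WAIT. The global flag, the function names, and RC_ERR_DEADLOCK are hypothetical stand-ins; a real implementation would also have to coordinate with Handlers already running in other threads, which this sketch does not model.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { REQ_NULL, REQ_INACTIVE, REQ_ACTIVE, REQ_FINISHED } req_state_t;
enum { RC_OK = 0, RC_ERR_DEADLOCK = 1 };
typedef struct req req_t;
typedef struct {
    void *extra_state;
    void (*routine)(req_t *req, void *extra_state);
} handler_t;
struct req {
    req_state_t state;
    bool persistent;
    const handler_t *handler;
};

/* Hypothetical process-wide switch. Disabling only prevents new
   invocations; it does not interrupt a Handler that is already running. */
static bool handlers_enabled = true;
void handlers_disable(void) { handlers_enabled = false; }
void handlers_enable(void) { handlers_enabled = true; }

static void complete(req_t *r)
{
    r->state = r->persistent ? REQ_INACTIVE : REQ_NULL;
}

/* MPI_TEST-like: a Finished Request with a disabled Handler is not completed. */
bool test_req(req_t *r)
{
    if (r->state != REQ_FINISHED)
        return r->state != REQ_ACTIVE;
    if (r->handler != NULL) {
        if (!handlers_enabled)
            return false;
        const handler_t *h = r->handler;
        h->routine(r, h->extra_state);
        if (r->state != REQ_FINISHED)
            return true;            /* completed by the Handler */
        r->handler = NULL;          /* once per message */
    }
    complete(r);
    return true;
}

/* MPI_WAIT-like: as proposed, waiting on a pending Request whose Handler
   is disabled reports "guaranteed deadlock" instead of blocking. */
int wait_req(req_t *r, void (*progress)(req_t *))
{
    assert(progress != NULL);
    if (!handlers_enabled && r->handler != NULL &&
        (r->state == REQ_ACTIVE || r->state == REQ_FINISHED))
        return RC_ERR_DEADLOCK;
    while (!test_req(r))
        progress(r);
    return RC_OK;
}

/* Helpers for trying the sketch out. */
void mark_finished(req_t *r) { if (r->state == REQ_ACTIVE) r->state = REQ_FINISHED; }
void counting_handler(req_t *r, void *extra) { (void)r; ++*(int *)extra; }
```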

* This is not a threads "lock". *

If an application wishes to block any possible concurrent
access (via thread-implemented Handlers) to some set of
data, the application would first disable the Handlers
and then call an MPI_TEST routine for the Requests
in question. If the Request Handler is executing
(in a threaded environment), the MPI_TEST routine will
not return until either the Request is completed by the
Handler or the Handler returns.

------------------------------------------------------------------------