During the last meeting I (foolishly?) asked if there should be a callback
function for cancellation of a generalized request. The response, of course,
was "write a proposal". So, here's a first shot at it... Comments are
appreciated-- I'm not convinced all the details are worked out.
Tom
============================================================================
PROPOSAL TO ADD A NEW CALLBACK FUNCTION FOR CANCELLATION
OF A GENERALIZED REQUEST
This proposal is an attempt to allow a user to better define what happens when a
generalized request is cancelled. This may be useful in, for example, a
library written on top of MPI, where graceful handling of strange situations
is desired. Cancellation may happen as a result of a call to MPI_CANCEL or
due to failure of one of the other callback functions (create_fn, start_fn, or
continue_fn). The proposed solution is to create yet another callback
function, cancel_fn. In cancel_fn, the user can "clean up" after the request
(free any dynamically allocated memory, reset any internal state, cancel
internal messages, etc.).
<Additions to section 3.2.2>
cancel_fn This callback function is invoked whenever there is a failure in
the callback functions create_fn, start_fn, or continue_fn (see section
3.2.3 for a discussion of failure) or whenever MPI_CANCEL is
called to cancel the request. This function can be used, for
example, to deallocate memory, cancel internal messages, reset
state, etc.
<Change to interface for MPI_REQUEST_TYPE_CREATE>
MPI_REQUEST_TYPE_CREATE(create_fn, start_fn, continue_fn, free_fn,
cancel_fn, type_req)
IN create_fn Creation callback function for type_req
IN start_fn Request start callback function for type_req
IN continue_fn Request continue callback function for type_req
IN free_fn Request free callback function for type_req
IN cancel_fn Request cancel/failure callback function for type_req
OUT type_req MPI created request type for future reference
<...>
The cancel_fn is defined as:
typedef int MPI_Request_cancel_fn(MPI_Comm comm, MPI_Request request,
void *extra_state);
This callback function is invoked by MPI when the request is cancelled (via
a call to MPI_CANCEL) or when one of the callback functions create_fn,
start_fn, or continue_fn fails (failure is described below in section
3.2.3). This callback function can deallocate the internal data structures
and cancel the internal messages of the request (these are accessible via
extra_state).
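As an illustration, here is a minimal sketch of what a user-written cancel_fn
might look like. It assumes the request's private state is a struct reached
through extra_state. The struct my_req_state, the helper
cancel_internal_message(), and the typedefs for MPI_Comm, MPI_Request, and
MPI_SUCCESS are hypothetical stand-ins so that the fragment compiles without an
MPI library; this is not code from any existing implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins so the sketch compiles without <mpi.h>. */
typedef int MPI_Comm;
typedef int MPI_Request;
#define MPI_SUCCESS 0

/* The callback type as proposed above. */
typedef int MPI_Request_cancel_fn(MPI_Comm comm, MPI_Request request,
                                  void *extra_state);

/* Hypothetical per-request state, reached through extra_state. */
typedef struct {
    double *buffer;    /* dynamically allocated scratch space */
    int     n_pending; /* internal messages still outstanding */
    int     phase;     /* internal progress marker */
} my_req_state;

/* Hypothetical helper: a real library would cancel one of its own
   outstanding internal messages here; the sketch only counts it down. */
static void cancel_internal_message(my_req_state *s)
{
    s->n_pending--;
}

/* A user cancel_fn: cancel internal messages, free memory, reset state. */
static int my_cancel_fn(MPI_Comm comm, MPI_Request request, void *extra_state)
{
    my_req_state *s = extra_state;
    (void)comm;    /* unused in this sketch */
    (void)request; /* unused in this sketch */
    assert(s != NULL);

    while (s->n_pending > 0)
        cancel_internal_message(s);
    free(s->buffer);
    s->buffer = NULL;
    s->phase = 0;
    return MPI_SUCCESS;
}

/* Taking the address through the proposed type checks the signature. */
static MPI_Request_cancel_fn *const my_cancel = my_cancel_fn;
```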
<Additions to section 3.2.3>
When any of the callback functions create_fn, start_fn, or continue_fn fails, MPI
invokes the callback function cancel_fn. When cancel_fn returns, MPI terminates the
request. The cancel_fn should cancel all internal messages and free all
internal data structures associated with the request.
Discussion:
Should a failure in free_fn invoke cancel_fn?
This is not in the current proposal. The "request-free" and
"cancel/failure" operations are currently unrelated.
Can cancel_fn fail? If so, what happens?
I don't have any bright ideas here.
Should MPI invoke free_fn immediately after cancel_fn, so that free_fn
handles all deallocation of dynamic memory?
This might avoid duplicating code for deallocation in both callback
functions at the expense of some flexibility (the "request-free" and
"cancel/failure" operations would no longer be orthogonal).
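To make the trade-off concrete, here is a hypothetical sketch of the
alternative in which MPI invokes free_fn right after cancel_fn. Under that
option cancel_fn only cancels internal messages, and free_fn is the single
place where memory is released. The single-argument request_free_fn
signature, the req_state struct, and cancel_then_free() are assumptions for
illustration only; the stand-in typedefs replace <mpi.h>.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins so the sketch compiles without <mpi.h>. */
typedef int MPI_Comm;
typedef int MPI_Request;
#define MPI_SUCCESS 0

typedef int MPI_Request_cancel_fn(MPI_Comm comm, MPI_Request request,
                                  void *extra_state);

/* Assumed free_fn signature; the proposal's actual one is not shown here. */
typedef int request_free_fn(void *extra_state);

/* Hypothetical per-request state. */
typedef struct {
    int *scratch;   /* dynamically allocated memory */
    int  n_pending; /* internal messages still outstanding */
} req_state;

/* cancel_fn under this option: cancel internal messages, free nothing. */
static int my_cancel_fn(MPI_Comm comm, MPI_Request request, void *extra_state)
{
    req_state *s = extra_state;
    (void)comm;
    (void)request;
    s->n_pending = 0; /* stands in for cancelling each pending message */
    return MPI_SUCCESS;
}

/* free_fn: the only place where the request's memory is released. */
static int my_free_fn(void *extra_state)
{
    req_state *s = extra_state;
    free(s->scratch);
    s->scratch = NULL;
    return MPI_SUCCESS;
}

/* Implementation side under this option: always follow cancel_fn
   with free_fn. */
static void cancel_then_free(MPI_Comm comm, MPI_Request request,
                             void *extra_state,
                             MPI_Request_cancel_fn *cancel_fn,
                             request_free_fn *free_fn)
{
    assert(cancel_fn != NULL && free_fn != NULL);
    cancel_fn(comm, request, extra_state);
    free_fn(extra_state);
}
```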
Steve Lederman raised the following points:
Should there be a default cancel_fn? This would allow a user who does not
care what happens during a CANCEL to avoid writing a cancel_fn. The default
behavior might be to call MPI_ABORT(). Another default behavior might be to
cancel all registered requests. The counter-argument here is that any user
who is using generalized requests should be an "expert" user anyway.
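One way an implementation could provide such a default is to substitute its
own cancel_fn when the user passes NULL. The sketch below is hypothetical:
select_cancel_fn(), default_cancel_fn(), and noop_cancel_fn() are illustrative
names, the NULL convention is an assumption, and abort() stands in for
MPI_ABORT() because stand-in typedefs replace <mpi.h>.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins so the sketch compiles without <mpi.h>. */
typedef int MPI_Comm;
typedef int MPI_Request;
#define MPI_SUCCESS 0

typedef int MPI_Request_cancel_fn(MPI_Comm comm, MPI_Request request,
                                  void *extra_state);

/* Hypothetical default: give up on the whole program, as MPI_ABORT would. */
static int default_cancel_fn(MPI_Comm comm, MPI_Request request,
                             void *extra_state)
{
    (void)comm;
    (void)request;
    (void)extra_state;
    abort(); /* stand-in for MPI_ABORT(comm, errorcode) */
}

/* Choose the callback to record when the request type is created. */
static MPI_Request_cancel_fn *select_cancel_fn(MPI_Request_cancel_fn *user_fn)
{
    return user_fn != NULL ? user_fn : default_cancel_fn;
}

/* A user-supplied cancel_fn that does nothing, for comparison. */
static int noop_cancel_fn(MPI_Comm comm, MPI_Request request, void *extra_state)
{
    (void)comm;
    (void)request;
    (void)extra_state;
    return MPI_SUCCESS;
}
```

The other default mentioned above, cancelling all registered requests, would
change only the body of default_cancel_fn; the selection logic stays the same.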
What request is passed to cancel_fn? There are two possibilities: the
request from the call to the user function that returned failure, or the
generalized request itself. It is possible that both are needed to avoid
additional overhead. Could this be fixed by adding the "id" argument (as in
continue_fn)? This is not clear to me. I'll leave it to Steve to explain the
details here...