Another needed refinement. If the user passed MPI_STATUS_NOT_NEEDED
(or whatchamacallit) to Wait, then we want to pass this value on to
the complete_fn function, rather than a real status. The complete_fn
function should test for this special value.
Now, do we allow calls to MPI_REQUEST_FREE on generalized requests? If
so, complete_fn should perhaps also be invoked by MPI_REQUEST_FREE, with
status = MPI_STATUS_NOT_NEEDED. But (a more serious issue) does this
mean that complete_fn could be invoked before the generalized request
has completed? In that case, cleanup has to be done by the last invocation
of progress_fn. So complete_fn needs to know that it was invoked by
MPI_REQUEST_FREE on a request that has not completed (the "last"
argument now has three possible values). And users should be warned
that they need to write their callback functions to handle this last
case. Or perhaps we need a separate "request_free" callback (this is
really a cosmetic choice between several callbacks, or one callback that
branches according to the value of an input flag).
Violent pain prevents me from continuing.