More on GRs

Linda Stanberry (linda@anduin.ocf.llnl.gov)
Fri, 17 Jan 1997 16:15:04 -0800 (PST)

Gentle readers,

As threatened, I have some more to say about GRs. I hope you will
forgive any unneeded explanations of MPI-isms in the following, but since
I am still new to MPI, I explicitly write out what may be obvious. I also
hope you will correct any misinterpretations. In any event, don't take
offense at either of these, please :-)

First, I realize the MPIF has just been attacked for being too complex
and there is a drive to simplify things. This is a laudable goal, but
should not be the only goal. Any API should be simple enough to master,
and powerful enough that users need not reinvent oft-needed constructs from
the elements given. Obviously, there needs to be a balance between the
complexity of mastering an API by any individual and the complexity of
many individuals duplicating effort to reinvent missing API packages.
This is a tradeoff between short-term simplicity and long-term simplicity,
in my humble opinion.

I've tried to express that one of my biggest problems with the GR proposal
is determining its goal. If more MPI functionality
is desirable, it is best to make that functionality as generic as possible
and not specialized to any one purpose. I see GRs as the most powerful
tool in MPI if they truly let users define their own request types. If
so, they need to be designed to coexist with all existing MPI request
types, not just some (e.g., they should support both persistent and
nonpersistent requests). The rest of my remarks are based on the
assumption that GRs are providing extensible request types. If this is
NOT the goal of GRs, then I would appreciate it if someone would clarify
the goal for me.

Outlined below is what I would like to see in a GR proposal. There are
two versions. The first is somewhat simpler, in terms of number of new
functions required. The second costs one additional function but allows
users to create 'types' of new requests and to reference them by name,
which slightly changes the interface of the other functions. I have a
mild preference for version 2, but I've already implemented (i.e., layered
on top of mpich) something very close to version 1, for the purpose of
implementing nonblocking MPI-IO.

BOTH versions have implications for:

* The 'state' of a request; some means will be necessary to determine if
a given request is a 'generalized' request so that MPI can perform the
semantics for GRs in addition to or instead of ordinary MPI_Request
semantics, and the callback functions must be accessible through
an MPI_Request handle. I am assuming that a request's state indicates
whether or not it is persistent, active, and complete. It will also
have to indicate whether or not it is a GR.

* The 'progress engine' of MPI must be able to invoke a user-defined
progress function, the push_fn callback. I am also assuming that only this
progress engine is capable of marking requests as complete; user-defined
callback functions cannot do this.

* Any existing MPI function that takes an MPI_Request argument. In
particular, MPI_START*, MPI_TEST*, MPI_WAIT*, MPI_CANCEL, and
MPI_REQUEST_FREE, must be revised to invoke appropriate callback
functions defined by a user. I'm relatively new to MPI, so I've
probably overlooked some functions in this category.
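To make the first point concrete, here is a minimal sketch, in plain C with no MPI library, of a request object whose state records persistent/active/complete plus a "this is a GR" bit, with the four callbacks reachable from the handle. Every name here (request_obj, gr_callbacks, REQ_IS_GR, req_is_gr) and the int-returning callback signatures are hypothetical illustrations, not part of chapter 7 or any MPI implementation.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bits; illustrative only, not from any MPI library. */
enum {
    REQ_PERSISTENT = 1u << 0,
    REQ_ACTIVE     = 1u << 1,
    REQ_COMPLETE   = 1u << 2,
    REQ_IS_GR      = 1u << 3
};

/* Hypothetical status filled in by complete_fn. */
typedef struct { int count; bool cancelled; } gr_status;

/* The four user callbacks (assumed signatures), reachable from the handle. */
typedef struct {
    int (*start_fn)(void *extra_state);
    int (*push_fn)(void *extra_state, bool *flag);
    int (*complete_fn)(void *extra_state, gr_status *status);
    int (*cancel_fn)(void *extra_state);
} gr_callbacks;

/* What an MPI_Request handle would point at in this sketch. */
typedef struct {
    unsigned state;           /* bitwise OR of REQ_* bits        */
    const gr_callbacks *cb;   /* NULL for predefined requests    */
    void *extra_state;        /* passed to every callback        */
} request_obj;

/* Each MPI_Request-taking routine would branch on this to choose
   GR semantics or ordinary request semantics. */
static bool req_is_gr(const request_obj *r)
{
    return r != NULL && (r->state & REQ_IS_GR) != 0;
}

/* Query a single state bit. */
static bool req_has(const request_obj *r, unsigned bit)
{
    return (r->state & bit) != 0;
}
```

A persistent GR fresh from creation would carry REQ_PERSISTENT | REQ_IS_GR with neither REQ_ACTIVE nor REQ_COMPLETE set.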

Both versions are slightly simplified from the current chapter 7, mostly
in not requiring an MPI_GR_MARK_COMPLETE, but there are some other
differences as well.

===========
Proposal 1:
===========

Allow a user to create a nonblocking, persistent or nonpersistent, GR.
Note that a blocking GR is implemented by starting a nonblocking GR and
immediately invoking MPI_Wait.

MPI_GR_INIT(start_fn, push_fn, complete_fn, cancel_fn, extra_state, gr) -
creates a persistent GR with the given callback functions. The state of
the object whose handle is the returned MPI_Request GR will reflect that
the request is persistent, inactive, incomplete, and that it is a GR.
The request can be activated with MPI_Start as with predefined persistent
requests.

MPI_GR_START(start_fn, push_fn, complete_fn, cancel_fn, extra_state, gr) -
creates and starts a nonpersistent GR with the given callback functions.
This invokes the start_fn, which shall be completed before MPI_GR_START
returns. The state of the created MPI_Request object will reflect that
the request is nonpersistent, active, incomplete, a GR, and that its
associated push_fn must be repeatedly called by the MPI progress engine
until push_fn returns true in flag. It is completed with a call to
MPI_TEST/WAIT, or it may be cancelled with MPI_CANCEL.

MPI_START(request) - if request is a GR, this invokes the GR's start_fn,
which shall be completed before MPI_START returns. As in MPI_GR_START,
the state of the request object will reflect that the request is active,
incomplete, a GR, and that its associated push_fn must be executed, and
it is completed by MPI_TEST/WAIT or MPI_CANCEL.
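The three creation/activation paths above could be simulated as follows. This is a sketch under stated assumptions: gr_init, gr_activate and gr_start are hypothetical stand-ins for MPI_GR_INIT, MPI_START and MPI_GR_START, a plain pointer stands in for MPI_Request, and complete_fn/cancel_fn are left out because they play no part in creation.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical state bits, as in a request-state sketch. */
enum { REQ_PERSISTENT = 1u, REQ_ACTIVE = 2u, REQ_COMPLETE = 4u, REQ_IS_GR = 8u };

typedef struct {
    int (*start_fn)(void *extra_state);
    int (*push_fn)(void *extra_state, bool *flag);
    /* complete_fn and cancel_fn omitted: not used during creation */
} gr_callbacks;

typedef struct {
    unsigned state;
    gr_callbacks cb;
    void *extra_state;
} request_obj;

typedef request_obj *request;   /* hypothetical stand-in for MPI_Request */

/* Like MPI_GR_INIT: persistent, inactive, incomplete, a GR. */
static int gr_init(gr_callbacks cb, void *extra_state, request *gr)
{
    request_obj *r = malloc(sizeof *r);
    if (r == NULL) return -1;
    r->state = REQ_PERSISTENT | REQ_IS_GR;
    r->cb = cb;
    r->extra_state = extra_state;
    *gr = r;
    return 0;
}

/* Like MPI_START on a GR: start_fn runs to completion before we return,
   then the request is marked active and incomplete. */
static int gr_activate(request r)
{
    if (!(r->state & REQ_IS_GR) || (r->state & REQ_ACTIVE)) return -1;
    int err = r->cb.start_fn(r->extra_state);
    if (err != 0) return err;
    r->state = (r->state | REQ_ACTIVE) & ~(unsigned)REQ_COMPLETE;
    return 0;
}

/* Like MPI_GR_START: create a nonpersistent GR and start it immediately. */
static int gr_start(gr_callbacks cb, void *extra_state, request *gr)
{
    request_obj *r = malloc(sizeof *r);
    if (r == NULL) return -1;
    r->state = REQ_IS_GR;          /* nonpersistent: no REQ_PERSISTENT bit */
    r->cb = cb;
    r->extra_state = extra_state;
    int err = gr_activate(r);
    if (err != 0) { free(r); return err; }
    *gr = r;
    return 0;
}
```

Sharing gr_activate between the persistent and nonpersistent paths reflects the point above that MPI_START on a GR and MPI_GR_START leave the request in the same active, incomplete state.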

MPI progress engine - must have some record of active, incomplete requests
to monitor; for each GR, the GR's push_fn is invoked repeatedly until its
push_fn returns true in flag, at which time the GR's request state is
marked as complete.
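One pass of such a progress engine might look like the sketch below (plain C, no MPI; progress_pass and the struct layout are hypothetical). The key property is that push_fn only reports "done" through flag, and the engine alone sets the complete bit.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical state bits. */
enum { REQ_ACTIVE = 2u, REQ_COMPLETE = 4u, REQ_IS_GR = 8u };

typedef struct {
    unsigned state;
    int (*push_fn)(void *extra_state, bool *flag);
    void *extra_state;
} request_obj;

/* One pass over the engine's record of requests. Returns how many
   active requests are still incomplete, or -1 if a push_fn fails.
   Non-GR requests are only counted here; a real library would progress
   them through its own internal path. */
static int progress_pass(request_obj *reqs[], size_t n)
{
    int pending = 0;
    for (size_t i = 0; i < n; i++) {
        request_obj *r = reqs[i];
        if (r == NULL || !(r->state & REQ_ACTIVE) || (r->state & REQ_COMPLETE))
            continue;                      /* nothing to monitor */
        if (r->state & REQ_IS_GR) {
            bool flag = false;
            if (r->push_fn(r->extra_state, &flag) != 0)
                return -1;
            if (flag) {
                r->state |= REQ_COMPLETE;  /* only the engine marks completion */
                continue;
            }
        }
        pending++;
    }
    return pending;
}
```

A blocking wait could then simply loop on `while (progress_pass(table, n) > 0)` until its own request is marked complete.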

MPI_CANCEL(request) - if the request is a GR, and the request is not marked
as complete, this invokes the GR's cancel_fn, which must complete before
MPI_CANCEL returns. If the cancel_fn returns successfully, the status
of the associated request is marked to indicate that the request was
cancelled (for MPI_TEST_CANCELLED); otherwise, the communication may
have (partially) completed. The request state is marked as complete.
(I think MPI_CANCEL does this last part anyway; that is, I believe that
MPI_TEST/WAIT is called/callable after MPI_CANCEL?)

MPI_WAIT(request, status) - if the request is a GR, blocks until the request
state is marked as complete, and then invokes the GR's complete_fn in
order to determine the status to return from MPI_WAIT. If the GR is
nonpersistent, the request handle is reset to MPI_REQUEST_NULL and the
request object is deallocated. If the GR is persistent, it is marked as
inactive but not deallocated.

MPI_TEST(request, flag, status) - if the request is a GR and its state
is marked as incomplete, returns false in flag; otherwise, behaves
as indicated for MPI_WAIT, and returns true in flag.
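A sketch of TEST and WAIT for a GR, in plain C with hypothetical names (gr_test, gr_wait, gr_finish). It assumes TEST may itself call push_fn once, as the callback summary allows ("called by the progress engine, and/or by MPI_TEST/WAIT"), and uses a NULL pointer to play MPI_REQUEST_NULL.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical state bits. */
enum { REQ_PERSISTENT = 1u, REQ_ACTIVE = 2u, REQ_COMPLETE = 4u, REQ_IS_GR = 8u };

typedef struct { int count; bool cancelled; } gr_status;

typedef struct {
    unsigned state;
    int (*push_fn)(void *extra_state, bool *flag);
    int (*complete_fn)(void *extra_state, gr_status *status);
    void *extra_state;
} request_obj;

typedef request_obj *request;   /* NULL plays MPI_REQUEST_NULL */

/* Shared by TEST and WAIT once the request is complete. */
static int gr_finish(request *rp, gr_status *status)
{
    request r = *rp;
    int err = r->complete_fn(r->extra_state, status);  /* fills status */
    if (r->state & REQ_PERSISTENT) {
        r->state &= ~(unsigned)(REQ_ACTIVE | REQ_COMPLETE);  /* inactive, kept */
    } else {
        free(r);
        *rp = NULL;                                   /* handle reset */
    }
    return err;
}

static int gr_test(request *rp, bool *flag, gr_status *status)
{
    request r = *rp;
    if (!(r->state & REQ_ACTIVE)) {     /* inactive persistent GR: empty status */
        *flag = true;
        status->count = 0;
        status->cancelled = false;
        return 0;
    }
    if (!(r->state & REQ_COMPLETE)) {
        bool done = false;
        int err = r->push_fn(r->extra_state, &done);  /* one progress step */
        if (err != 0) return err;
        if (done) r->state |= REQ_COMPLETE;
    }
    *flag = (r->state & REQ_COMPLETE) != 0;
    return *flag ? gr_finish(rp, status) : 0;
}

static int gr_wait(request *rp, gr_status *status)
{
    bool flag = false;
    int err;
    while ((err = gr_test(rp, &flag, status)) == 0 && !flag)
        ;   /* a real engine would also progress other requests here */
    return err;
}
```

Writing WAIT as a loop over TEST keeps the two consistent, which matches the description of MPI_TEST as "behaves as indicated for MPI_WAIT" once complete.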

MPI_REQUEST_FREE(request) - in addition to resetting the request handle,
if request is a GR, then this routine must block until the request is
marked as complete, and invoke the GR's complete_fn or cancel_fn, as
determined from the request's status, and then deallocate the request
object. The user has no way of retrieving the status (i.e.,
MPI_TEST/WAIT cannot be called), but the communication must be allowed
to complete as with any other kind of request. This routine must be
called for a persistent GR, and it may be called for a nonpersistent GR
if no explicit reaping of the request status is desired, as for other
MPI nonblocking requests.
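A possible shape for MPI_REQUEST_FREE on a GR, again plain C with hypothetical names. It takes "complete_fn or cancel_fn, as determined from the request's status" literally: a request whose status says it was cancelled gets cancel_fn, anything else gets complete_fn. That reading assumes cancel_fn is safe to call a second time after MPI_CANCEL, which the proposal does not state.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical state bits. */
enum { REQ_ACTIVE = 2u, REQ_COMPLETE = 4u, REQ_IS_GR = 8u };

typedef struct { int count; bool cancelled; } gr_status;

typedef struct {
    unsigned state;
    int (*push_fn)(void *extra_state, bool *flag);
    int (*complete_fn)(void *extra_state, gr_status *status);
    int (*cancel_fn)(void *extra_state);   /* assumed idempotent here */
    void *extra_state;
    gr_status status;                      /* cancelled set by a prior cancel */
} request_obj;

typedef request_obj *request;   /* NULL plays MPI_REQUEST_NULL */

/* Reset the handle, block until the communication completes, run the
   cleanup callback chosen from the status, then deallocate. The status
   is discarded because the caller can no longer TEST/WAIT on it. */
static int gr_request_free(request *rp)
{
    request r = *rp;
    int err = 0;
    *rp = NULL;
    while (err == 0 && (r->state & REQ_ACTIVE) && !(r->state & REQ_COMPLETE)) {
        bool done = false;
        err = r->push_fn(r->extra_state, &done);
        if (err == 0 && done)
            r->state |= REQ_COMPLETE;
    }
    if (err == 0 && (r->state & REQ_ACTIVE)) {   /* inactive: nothing to reap */
        gr_status discarded = r->status;
        err = r->status.cancelled ? r->cancel_fn(r->extra_state)
                                  : r->complete_fn(r->extra_state, &discarded);
    }
    free(r);
    return err;
}
```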

(Note that the collective/selective versions of MPI_START/TEST/WAIT would
be extensions of the semantics defined here as appropriate.)

The callback functions are the same as in the current version of chapter 7,
except as indicated above (i.e., where/how called). Briefly, for review:

start_fn - initializes the user-defined communication; called by MPI_START
or MPI_GR_START.

push_fn - interrogates the status of the user-defined communication and
returns an indication of whether or not the communication is 'done'; called
by the progress engine, and/or by MPI_TEST/WAIT.

complete_fn - provides the return 'status' of a completed request and
frees any resources used by the communication; called by MPI_TEST/WAIT
or MPI_REQUEST_FREE.

cancel_fn - needs to duplicate some of what the complete_fn does, in order
to free any resources held, after halting any ongoing communication. If
any part of the communication has potentially completed, it should return
failure; it should return success only if no part of the communication has
been completed. Called by MPI_CANCEL or MPI_REQUEST_FREE. I also think there
is merit in calling this function from MPI_START or MPI_GR_START, the
progress engine, or MPI_TEST/WAIT, if any of the callbacks they
invoke produces a failure.
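As an illustration of these four contracts, here is a hypothetical user-defined request type, a "chunked copy" that moves at most chunk bytes per push_fn call. The copy_* names, the copy_state struct, and the int-returning signatures are assumptions for this sketch; the point is the division of labour, in particular that cancel_fn succeeds only while nothing has been transferred.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef struct { int count; bool cancelled; } gr_status;

/* Hypothetical extra_state for a user-defined "chunked copy" request. */
typedef struct {
    const char *src;
    char *dst;
    size_t len, chunk, copied;
} copy_state;

/* start_fn: initialize the communication; nonzero means failure. */
static int copy_start(void *extra_state)
{
    copy_state *cs = extra_state;
    if (cs->chunk == 0) return 1;
    cs->copied = 0;
    return 0;
}

/* push_fn: do a bounded amount of work and report 'done' in *flag. */
static int copy_push(void *extra_state, bool *flag)
{
    copy_state *cs = extra_state;
    size_t n = cs->len - cs->copied;
    if (n > cs->chunk) n = cs->chunk;
    memcpy(cs->dst + cs->copied, cs->src + cs->copied, n);
    cs->copied += n;
    *flag = (cs->copied == cs->len);
    return 0;
}

/* complete_fn: fill in the status (this type owns no resources to free). */
static int copy_complete(void *extra_state, gr_status *status)
{
    copy_state *cs = extra_state;
    status->count = (int)cs->copied;
    status->cancelled = false;
    return 0;
}

/* cancel_fn: succeed (0) only if no part of the copy has happened. */
static int copy_cancel(void *extra_state)
{
    copy_state *cs = extra_state;
    return cs->copied == 0 ? 0 : 1;
}
```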

===========
Proposal 2:
===========

This is the same as proposal 1, except it includes a kind of typedef
mechanism which can be used to package user-defined request types and,
hence, to reuse a 'type'.

MPI_GR_DEFINE(start_fn, push_fn, complete_fn, cancel_fn, gr_template) -
creates a template for a set of GRs that will use the same user-defined
callback routines.

MPI_GR_INIT(gr_template, extra_state, gr) - creates a persistent GR from the
given gr_template, as in proposal 1.

MPI_GR_START(gr_template, extra_state, gr) - creates and starts a
nonpersistent GR from the given gr_template, as in proposal 1.
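A sketch of the proposal 2 template mechanism in plain C. gr_define, gr_init_t and gr_start_t are hypothetical stand-ins for MPI_GR_DEFINE, MPI_GR_INIT and MPI_GR_START; the design choice shown is that each request stores a pointer to a shared template, so only the per-request extra_state varies between GRs of one 'type'.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical state bits. */
enum { REQ_PERSISTENT = 1u, REQ_ACTIVE = 2u, REQ_IS_GR = 8u };

typedef struct { int count; bool cancelled; } gr_status;

/* Hypothetical template produced by MPI_GR_DEFINE. */
typedef struct {
    int (*start_fn)(void *);
    int (*push_fn)(void *, bool *);
    int (*complete_fn)(void *, gr_status *);
    int (*cancel_fn)(void *);
} gr_template;

typedef struct {
    unsigned state;
    const gr_template *type;   /* shared by every GR of this 'type' */
    void *extra_state;         /* per-request user data             */
} request_obj;

static int gr_define(int (*start_fn)(void *), int (*push_fn)(void *, bool *),
                     int (*complete_fn)(void *, gr_status *),
                     int (*cancel_fn)(void *), gr_template *tpl)
{
    tpl->start_fn = start_fn;
    tpl->push_fn = push_fn;
    tpl->complete_fn = complete_fn;
    tpl->cancel_fn = cancel_fn;
    return 0;
}

static request_obj *gr_new(const gr_template *tpl, void *extra_state, unsigned state)
{
    request_obj *r = malloc(sizeof *r);
    if (r != NULL) {
        r->state = state | REQ_IS_GR;
        r->type = tpl;
        r->extra_state = extra_state;
    }
    return r;
}

/* Like MPI_GR_INIT(gr_template, extra_state, gr): persistent, inactive. */
static int gr_init_t(const gr_template *tpl, void *extra_state, request_obj **gr)
{
    *gr = gr_new(tpl, extra_state, REQ_PERSISTENT);
    return *gr != NULL ? 0 : -1;
}

/* Like MPI_GR_START(gr_template, extra_state, gr): nonpersistent, started. */
static int gr_start_t(const gr_template *tpl, void *extra_state, request_obj **gr)
{
    request_obj *r = gr_new(tpl, extra_state, 0);
    if (r == NULL) return -1;
    int err = tpl->start_fn(extra_state);
    if (err != 0) { free(r); return err; }
    r->state |= REQ_ACTIVE;
    *gr = r;
    return 0;
}
```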

If desired, I will work with Steve and others on wordsmithing for chapter 7,
for whatever proposal is adopted. I do think that the implications of the
callback functions for the rest of MPI may require revised specs for the
affected APIs, and producing them would catch any oversights in our own
specifications. But I don't know if that would be consistent with the
MPIF policy.

Linda
lstanberry@llnl.gov