Re: comments of Steve

Marc Snir (snir@cs.nyu.edu)
Wed, 29 Jan 1997 16:36:12 -0500

>
> The count and array_of_request is very much like a much earlier GR
> proposal. Ultimately we gave up on this interface. The main headache
> I remember is that you start with an array_of_requests and each time
> the callback completes it can give a new list of array_of_requests.
> What is the meaning of the index passed to the callback of the
> completed request?

The index refers to the current list, i.e., the list that is associated
with the generalized request at the time progress_fn is called. It is up
to the user to "remember" in extra_state which list of requests the
generalized request is currently attached to.
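
A minimal sketch of this bookkeeping, as a toy Python model rather than
MPI code. The dictionary layout of extra_state, the request names, and
the boolean return value of progress_fn are all assumptions made for
illustration; none of them come from the proposal text.

```python
# Toy model: extra_state records which list the generalized request is
# currently attached to, so that `index` can be interpreted correctly.
def progress_fn(extra_state, index):
    """Called when entry `index` of the *current* list completes."""
    which = extra_state["current_list"]
    completed = extra_state["lists"][which][index]
    extra_state["log"].append((which, completed))
    # Attach the generalized request to the next list, if there is one.
    if which + 1 < len(extra_state["lists"]):
        extra_state["current_list"] = which + 1
        return True    # more progress is expected
    return False       # nothing left to wait for


extra_state = {
    "lists": [["io_read", "timer"], ["send", "recv"]],  # hypothetical requests
    "current_list": 0,
    "log": [],
}

more = progress_fn(extra_state, 1)        # index 1 of list 0
more_again = progress_fn(extra_state, 1)  # index 1 of list 1
```

The same index value (1) names a different request on each call; only
the state kept in extra_state disambiguates it.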

> I would like to
> throw out an alternative. The current proposal has two modes:
> progress when a request completes and external to MPI. The former is
> what is being considered here. These are basically handlers. Would
> it make more sense to attach a handler to request and use the
> extra_state to allow the user to sort as needed? In this scenario the
> progress_fn goes away. Also, MPI_GR_CONTINUE is gone since you simply
> start up a handler whenever you want. The only requirement is that
> MPI_GR_COMPLETE must get called somewhere. The issue of allowing GRs
> on the progress list becomes one of deciding if a handler can be
> attached to a GR.
>

Let's work this out in more detail.
I think this is fairly close to one of the options in my text.

MPI_GR_START(request, progress_fn, complete_fn, cancel_fn, gen_request)
  IN   request
  IN   progress_fn
  IN   complete_fn
  IN   cancel_fn
  OUT  gen_request

The names are misleading, so let's explain: the function progress_fn is
invoked ONCE, when "request" completes. (This is, if I understand
correctly, the main difference between what Steve proposes and my
writeup: progress_fn does not return a new request or a new array of
requests, and is called only once.) We still need complete_fn: this is
the function that is invoked when the user calls WAIT or TEST, after
gen_request has completed; this is the function that fills in the status
object and determines the error code returned by WAIT/TEST. And
cancel_fn is needed, too: this is the function that is called when the
user invokes MPI_CANCEL(gen_request).
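
The division of labor among the three callbacks can be sketched with a
toy Python model. This is not MPI code: the GenRequest class, its
method names, and the callback argument lists are hypothetical, and the
sketch assumes that progress_fn itself marks the generalized request
complete.

```python
class GenRequest:
    """Toy model of a generalized request (hypothetical, not an MPI API)."""

    def __init__(self, progress_fn, complete_fn, cancel_fn, extra_state):
        self.progress_fn = progress_fn
        self.complete_fn = complete_fn
        self.cancel_fn = cancel_fn
        self.extra_state = extra_state
        self.completed = False
        self.progress_called = False

    def trigger(self):
        """Models the underlying "request" completing."""
        if not self.progress_called:       # progress_fn is invoked ONCE
            self.progress_called = True
            self.progress_fn(self, self.extra_state)

    def mark_complete(self):
        self.completed = True

    def test(self):
        """Models TEST: complete_fn runs only once the request is complete."""
        if not self.completed:
            return False, None, None
        status = {}
        error = self.complete_fn(self.extra_state, status)
        return True, status, error

    def cancel(self):
        """Models MPI_CANCEL(gen_request)."""
        self.cancel_fn(self.extra_state, self.completed)


calls = []

def progress_fn(gr, es):
    calls.append("progress")
    es["bytes"] = 128          # made-up result of the operation
    gr.mark_complete()

def complete_fn(es, status):
    calls.append("complete")
    status["count"] = es["bytes"]   # fills in the status object
    return 0                        # error code returned by WAIT/TEST

def cancel_fn(es, completed):
    calls.append("cancel")

gr = GenRequest(progress_fn, complete_fn, cancel_fn, {})
before = gr.test()    # not complete yet, complete_fn is not called
gr.trigger()
gr.trigger()          # a second trigger is ignored
after = gr.test()
gr.cancel()
```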

Rather than attaching a handler to a request, I assume that I create a
new (generalized) request. This avoids some complications -- it's
always simpler to create new objects rather than modify existing
objects. We require that each request is associated with a unique
generalized request, so that there is no real difference in performance
or internals.

"request" may be MPI_REQUEST_NULL, in which case progress is external to
MPI. We then use MPI_MARK_COMPLETE to poke the gen_request.

What do we lose vs. the full (array_of_requests) proposal? The ability
to have a "select" statement, where progress occurs when one out of a
list of events occurs. On the other hand, we still have the ability to
chain communications or multiple events. Consider an operation that
consists of the following:

a. start asynchronous I/O read
b. wait for an asynchronous I/O read to complete
c. send message; send destination depends on I/O
d. wait for answer

We want to create a gen_req for that compound operation.

Then one will do the following.

1. Start a generalized request GR with request=MPI_REQUEST_NULL and
extra_state=ES. Stuff the GR handle in ES. This is the gen_req for
the entire operation.

2. Start gen_req GR1 with request=MPI_REQUEST_NULL and extra_state=ES.
This is the gen_req for the I/O. When the I/O operation completes, an
external I/O handler stuffs the destination in the heap variable ES and
calls MPI_MARK_COMPLETE(GR1).

3. Start the I/O read.

4. Start GR2 with request=GR1 and extra_state=ES. The handler for this
gen_req (which is triggered when the I/O completes) does the following:
   a. sends the message
   b. posts a nonblocking receive, associated with recv_req
   c. starts GR3 with request=recv_req. This is the gen_req for the
      receive. The handler for this gen_req (which is triggered when the
      receive completes) will call MPI_MARK_COMPLETE(GR) to signal
      completion of the entire operation. It can retrieve the GR handle
      from the extra_state ES.
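
The four steps above can be simulated with a small Python toy. It is
only a sketch of the control flow: the Request and GenRequest classes,
the handler signatures, and the destination value 3 are invented for
illustration, None stands in for MPI_REQUEST_NULL, and mark_complete()
stands in for MPI_MARK_COMPLETE.

```python
class Request:
    """Toy request: completing it fires the handlers attached to it."""

    def __init__(self, name):
        self.name = name
        self.completed = False
        self.dependents = []

    def mark_complete(self):
        self.completed = True
        for gr in self.dependents:
            gr.handler(gr.extra_state)


class GenRequest(Request):
    """Toy generalized request triggered by the completion of `request`."""

    def __init__(self, name, request, handler, extra_state):
        super().__init__(name)
        self.handler = handler
        self.extra_state = extra_state
        if request is not None:            # None models MPI_REQUEST_NULL
            request.dependents.append(self)


trace = []
ES = {}

ES["GR"] = GenRequest("GR", None, None, ES)   # step 1: whole operation
GR1 = GenRequest("GR1", None, None, ES)       # step 2: gen_req for the I/O

def on_recv_done(es):                          # handler of GR3
    trace.append("recv done")
    es["GR"].mark_complete()                   # signals the whole operation

def on_io_done(es):                            # handler of GR2
    trace.append("send to %d" % es["dest"])    # 4a
    es["recv_req"] = Request("recv_req")       # 4b
    GenRequest("GR3", es["recv_req"], on_recv_done, es)   # 4c

# step 3 (starting the I/O read) has no counterpart in this toy
GR2 = GenRequest("GR2", GR1, on_io_done, ES)   # step 4

ES["dest"] = 3            # external I/O handler stores the destination...
GR1.mark_complete()       # ...and marks GR1 complete, which triggers GR2
midway = ES["GR"].completed
ES["recv_req"].mark_complete()   # the answer arrives, which triggers GR3
```

Four generalized requests are needed for a two-stage chain, which is
the ugliness the scheme is criticized for.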

This is doable, but exceedingly ugly. I would prefer a simpler scheme
for chaining operations, i.e., a progress handler that can be invoked
multiple times. If progress_fn returns a request, which is the next
trigger for a subsequent callback, then the code becomes:

1. Start a generalized request GR with request=MPI_REQUEST_NULL and
extra_state=ES. Stuff GR in ES.
2. Start the I/O read. An external handler for it will, when the I/O
completes,
   a. send
   b. post a nonblocking receive, associated with recv_req
   c. call MPI_GR_CONTINUE(GR, ES, recv_req), to associate the gen_req
      with the completion of the receive.
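
For comparison, the same operation under the continuation scheme, again
as a Python toy with invented class and function names. gr_continue()
stands in for MPI_GR_CONTINUE, and the sketch assumes that the
generalized request completes when the request it was last continued on
completes.

```python
class Request:
    """Toy request that can have one generalized request waiting on it."""

    def __init__(self):
        self.completed = False
        self.waiting_gr = None

    def mark_complete(self):
        self.completed = True
        if self.waiting_gr is not None:
            self.waiting_gr.on_trigger()


class GenRequest:
    def __init__(self, extra_state):
        self.extra_state = extra_state
        self.completed = False

    def on_trigger(self):
        # Assumption: no further continuation, so the gen_req completes.
        self.completed = True


def gr_continue(gr, extra_state, request):
    """Models MPI_GR_CONTINUE(GR, ES, request) in this toy."""
    gr.extra_state = extra_state
    request.waiting_gr = gr


trace = []
ES = {}
ES["GR"] = GenRequest(ES)                      # step 1

def io_handler(es, dest):                      # step 2, run when I/O completes
    trace.append("send to %d" % dest)          # 2a
    es["recv_req"] = Request()                 # 2b
    gr_continue(es["GR"], es, es["recv_req"])  # 2c

io_handler(ES, 3)                 # destination 3 is a made-up value
midway = ES["GR"].completed
ES["recv_req"].mark_complete()    # the answer arrives
```

A single generalized request now covers the whole chain.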

I think this is much easier to understand.

So, the debate should focus on two issues:

i) do we need a select statement, i.e., an array_of_requests, rather
than one request.
ii) do we need to chain events and, if so, how easy on the user do we
want this to be.


> There was an alternative discussion of how to handle cancelling which
> Marc was not present for. I will send that out soon. The issue of
> cancel seems mostly orthogonal to this big decision.
>
> Marc correctly raised the point of calling complete_fn from
> MPI_REQUEST_GET_STATUS. This raises the ugly point that complete_fn
> can be called multiple times for the same request. (It was already
> true because of MPI_TEST but we never discussed it.) At a minimum we
> should note this.

MPI_TEST does not cause the problem of calling complete_fn multiple
times, since complete_fn is called only when the test returns flag=true.
Note that MPI knows whether the request completed without calling
complete_fn -- this information was returned by the last call to
progress_fn, or by MPI_REQUEST_COMPLETE, depending on the proposal we
adopt. On the other hand, GET_STATUS does present a problem since it is
supposed to be a nondestructive access to the status of a request. This
does not sit well with the assumption that the status is computed by
complete_fn.

What are the choices?

a. Accept this ugliness. Users that may use GET_STATUS should make sure
that calls to complete_fn are idempotent (multiple calls have the same
effect as a single call). But, to do this, there must be some way of
passing information from one invocation of complete_fn to the next. One
can do it with extra_state, e.g., set a flag to indicate that
complete_fn has already been called once, and keep in extra_state
enough information to complete the status a second time. But then
extra_state cannot be (totally) freed by complete_fn, and we have a
memory leak.

b. Presumably, MPI implementations have a record associated with request
objects that holds the information later returned in the user-provided
status. We could provide functions that allow the user to stuff data in
this "internal status" record. Something along the lines of
MPI_REQUEST_SET_STATUS(request, status), where request is set so that
when MPI_REQUEST(request, status, flag) or WAIT(request, status) is
subsequently invoked, the status returned is identical to the status set
by MPI_REQUEST_SET_STATUS. Ugly.

c. The two reasons I can remember for MPI_GET_STATUS are (i) passing
status objects across languages, and (ii) accessing the status of a
request inside of a handler. If we (i) provide language conversion calls
for status objects and (ii) get rid of handlers, then perhaps we do not
need MPI_REQUEST_GET_STATUS.
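
Choice (a) can be illustrated with a short Python sketch of an
idempotent complete_fn. The field names in extra_state and the shape of
the status object are hypothetical; the point is only that the cached
status has to outlive the first call.

```python
def complete_fn(extra_state, status):
    """Idempotent: a second call yields the same status as the first."""
    if not extra_state["called_once"]:
        # First call: compute the status and cache it for later calls.
        extra_state["cached_status"] = {"count": len(extra_state["buffer"])}
        extra_state["called_once"] = True
        extra_state["buffer"] = None   # bulk data can be released here
    # The cache itself cannot be released, hence the leak noted in (a).
    status.update(extra_state["cached_status"])
    return 0


extra_state = {"called_once": False, "buffer": b"hello", "cached_status": None}

s1 = {}
complete_fn(extra_state, s1)   # e.g., invoked on behalf of GET_STATUS
s2 = {}
complete_fn(extra_state, s2)   # e.g., invoked later on behalf of WAIT
```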

>
> I am not clear how MPI_GR_LOCK/UNLOCK work. The text says you should
> not call them from inside the progress_fn. How then can you protect
> critical sections in the progress_fn?

The implementation (in a single-threaded MPI environment) should
guarantee that, when progress_fn executes, there will be no "context
switching" to the main user program or to another progress function. The
former is not a real issue -- I cannot see how a single-threaded
implementation would pop back to the user code in the middle of a
callback. The latter is a (middle) issue: further callback invocations
need to be disabled while one callback executes -- e.g., by setting a
flag. This same flag is set, from the main user code, by a call to
MPI_GR_LOCK.
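
One way such a flag could work, sketched as a Python toy. The
ProgressEngine class and its methods are invented for illustration and
say nothing about how a real implementation would do it; gr_lock() and
gr_unlock() stand in for MPI_GR_LOCK and MPI_GR_UNLOCK.

```python
class ProgressEngine:
    """Toy single-threaded callback dispatcher guarded by one flag."""

    def __init__(self):
        self.locked = False
        self.pending = []

    def gr_lock(self):
        """Called from main user code: defer all callbacks."""
        self.locked = True

    def gr_unlock(self):
        self.locked = False
        self._drain()

    def deliver(self, callback):
        """An event occurred: run its callback now, or defer it."""
        self.pending.append(callback)
        self._drain()

    def _drain(self):
        while self.pending and not self.locked:
            callback = self.pending.pop(0)
            self.locked = True         # the same flag guards callbacks
            try:
                callback()
            finally:
                self.locked = False


engine = ProgressEngine()
order = []

def second():
    order.append("second")

def first():
    order.append("first start")
    engine.deliver(second)    # deferred: `first` is still executing
    order.append("first end")

engine.deliver(first)
after_callbacks = list(order)

engine.gr_lock()              # critical section in main user code
engine.deliver(second)        # deferred until gr_unlock()
during_lock = list(order)
engine.gr_unlock()
```

Because the flag is already set while a callback runs, the callback is
protected without calling the lock functions itself.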

>
> Steve
>

Marc

<BASE HREF="http://parallel.nas.nasa.gov/MPI-2/mpi-external/0288.html">

<!-- received="Wed Jan 29 06:23:49 1997 PST" -->
<!-- sent="Wed, 29 Jan 1997 08:11:07 -0600 (CST)" -->
<!-- name="Steve Huss-Lederman" -->
<!-- email="lederman@cs.wisc.edu" -->
<!-- subject="comments on Marc's proposal" -->
<!-- id="199701291411.IAA14537@rap.cs.wisc.edu" -->
<!-- inreplyto="" -->
<title>mpi-external Mailing List Archive: comments on Marc's proposal</title>
<h1>comments on Marc's proposal</h1>
Steve Huss-Lederman (<i>lederman@cs.wisc.edu</i>)<br>
<i>Wed, 29 Jan 1997 08:11:07 -0600 (CST)</i>
<p>
<!-- body="start" -->
I am going to make some comments about Marc's proposal. I want to<br>
emphasize that if we do not have a coherent proposal in the next draft<br>
of the document then I fear that GR will not make it into MPI-2.<br>
Thus, let's keep the discussion going! I am currently working on a<br>
rewrite of the external chapter. I hope to have it out in the near<br>
future.<br>
<p>
After sitting through many discussions on GR with implementors, I<br>
think the polling based design (7.2.2 alternative of Marc's writeup and<br>
the current one in the document) is unlikely to get the support of<br>
everyone. The new proposal (7.1) is based on limited discussions at<br>
the meeting but was at least well received in the basic idea. Thus, I<br>
agree with Marc that we need to flesh it out.<br>
<p>
The count and array_of_request is very much like a much earlier GR<br>
proposal. Ultimately we gave up on this interface. The main headache<br>
I remember is that you start with an array_of_requests and each time<br>
the callback completes it can give a new list of array_of_requests.<br>
What is the meaning of the index passed to the callback of the<br>
completed request? Is it some master index, or an index on one of the<br>
lists in which case it is hard to know which list. I would like to<br>
throw out an alternative. The current proposal has two modes:<br>
progress when a request completes and external to MPI. The former is<br>
what is being considered here. These are basically handlers. Would<br>
it make more sense to attach a handler to request and use the<br>
extra_state to allow the user to sort as needed? In this scenario the<br>
progress_fn goes away. Also, MPI_GR_CONTINUE is gone since you simply<br>
start up a handler whenever you want. The only requirement is that<br>
MPI_GR_COMPLETE must get called somewhere. The issue of allowing GRs<br>
on the progress list becomes one of deciding if a handler can be<br>
attached to a GR.<br>
<p>
There was an alternative discussion of how to handle cancelling which<br>
Marc was not present for. I will send that out soon. The issue of<br>
cancel seems mostly orthogonal to this big decision.<br>
<p>
Marc correctly raised the point of calling complete_fn from<br>
MPI_REQUEST_GET_STATUS. This raises the ugly point that complete_fn<br>
can be called multiple times for the same request. (It was already<br>
true because of MPI_TEST but we never discussed it.) At a minimum we<br>
should note this.<br>
<p>
I am not clear how MPI_GR_LOCK/UNLOCK work. The text says you should<br>
not call them from inside the progress_fn. How then can you protect<br>
critical sections in the progress_fn?<br>
<p>
Steve<br>
<!-- body="end" -->