Re: generalized request

Parkson Wong (parkson@nas.nasa.gov)
Mon, 23 Sep 1996 09:46:46 -0700 (PDT)

>
> I want to make sure I understand the desired functionality. I
> interpret this to mean that you need external checking to know when
> progress has occurred. If you simply call Wait/Test it will not finish
> because the request will not be marked as done. I think we are
> talking about a capability that is more than GR (generalized request)
> does right now. Currently, it is assumed that MPI can progress the
> request. This means that through its own polling (with threads) or
> when an MPI function is called, MPI causes the request to make
> progress (if it can). Here it seems you want progress outside of MPI
> to happen. If this is the case, why not create a new user function
> that is io_test. In io_test you first do what you need to do to cause
> progress (maybe check for done), do a MARK_Complete and then call
> MPI_Test?

This is not really a problem of "causing progress". MPI or MPI's I/O
specification will define whatever progress rule is necessary, and the
implementation will adhere to it. The problem here is notification: if the
system has no asynchronous notification capability, completion is hard to
detect other than by spinning a thread that polls. MPI itself does not require
asynchronous notification; the user application is required to call test/wait
to determine whether the operation has completed. It would be nice if we did
not have to require asynchronous notification to make generalized requests work.

Maybe an example will help:

In the proposal as is:

MPI_IWrite(...)
{
create generalized request

start I/O

return (MPI_SUCCESS);
}

Somewhere else in the implementation code:

IO_complete()
{
mark request complete
}

So somehow the IO_complete routine needs to be called when the I/O is done.
This requires either asynchronous event notification or spinning a thread to
do the polling.

In the new proposal:

MPI_IWrite is unchanged; the difference is in MPI_Test:

MPI_Test(..)
{
foreach request {
if (generalized request) {
(generalized_request.test)();
}
}
}

The test/wait functions in MPI will invoke the test and wait functions of
the generalized request to do the test/wait on that particular request.
Asynchronous notification is not needed here.
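
To make the polling design above concrete, here is a minimal sketch in C. It
does not use MPI at all: gr_request, gr_io, gr_io_test and gr_test_all are
hypothetical names invented for illustration, and the "I/O" is simulated by a
counter that finishes after a fixed number of polls. It only shows the shape of
the idea, namely that the test entry point asks each request to check itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a generalized request: the layer that creates
 * it supplies a test callback that the library's test/wait can invoke. */
typedef struct gr_request {
    int (*test)(struct gr_request *self); /* returns 1 once the operation is done */
    void *state;                          /* private data for the callback */
    int complete;
} gr_request;

/* Simulated I/O (an assumption for this sketch): done after N polls. */
typedef struct {
    int polls_left;
} gr_io;

static int gr_io_test(gr_request *self)
{
    gr_io *io = self->state;
    if (io->polls_left > 0)
        io->polls_left--;
    return io->polls_left == 0;
}

/* Stand-in for MPI_Test over a set of requests: every incomplete request
 * is asked to check itself. Returns the number of completed requests. */
static int gr_test_all(gr_request *reqs, size_t n)
{
    int done = 0;
    for (size_t i = 0; i < n; i++) {
        if (!reqs[i].complete && reqs[i].test != NULL)
            reqs[i].complete = reqs[i].test(&reqs[i]);
        done += reqs[i].complete;
    }
    return done;
}

/* Usage: poll until the request reports completion and return how many
 * test calls that took. No thread or asynchronous callback is involved. */
static int gr_demo(void)
{
    gr_io io = { 3 };
    gr_request req = { gr_io_test, &io, 0 };
    int calls = 1;

    while (gr_test_all(&req, 1) == 0)
        calls++;
    assert(req.complete);
    return calls;
}
```

The completion check runs only when the application calls the test routine,
which is the property argued for above.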

>
> >
> > 2) How to return error is not very well defined. Is there an error field in
> > MPI_Status that the user can get at?
>
> I can clarify the text. Each MPI_Status object has a non-opaque field
> that holds the error information. It is accessed via status.MPI_ERROR in
> C and STATUS(MPI_ERROR) in Fortran.
>
> >
> > 3) Maybe it is close enough to the final draft that I can talk a little
> > about binding issues. There are these 6 calls in the draft
> >
> > MPI_Set_request_tag(request, tag)
> > MPI_Set_request_source(request, source)
> > MPI_Set_request_error(request, error)
> > MPI_Set_request_count(request, count, datatype)
> >
> > MPI_Set_status_count(status, count, datatype)
> >
> > MPI_Request_mark_complete(request)
> >
> > It is probably preferable to have just these 3 functions
> >
> > MPI_Set_status_error(status, error)
> > MPI_Set_status_count(status, count, datatype)
> >
> > MPI_Request_mark_complete(request, status)
>
> The fundamental difference seems to be between associating the info
> with the request or the status. With the proposed system you save 4
> functions (you don't need MPI_Set_status_error since error is not
> opaque). The tradeoff is that you need to have a status at the
> MPI_Request_mark_complete. When you do a Test/Wait, what is put in
> the supplied status? I assume it is the same info that was provided
> in the status at the MPI_Request_mark_complete. I think the current
> design is cleaner since you only have 1 status. You need to pass or
> create a status object wherever MPI_Request_mark_complete is called
> in the new design. I am happy to hear counter arguments.

When you do a test/wait, if the request is completed, the status
supplied to MPI_Request_mark_complete is returned. There is only one status.

The only reason for discussing the binding is convenience. The
routine that uses these facilities probably has an internal data structure
that holds this information and sets it up, so it might as well keep
it in the data structure called MPI_Status and pass it in 1 call instead
of making 4 calls.

As I have said, the functionality is equivalent, and it is just
a binding issue.
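
A rough sketch of the single-call binding, in plain C without MPI. The types
st_status and st_request and the functions st_mark_complete and st_test are
hypothetical stand-ins invented here, and the status is reduced to two integer
fields (the real count also involves a datatype). The point is only that the
status handed over at completion is the one a later test returns.

```c
#include <assert.h>

/* Simplified stand-in for MPI_Status (assumption: two plain fields). */
typedef struct {
    int error;
    int count;
} st_status;

/* Simplified stand-in for a generalized request. */
typedef struct {
    int complete;
    st_status status;
} st_request;

/* One call records the status and marks the request complete. */
static void st_mark_complete(st_request *req, const st_status *status)
{
    req->status = *status;
    req->complete = 1;
}

/* Test copies out the stored status only if the request has completed. */
static int st_test(const st_request *req, st_status *out)
{
    if (req->complete)
        *out = req->status;
    return req->complete;
}

/* Usage: the layer fills in its own status and passes it in one call. */
static int st_demo(void)
{
    st_request req = { 0, { 0, 0 } };
    st_status seen = { -1, -1 };

    assert(!st_test(&req, &seen));   /* not complete yet, seen untouched */

    st_status done = { 0, 128 };     /* no error, 128 items transferred */
    st_mark_complete(&req, &done);

    assert(st_test(&req, &seen));
    return seen.count;
}
```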

>
> >
> > 4) I also argue that you will need MPI_Status_create and MPI_Status_free
> > for layerability.
>
> I don't think you need such functions. You can create a status object
> by mallocing the MPI_Status structure or with static declaration if
> you are in Fortran and cannot malloc. You do something like:
>
> status = (MPI_Status *)malloc(sizeof(MPI_Status));

I thought that MPI_Status was a semi-opaque object. If you allow me
to malloc one and use it, then it is not an opaque object; for example, the
implementation cannot put a magic number in it. I don't mind if status is
not an opaque object, it just has to be defined that way.
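
To illustrate the magic-number concern, here is a small sketch in plain C. The
struct opq_status, the constant OPQ_MAGIC and the functions opq_status_create,
opq_status_free and opq_status_is_valid are all hypothetical; no MPI
implementation is claimed to work this way. It shows that a status obtained
from a library constructor can carry a hidden tag, while one that the user
mallocs and fills in cannot.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define OPQ_MAGIC 0x53544154u  /* arbitrary tag chosen for this sketch */

/* Hypothetical status with a hidden field that only the library sets. */
typedef struct {
    unsigned magic;
    int error;
} opq_status;

/* Library-side constructor: the only place the magic number is written. */
static opq_status *opq_status_create(void)
{
    opq_status *s = calloc(1, sizeof *s);
    if (s != NULL)
        s->magic = OPQ_MAGIC;
    return s;
}

static void opq_status_free(opq_status *s)
{
    free(s);
}

static int opq_status_is_valid(const opq_status *s)
{
    return s != NULL && s->magic == OPQ_MAGIC;
}

/* Returns 1 if the created status validates and the malloc'ed one does not,
 * or -1 if an allocation failed. */
static int opq_demo(void)
{
    opq_status *made = opq_status_create();
    opq_status *raw = malloc(sizeof *raw);
    int result;

    if (made == NULL || raw == NULL) {
        opq_status_free(made);
        free(raw);
        return -1;
    }
    memset(raw, 0, sizeof *raw);  /* user code knows nothing about magic */

    result = opq_status_is_valid(made) && !opq_status_is_valid(raw);
    opq_status_free(made);
    free(raw);
    return result;
}
```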

-- parkson

-- 
Parkson Wong			Address: Numerical Aerodynamic Simulation
MRJ, Inc.				 NASA Ames Research Center M/S 258-6
Supercomputer Applications Segment	 Moffett Field, CA  94035-1000
e-mail: parkson@nas.nasa.gov	Phone: (415)604-3988	Fax: (415)966-8669