GR proposal

Raja Daoud (raja@tbag.rsn.hp.com)
Tue, 14 Jan 1997 15:32:45 CST

In the spirit of the proposals attempting to simplify GR, this one tries
to capture the essence of how many MPI implementations get their work
done. We think it presents a simpler interface to use and maps better
to implementations.

--Raja & Eric

P.S. As usual, the example code included below may be erroneous and is
probably misleading. Feel free to correct it for the sake of the
poor guys trying to use such functionality.

-=-

Generalized Requests

This proposal attempts to simplify the treatment of generalized requests
by giving them the same flavour as the MPI-1 persistent requests. An
inactive generalized request is created using an initialization routine.
It can then be started, waited on, tested, cancelled, and freed much
like a persistent request, using the related MPI-1 calls. Thus, this
proposal introduces one new function to create a generalized request.

MPI_REQUEST_INIT(start_fn, advance_fn, finish_fn, cancel_fn, req_state, req)

    IN   start_fn     Callback to start the request
    IN   advance_fn   Callback to advance the request
    IN   finish_fn    Callback to finish the request
    IN   cancel_fn    Callback to cancel the request
    IN   req_state    Request state information
    OUT  req          Request handle

This routine creates an inactive generalized request. It accepts four
callback routines. The start_fn and finish_fn routines are each called
once by MPI during the active lifecycle of a request. The advance_fn
routine is called by MPI as many times as necessary to complete the
request. The cancel_fn routine is only called by MPI_CANCEL. The
req_state information is created and destroyed by the user, and
represents the only part of the request that is visible to the user.
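
The lifecycle described above (start_fn once, advance_fn as often as
needed, finish_fn once) can be sketched as follows. This is only a rough
illustration of how an implementation might drive the callbacks, not
part of the proposal: struct gr_request, struct gr_status, gr_start,
gr_wait, GR_SUCCESS and the ctr_* toy callbacks are all hypothetical
stand-ins so that the sketch compiles without an MPI library.

```c
#include <assert.h>
#include <stddef.h>

#define GR_SUCCESS 0    /* hypothetical stand-in for MPI_SUCCESS */

/* Hypothetical stand-in for MPI_Status. */
struct gr_status { int count; };

/* Hypothetical internal representation of a generalized request. */
struct gr_request {
    int (*start_fn)(void *req_state);
    int (*advance_fn)(void *req_state, int *done);
    int (*finish_fn)(void *req_state, struct gr_status *status);
    int (*cancel_fn)(void *req_state, int *flag);
    void *req_state;
    int   active;
};

/* Analogue of MPI_START: call start_fn once and mark the request active. */
static int gr_start(struct gr_request *r)
{
    int err = r->start_fn(r->req_state);
    if (err == GR_SUCCESS) r->active = 1;
    return err;
}

/* Analogue of MPI_WAIT: advance until done, then call finish_fn once. */
static int gr_wait(struct gr_request *r, struct gr_status *status)
{
    int done = 0, err;

    if (!r->active) return GR_SUCCESS;
    while (!done) {
        err = r->advance_fn(r->req_state, &done);
        if (err != GR_SUCCESS) return err;
    }
    err = r->finish_fn(r->req_state, status);
    r->active = 0;
    return err;
}

/* Toy callbacks: the request completes after three advance calls. */
struct counter { int starts, advances, finishes; };

static int ctr_start(void *p)
{
    ((struct counter *) p)->starts++;
    return GR_SUCCESS;
}

static int ctr_advance(void *p, int *done)
{
    struct counter *c = p;
    c->advances++;
    *done = (c->advances >= 3);
    return GR_SUCCESS;
}

static int ctr_finish(void *p, struct gr_status *st)
{
    struct counter *c = p;
    c->finishes++;
    st->count = c->advances;
    return GR_SUCCESS;
}
```

A caller would fill a struct gr_request with the ctr_* callbacks and a
struct counter as req_state, then call gr_start followed by gr_wait. The
counters then show which callback ran how often.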

int start_fn(void *req_state);

subroutine start_fn(req_state, ierr)
<choice> req_state
integer ierr

MPI calls this routine when the generalized request is started.
It returns MPI_SUCCESS or an error code.

int advance_fn(void *req_state, int *done);

subroutine advance_fn(req_state, done, ierr)
<choice> req_state
integer done, ierr

This routine is repeatedly called by MPI to progress the request until
it is done (i.e. the done flag is set to TRUE). It returns MPI_SUCCESS
or an error code.

int finish_fn(void *req_state, MPI_Status *status);

subroutine finish_fn(req_state, status, ierr)
<choice> req_state
integer status(MPI_STATUS_SIZE), ierr

This routine is called by MPI to post-process the request after it is
done, when the user calls MPI_WAIT (or MPI_TEST). It is also called by
MPI when a request is done after having been freed by the user while in
its active state (i.e. an orphan request). It returns MPI_SUCCESS or
an error code.

int cancel_fn(void *req_state, int *flag);

subroutine cancel_fn(req_state, flag, ierr)
<choice> req_state
integer flag, ierr

This routine is called by MPI_CANCEL. If the request is successfully
cancelled, the flag is set to TRUE. Otherwise it is set to FALSE.
It returns MPI_SUCCESS or an error code.

It is erroneous for the callback routines to block, but they can call
local and non-blocking MPI routines. Since MPI cannot guess whether
generalized requests cover each other (i.e. disable other ones from
advancing in order to maintain some ordering), this issue must be
addressed by the user. If the callbacks use non-blocking MPI calls
exclusively, the ordering guarantee is inherited from MPI. If the
callbacks are used to perform non-blocking operations outside of MPI
(e.g. non-blocking filesystem calls), the ordering of these operations,
if such an ordering is required, must be guaranteed by the user.
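
As one possible way for the user to guarantee such an ordering without
ever blocking in a callback, each request can take a ticket when it is
started and only make progress when its ticket comes up. The sketch
below assumes that MPI invokes the callbacks from a single thread. The
names struct order_queue, struct ordered_task, ord_start, ord_advance
and GR_SUCCESS are hypothetical and not part of the proposal.

```c
#include <assert.h>

#define GR_SUCCESS 0    /* hypothetical stand-in for MPI_SUCCESS */

/* Shared by all requests that must complete in the order they started. */
struct order_queue {
    unsigned next_ticket;   /* next ticket to hand out */
    unsigned now_serving;   /* ticket currently allowed to progress */
};

struct ordered_task {
    struct order_queue *queue;
    unsigned ticket;        /* assigned by ord_start */
    int work_left;          /* units of work remaining */
};

/* start_fn: take a ticket. Never blocks. */
static int ord_start(void *req_state)
{
    struct ordered_task *t = req_state;
    t->ticket = t->queue->next_ticket++;
    return GR_SUCCESS;
}

/* advance_fn: do one unit of work, but only when it is this task's turn. */
static int ord_advance(void *req_state, int *done)
{
    struct ordered_task *t = req_state;

    *done = 0;
    if (t->ticket != t->queue->now_serving)
        return GR_SUCCESS;          /* not our turn: report no progress */

    if (t->work_left > 0)
        t->work_left--;

    if (t->work_left == 0) {
        t->queue->now_serving++;    /* let the next request proceed */
        *done = 1;
    }
    return GR_SUCCESS;
}
```

A request that is started later simply reports "not done" from its
advance callback until every earlier request has completed, so no
callback ever has to wait.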

Example code:

This code uses a generalized request to perform a non-blocking file
read. It assumes that a file descriptor is already open for
non-blocking reads and has an associated lock that guards against
other similar concurrent requests.

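struct task {
    void *t_buf;        /* destination buffer */
    int   t_len;        /* bytes requested; set to -1 when cancelled */
    int   t_nread;      /* bytes read so far */
    int   t_locked;     /* non-zero while this task holds file_lock */
};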

int file_desc;
lock_t file_lock;

struct task read_task;
char inbuf[1000];

MPI_Request req;
MPI_Status status;

read_task.t_buf = inbuf;
read_task.t_len = sizeof(inbuf);

MPI_Request_init(rd_start, rd_advance, rd_finish, rd_cancel, &read_task, &req);

MPI_Start(&req);
MPI_Wait(&req, &status);

MPI_Request_free(&req);

int
rd_start(void *req_state)

{
    struct task *ptask = req_state;
    /*
     * Do some error checking.
     */
    if (ptask->t_len < 0) return(MPI_ERR_COUNT);
    if ((ptask->t_len > 0) && (ptask->t_buf == 0)) return(MPI_ERR_BUFFER);
    /*
     * Initialize internal data.
     */
    ptask->t_nread = 0;
    ptask->t_locked = 0;
    return(MPI_SUCCESS);
}

int
rd_advance(void *req_state, int *done)

{
    struct task *ptask = req_state;
    int n;
    /*
     * Handle the trivial case.
     */
    if (ptask->t_len == 0) {
        *done = 1;
        return(MPI_SUCCESS);
    }

    *done = 0;
    /*
     * Lock the file if not yet owned.
     */
    if (ptask->t_locked == 0) {
        if (trylock(file_lock) == 0) return(MPI_SUCCESS);
        ptask->t_locked = 1;
    }
    /*
     * Read a chunk of data. The cast avoids arithmetic on void *.
     */
    n = read(file_desc, (char *) ptask->t_buf + ptask->t_nread,
             ptask->t_len - ptask->t_nread);
    /*
     * No data available yet on a non-blocking descriptor is not an
     * error (needs <errno.h>). On a real error, release the lock.
     */
    if (n < 0) {
        if (errno == EAGAIN) return(MPI_SUCCESS);
        unlock(file_lock);
        ptask->t_locked = 0;
        return(MPI_ERR_OTHER);
    }

    if (n > 0) ptask->t_nread += n;
    /*
     * If done or EOF, release the lock.
     */
    if ((n == 0) || (ptask->t_nread == ptask->t_len)) {
        *done = 1;
        unlock(file_lock);
        ptask->t_locked = 0;
    }

    return(MPI_SUCCESS);
}

int
rd_finish(void *req_state, MPI_Status *status)

{
    struct task *ptask = req_state;
    int err = MPI_SUCCESS;

    status->MPI_SOURCE = MPI_ANY_SOURCE;
    status->MPI_TAG = MPI_ANY_TAG;
    /*
     * Report the byte count unless the request was cancelled.
     */
    if (ptask->t_len >= 0) {
        err = MPI_Status_set_count(status, ptask->t_nread, MPI_BYTE);
    }

    return(err);
}

int
rd_cancel(void *req_state, int *flag)

{
    struct task *ptask = req_state;
    /*
     * If I have the lock, don't cancel.
     */
    if (ptask->t_locked) {
        *flag = 0;
    } else {
        *flag = 1;
        ptask->t_len = -1;
    }

    return(MPI_SUCCESS);
}
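
The listing above uses lock_t, trylock() and unlock() without defining
them. One way they might be provided is sketched here, assuming C11
atomics are available and that trylock() returns non-zero when the lock
was acquired (which is how rd_advance uses it). These definitions are a
guess at the intent, not something the proposal specifies.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical definition: a lock handle is a pointer to an atomic flag,
 * so it can be passed by value as in trylock(file_lock). */
typedef atomic_flag *lock_t;

/* Returns 1 if the lock was acquired, 0 if it is already held.
 * Never blocks, so it is safe to call from a callback. */
static int trylock(lock_t lock)
{
    return !atomic_flag_test_and_set(lock);
}

/* Releases a lock previously acquired with trylock(). */
static void unlock(lock_t lock)
{
    atomic_flag_clear(lock);
}
```

With these definitions, file_lock would point at an atomic_flag that has
been initialized with ATOMIC_FLAG_INIT before the first request is
started.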