New low-level event section

Jerrell R. Watts (jwatts@aztec.scp.caltech.edu)
Wed, 16 Oct 1996 00:08:00 -0700

As I said I would during the RT subcommittee meeting last week, I have revised the low-level event section. As requested by Arkady, those revisions include the addition of an example. The example attempts to illustrate some of the most important issues, namely synchronization between the handler and the process and cancellation of a request on which a handler has been posted. (It also demonstrates the limitations of not having late buffer binding for persistent requests.)

Included below is the LaTeX for the new low-level event section. I didn't actually run it through LaTeX, so I apologize in advance for any bugs.

--Jerrell

-- 

Jerrell R. Watts Scalable Concurrent Programming Laboratory California Institute of Technology jwatts@scp.caltech.edu -- http://www.scp.caltech.edu/~jwatts


\subsection{Low-Level, Event-Driven Real-Time MPI}

In order to implement the high-level, event-driven MPI, and because there are applications that do not require ``global'' events, a low-level specification must also be offered.

\subsubsection{Request handlers}

The low-level mechanism for event-driven {\MPI/}/RT is the request completion/resource release handler, shown below. (NOTE: This mechanism is closely based on the \mpifunc{MPI\_POST\_HANDLER} routine currently in the external interfaces chapter.)

\begin{funcdef}{MPI\_REQUEST\_POST\_HANDLER(request, request\_cond, handler\_fn, failure\_fn, extra\_state, response\_time)}
\funcarg{\IN}{request}{{\MPI/} request (handle)}
\funcarg{\IN}{request\_cond}{request condition (integer)}
\funcarg{\IN}{handler\_fn}{request handler (function)}
\funcarg{\IN}{failure\_fn}{failure handler (function)}
\funcarg{\IN}{extra\_state}{handler state (pointer)}
\funcarg{\IN}{response\_time}{time by which to start handler (TIME\_OBJECT handle)}
\end{funcdef}

Once \mpifunc{MPI\_REQUEST\_POST\_HANDLER} has been called, the handler function is to be called by the allotted time after the given request reaches the condition specified by the \mpiarg{request\_cond} argument. When the handler is called, it is passed the given \mpiarg{request}, the status of the request, and the \mpiarg{extra\_state} provided. If the handler cannot be called within the allotted time (in the case of either absolute or relative times) or the specified time has already passed (in the case of absolute times only), then the failure function is called. Like the request handler, the failure routine is passed the \mpiarg{request} argument, that request's status, and the \mpiarg{extra\_state} argument.

There are two exceptional cases for the time argument. First, if \mpiarg{response\_time} is a relative time and its value is zero, then the handler is to be called ``as soon as possible.'' (How soon is an implementation quality issue. The desired goal is an interrupt-like functionality.) Second, if the time is \const{MPI\_TIME\_IGNORE}, the request handler will be called at some later time but not necessarily ``immediately.'' Since an instantaneous response time is not practically achievable in the first case and since the response time is unspecified in the second case, the failure handler will never be called in either case.
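
The rules above for choosing between the request handler and the failure function can be summarized in a small decision routine. The following is only a sketch of those rules, not part of the proposal: the types \texttt{response\_time\_t} and \texttt{dispatch\_t} and the routine \texttt{choose\_callback} are hypothetical, and times are modeled as plain seconds because the layout of a TIME\_OBJECT is not given here.

```c
/* Hypothetical stand-ins; the layout of a TIME_OBJECT is not defined here. */
typedef enum { TIME_IGNORE, TIME_RELATIVE, TIME_ABSOLUTE } time_kind_t;

typedef struct {
    time_kind_t kind;
    double      seconds;   /* offset (relative) or instant (absolute) */
} response_time_t;

typedef enum { CALL_HANDLER, CALL_FAILURE } dispatch_t;

/*
 * cond_time:  when the request reached the posted condition
 * start_time: earliest moment at which the handler could be started
 */
dispatch_t choose_callback(response_time_t rt, double cond_time,
                           double start_time)
{
    double deadline;

    /* Exceptional cases: no deadline is enforced, so the failure
       function is never called. */
    if (rt.kind == TIME_IGNORE)
        return CALL_HANDLER;
    if (rt.kind == TIME_RELATIVE && rt.seconds == 0.0)
        return CALL_HANDLER;   /* "as soon as possible" */

    /* A relative time counts from the moment the condition was reached;
       an absolute time is a deadline in its own right. */
    deadline = (rt.kind == TIME_RELATIVE) ? cond_time + rt.seconds
                                          : rt.seconds;

    return (start_time <= deadline) ? CALL_HANDLER : CALL_FAILURE;
}
```

An absolute time that has already passed falls out of the final comparison: the handler can never be started by the deadline, so the failure function is selected.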

If the request has reached the specified condition when the \mpifunc{MPI\_REQUEST\_POST\_HANDLER} call is made, the handler is scheduled for execution (unless the specified time is absolute and has already passed, in which case the failure routine is called). Notice that for normal nonblocking calls, it may often be the case that the request has already completed. In such circumstances the user may wish to use a persistent version of the call generating the request, if it is available. This would allow the handler to be specified before the request is started. (The lack of late binding poses something of a problem, however. See the example given below.)

Note that a request can have only one handler for each of its conditions. If the user wishes to have a callback list for each condition, this must be implemented manually by having a high-level handler that calls the individual handlers one by one. If \mpifunc{MPI\_REQUEST\_POST\_HANDLER} is called for a request and condition that already have a handler, and either that handler has not yet been invoked or the request is a persistent one, then the new handler replaces the old one. If the new handler is a null pointer, then a handler will no longer be called for the specified condition on that request.
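
A manually implemented callback list might look like the following sketch. It assumes the handler signature described earlier (request, status, extra state). The types \texttt{request\_t}, \texttt{status\_t} and \texttt{callback\_t} and the routines \texttt{dispatch\_handlers} and \texttt{count\_call} are hypothetical names introduced only for illustration; the opaque types stand in for the {\MPI/} request and status so that the fragment does not depend on an {\MPI/} header.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the MPI request and status types. */
typedef struct request_s request_t;
typedef struct status_s  status_t;

/* Handler signature as described in the text. */
typedef void (*handler_fn_t)(request_t *request, status_t *status,
                             void *extra_state);

/* One entry of the user-maintained callback list. */
typedef struct callback_s {
    handler_fn_t       fn;
    void              *extra_state;
    struct callback_s *next;
} callback_t;

/*
 * High-level handler: the single handler posted for the condition.
 * The head of the callback list travels in extra_state, and each
 * entry is invoked in list order with its own state.
 */
void dispatch_handlers(request_t *request, status_t *status,
                       void *extra_state)
{
    callback_t *cb;

    for (cb = (callback_t *)extra_state; cb != NULL; cb = cb->next)
        cb->fn(request, status, cb->extra_state);
}

/* Example individual handler: counts how often it has been called. */
void count_call(request_t *request, status_t *status, void *extra_state)
{
    (void)request;
    (void)status;
    ++*(int *)extra_state;
}
```

The user would post \texttt{dispatch\_handlers} as the handler function and pass the head of the list as the \mpiarg{extra\_state} argument. Replacing the list then only requires posting the same handler again with a different head.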

The request conditions currently specified are as follows:

\begin{itemize}

\item \const{MPI\_REQUEST\_COMPLETE} The associated handler is called when the request in question has been marked complete. For example, if the handler is associated with a nonblocking or persistent send, then the handler is called after the send buffer is available for reuse. (Note that the handler may run concurrently with the process if the process was blocked in an \mpifunc{MPI\_WAIT} on the same request at the time the handler was invoked. A user who wishes to avoid this must provide explicit synchronization.)

\item \const{MPI\_REQUEST\_RELEASE} The associated handler is called when all resources associated with the last execution of the request are free. Continuing the above example, if the handler is associated with a nonblocking or persistent send, then this handler is called when all local buffers and network resources have been released. (The overall semantics of this condition are admittedly fuzzy. The condition is necessary, however, in order to guarantee real-time performance in certain circumstances. In the send example, one might want the handler to initiate another send request, guaranteeing that the two sends do not contend for system resources.)

\end{itemize}

\discuss{As an alternative to the \const{MPI\_REQUEST\_RELEASE} condition, we may wish to strengthen the notion of request completion for real-time systems to include the release of all system resources, not just the user buffer.}

The request handler is assumed to be ``full-weight.'' That is, it can execute any {\MPI/} call or system-specific synchronization call and may run for an indeterminate amount of time. (I.e., it is not restricted like a signal handler.) Also, handlers do not implicitly ``consume'' their request(s). The request passed to a handler can still be waited on or freed by the process before or after the handler is called, unless the handler itself explicitly frees the request. A request is not actually freed by {\MPI/}/RT until all of the handlers associated with that request have been called. Even if the process or another handler has called \mpifunc{MPI\_REQUEST\_FREE} on the request prior to the execution of the handler, the request is still valid and can be queried using the appropriate calls. One complication is \mpifunc{MPI\_CANCEL}: If a request is cancelled prior to the execution of the handlers, the handlers for each condition are called in turn and may note the fact that the request has been cancelled via \mpifunc{MPI\_TEST\_CANCELLED}.
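
The deferred-free rule above can be pictured as a small piece of bookkeeping inside the implementation. The sketch below is one possible model under stated assumptions, not a prescribed design: the structure \texttt{request\_state\_t} and the routines \texttt{request\_free} and \texttt{handler\_finished} are hypothetical and do not correspond to any {\MPI/} call.

```c
#include <stdbool.h>

/* Hypothetical bookkeeping; no particular implementation is prescribed. */
typedef struct {
    int  pending_handlers;  /* handlers posted but not yet called */
    bool free_requested;    /* MPI_REQUEST_FREE has been called */
    bool released;          /* request storage actually reclaimed */
} request_state_t;

/* Reclaim the request only once it has been freed by the user and no
   posted handler is still outstanding. */
static void maybe_release(request_state_t *r)
{
    if (r->free_requested && r->pending_handlers == 0)
        r->released = true;
}

/* Models MPI_REQUEST_FREE being called by the process or by a handler. */
void request_free(request_state_t *r)
{
    r->free_requested = true;
    maybe_release(r);
}

/* Models one posted handler having been called. */
void handler_finished(request_state_t *r)
{
    if (r->pending_handlers > 0)
        r->pending_handlers--;
    maybe_release(r);
}
```

Under this model a request with two posted handlers stays valid, and can still be queried, after \texttt{request\_free} until both handlers have been called.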

\subsubsection{Example}

The following example demonstrates the use of a request handler. A handler is posted on a receive request to add messages to a linked list to be consumed by the local process. The semaphore routines are as per POSIX.1b. (Note: A persistent request could not be used because the buffer cannot be late bound. Thus, the handler must repost itself.)

\begin{verbatim}
#include <stdlib.h>
#include <semaphore.h>
#include <mpi.h>

/* buffer_t, list_t, list_append(), list_pop() and buf_consume() are
   assumed to be supplied by the application. */

sem_t wait, lock;
list_t buflist;
MPI_Request request;
MPI_Status status;

void buf_recv(MPI_Request *request, MPI_Status *status, buffer_t *oldbuf)
{
    buffer_t *newbuf;
    int flag;

    /* Check if request was cancelled */
    MPI_Test_cancelled(status, &flag);
    if (!flag) {

        /* Free the old request */
        MPI_Request_free(request);

        /* Add old buffer to the list and prepare to receive a new buffer */
        /* (with mutual exclusion) */
        sem_wait(&lock);
        list_append(&buflist, oldbuf);
        newbuf = (buffer_t *)malloc(sizeof(buffer_t));
        MPI_Irecv(newbuf, sizeof(buffer_t), MPI_BYTE, MPI_ANY_SOURCE, 0,
                  MPI_COMM_WORLD, request);
        MPI_Request_Post_Handler(request, MPI_REQUEST_COMPLETE, buf_recv,
                                 NULL, newbuf, MPI_TIME_IGNORE);
        sem_post(&lock);
    }

    /* Signal the process that the handler is done */
    sem_post(&wait);

    return;
}

int main(int argc, char *argv[])
{
    buffer_t *firstbuf, *nextbuf;
    int done;

    /* Initialize MPI */
    MPI_Init(&argc, &argv);

    /* Initialize semaphores */
    sem_init(&wait, 0, 0);
    sem_init(&lock, 0, 1);

    /* Create first buffer and post a handler to append it to the list */
    firstbuf = (buffer_t *)malloc(sizeof(buffer_t));
    MPI_Irecv(firstbuf, sizeof(buffer_t), MPI_BYTE, MPI_ANY_SOURCE, 0,
              MPI_COMM_WORLD, &request);
    MPI_Request_Post_Handler(&request, MPI_REQUEST_COMPLETE, buf_recv,
                             NULL, firstbuf, MPI_TIME_IGNORE);

    /* Loop until buf_consume() routine says we're done */
    do {
        /* Wait for a handler to signal a buffer's arrival */
        sem_wait(&wait);

        /* Remove a buffer from the list (with mutual exclusion) */
        sem_wait(&lock);
        list_pop(&buflist, &nextbuf);
        sem_post(&lock);

        /* Consume and free the buffer */
        buf_consume(nextbuf, &done);
        free(nextbuf);
    } while (!done);

    /* Cancel current request (waiting for handler to post a new request) */
    sem_wait(&lock);
    MPI_Cancel(&request);
    sem_post(&lock);

    /* Wait for handler to complete for the last time */
    sem_wait(&wait);

    /* Destroy semaphores */
    sem_destroy(&wait);
    sem_destroy(&lock);

    /* Finalize MPI */
    MPI_Finalize();

    exit(0);
}
\end{verbatim}
