Event Callbacks Proposal

Greg Henley (henley@ERC.MsState.Edu)
Fri, 25 Aug 1995 16:25:01 -0500

Here's a proposal for a feature that would be useful for profiling
tools and possibly for other purposes.

----------------------------------------------------------------------
Event Callbacks Proposal (internal implementation events)

This proposal provides a means for a profiling library to register
callbacks that the MPI implementation invokes when a process is about
to begin and end an event internal to the implementation. This would
allow event trace records to be generated for "idle" periods, among
other things.

MPI_PROFILE_EVENT(event_type, begin_event_function, end_event_function)

  IN  event_type            type of event for these callbacks (e.g.,
                            MPI_EVENT_PROCESS_IDLE,
                            MPI_EVENT_MESSAGE_ENQUEUE,
                            MPI_EVENT_MESSAGE_DEQUEUE)
  IN  begin_event_function  pointer to the profiling library function
                            to be called at the beginning of the
                            specified event
  IN  end_event_function    pointer to the profiling library function
                            to be called at the end of the specified
                            event

Passing a NULL pointer for either function could cancel the
corresponding callback, allowing callbacks to be turned on or off for
different debug levels.

One example use would be distinguishing between the overhead time and
the idle time involved in receiving a message. This is currently not
possible under the MPI profiling interface. A very crude approximation
could be obtained by probing for the given message from inside the
profiling version of any blocking MPI routine, issuing a "begin idle"
event record if no message was found, and then issuing an "end idle"
event record after the real MPI routine returns. The problem with
this crude method is that the "idle" time includes the receive
overhead time.

A user may be able to make optimizations that reduce idle time, but
probably can't do much about overhead time, so distinguishing between
the two is important. Begin and end idle events help answer the
question "Did the MPI_Recv() take a long time because (a) the message
hadn't been sent yet, (b) the communication network had high latency,
(c) the network was slow for the size of the message being sent, or
something else?"

Two other event types that could be useful are
MPI_EVENT_MESSAGE_ENQUEUE and MPI_EVENT_MESSAGE_DEQUEUE. The first
would specify callbacks for the beginning and end of message arrival,
which may take a considerable amount of time if you're running on
networks of workstations (NOWs) with large messages or congestion.
The second would specify callbacks for the beginning and end of
message delivery. A profiler could then keep track of how many
messages were being held in queue(s), allowing a library writer to
tune code better.

Which callbacks are available would depend on the MPI implementation
and the underlying communication interface, so some implementations
or communication interfaces might not be able to support every event
type. Those that could (e.g., socket-based implementations) would at
least have a mechanism for providing this additional information.
----------------------------------------------------------------------

Comments/Discussion?

Greg Henley henley@erc.msstate.edu
Nathan Doss doss@ERC.MsState.Edu