MPIRT instrument update

Anna Rounbehler (anna@sky.com)
Wed, 11 Sep 1996 15:26:25 -0400

The following is the updated MPIRT instruments proposal. The goal was to
generate a generic approach to monitoring, leaving details to the implementors.
The changes include:

1. MPI changed to MPIRT.
2. The level of detail about what should be monitored was
deferred to "advice to implementors" and dramatically shortened.
3. The call MPIRT_OUTPUT_MONITOR was added.
4. The data structures for the monitorhandle were changed to
an array of flags, with details left to the implementors.

Advice to implementors will change as the chapter progresses. The current
information is a skeleton of some basics.

Section 8.3 also needs more information as MPIRT functions progress.

Real Time Instrumentation Proposal
-------------------------------------
Section 8.3
Runtime instruments provide information about MPIRT-specific calls.

8.3.1 MPI RT Monitoring

Runtime instruments will be defined by a start and end statement in a section
of code. This allows a snapshot of any size. Some inaccuracies will occur
when MPI monitoring is turned off while some communications have not yet
completed. The output format is decoupled from the MPI binding and left
to the implementor.

MPIRT_START_MONITOR(request, monitorhandle, f)

For each option set in the monitorhandle, a flag is set and the
corresponding MPI function begins collecting the appropriate
information.

MPIRT_END_MONITOR(request, monitorhandle)

All flags set by MPIRT_START_MONITOR are reset to the defaults.

MPIRT_OUTPUT_MONITOR(request, monitorhandle, f)

This function is optional and is supported on systems that do
not provide a mechanism for outputting the performance
information automatically.

8.3.1.1 MPI Monitoring Parameters

monitorhandle.profile = (1 0) monitor
monitorhandle.other = (1 0) monitor other services
monitorhandle.option = array of flags indicating
monitoring preferences
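
As a minimal sketch of how the parameters above might map onto a C
structure and onto the start/end behaviour described in 8.3.1: the type
name, the flag count SKETCH_NUM_OPTIONS, the "active" array and every
function name below are hypothetical. The proposal leaves these details to
the implementor and does not define the request or f arguments, so they are
omitted here.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical flag count; the proposal leaves the array size to the implementor. */
#define SKETCH_NUM_OPTIONS 8

/* Sketch of a monitor handle with the three fields listed above. */
typedef struct {
    int profile;                      /* 1 = monitor, 0 = do not monitor */
    int other;                        /* 1 = monitor other services, 0 = do not */
    int option[SKETCH_NUM_OPTIONS];   /* implementor-defined preference flags */
    int active[SKETCH_NUM_OPTIONS];   /* assumption: per-option "collecting" flags */
} sketch_monitorhandle;

/* Put a handle into the assumed default state: everything off. */
void sketch_handle_init(sketch_monitorhandle *h)
{
    assert(h != NULL);
    memset(h, 0, sizeof *h);
}

/* Start: set a flag for each option selected in the handle.
   Returns the number of flags that were set. */
int sketch_start_monitor(sketch_monitorhandle *h)
{
    int i, set = 0;
    assert(h != NULL);
    for (i = 0; i < SKETCH_NUM_OPTIONS; i++) {
        h->active[i] = (h->profile && h->option[i]) ? 1 : 0;
        set += h->active[i];
    }
    return set;
}

/* End: reset every flag set by the start call to the default (off). */
void sketch_end_monitor(sketch_monitorhandle *h)
{
    assert(h != NULL);
    memset(h->active, 0, sizeof h->active);
}
```

In this sketch the preferences in option[] survive the end call, so the same
handle can be reused for a later snapshot. That is a design choice of the
sketch, not something the proposal specifies.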

8.3.2 Monitoring Other Software Services

In addition to the MPIRT instruments described in this section, an option
for distinguishing other services related to MPIRT is available.

MPI calls are supported with software services. For each MPI implementation,
there is a software layer between the hardware and the MPI application layer
bindings. This layer may differ for each architecture and may consist of
operating system services, vendor-specific MPI services and/or other
services. The overhead from software services with respect to an MPI
call may be of interest to the user. Some systems are not capable of
accurately collecting timing information for these services. Implementations
that cannot support this option return a warning message.

This option can provide a level of granularity to the MPIRT instruments
defined in this section. Typical information derived from monitoring other
services might be the time spent in operating system calls. These operating
system calls would be confined to those that support MPIRT.
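
One way an implementation might handle the "other services" option on a
system that cannot time those services is sketched below. The return codes,
the sketch_platform type and the function name are all hypothetical; the
proposal only says that a warning is returned, not how.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical return codes; the proposal does not name them. */
enum {
    SKETCH_MONITOR_OK = 0,
    SKETCH_MONITOR_WARN_OTHER_UNSUPPORTED = 1
};

/* Hypothetical description of what the platform can measure. */
typedef struct {
    int can_time_other_services;   /* 1 if timing of other services is accurate */
} sketch_platform;

/* Apply the caller's "other" request against the platform's capability.
   On platforms without support the flag is cleared and a warning code is
   returned; otherwise the flag takes the requested value. */
int sketch_set_other(const sketch_platform *p, int requested, int *other_flag)
{
    assert(p != NULL && other_flag != NULL);
    if (requested && !p->can_time_other_services) {
        *other_flag = 0;
        return SKETCH_MONITOR_WARN_OTHER_UNSUPPORTED;
    }
    *other_flag = requested ? 1 : 0;
    return SKETCH_MONITOR_OK;
}
```

Returning a warning rather than an error lets the remaining MPIRT
instruments keep running when only the other-services timing is unavailable.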

Advice to Implementors
-------------------------
Examples of information of interest for MPI performance measures may include
execution times, timeouts, counts and workloads.

1. Execution times

Total MPIRT execution time is time spent executing services for MPIRT from
a defined start point to a defined end point.

execution time = (end time - start time)

Some general categories for MPIRT execution times may include time
spent in a channel operation, a collective operation or a communicator group.
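
The formula and categories above could be accumulated roughly as follows.
This is only a sketch: the category enum, the structure and the function
names are hypothetical, and times are assumed to be seconds held in a
double, obtained from whatever clock the implementation provides.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical categories, taken from the examples in the text. */
typedef enum {
    SKETCH_CAT_CHANNEL,
    SKETCH_CAT_COLLECTIVE,
    SKETCH_CAT_COMM_GROUP,
    SKETCH_CAT_COUNT
} sketch_category;

/* Accumulated execution time, in seconds, per category. */
typedef struct {
    double total[SKETCH_CAT_COUNT];
} sketch_exec_times;

/* execution time = (end time - start time) */
double sketch_exec_time(double start_time, double end_time)
{
    assert(end_time >= start_time);
    return end_time - start_time;
}

/* Add one measured interval to the running total of a category. */
void sketch_exec_add(sketch_exec_times *t, sketch_category c,
                     double start_time, double end_time)
{
    assert(t != NULL);
    assert((int)c >= 0 && (int)c < SKETCH_CAT_COUNT);
    t->total[c] += sketch_exec_time(start_time, end_time);
}
```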

2. Counts

The number of processes, number of times an MPIRT communication is completed
and the number of timeouts can be correlated with execution times
to measure performance.

Timeout counts are confined to timeouts of MPIRT communications,
and are extensible to collective operations and communicator groups.

Collective operation counts may include a count of the number of
processes participating and the number of MPIRT communications completed.

Communicator group counts may include a count of the number of MPIRT
communications completed, the total number of MPIRT communication timeouts
and the number of MPIRT collective operations completed.
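
A communicator group's counts might be kept in a small record such as the
one sketched here. All names are hypothetical, and the mean time per
completed communication is just one example of how counts could be
correlated with execution times; the proposal does not prescribe a metric.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-communicator-group counters. */
typedef struct {
    unsigned long comms_completed;        /* MPIRT communications completed */
    unsigned long comm_timeouts;          /* MPIRT communication timeouts */
    unsigned long collectives_completed;  /* MPIRT collective operations completed */
} sketch_group_counts;

/* Record the outcome of one communication: completed or timed out. */
void sketch_count_comm(sketch_group_counts *c, int timed_out)
{
    assert(c != NULL);
    if (timed_out)
        c->comm_timeouts++;
    else
        c->comms_completed++;
}

/* Record one completed collective operation. */
void sketch_count_collective(sketch_group_counts *c)
{
    assert(c != NULL);
    c->collectives_completed++;
}

/* Example correlation: mean seconds per completed communication.
   Returns 0.0 when nothing has completed yet. */
double sketch_mean_time(const sketch_group_counts *c, double execution_time)
{
    assert(c != NULL);
    if (c->comms_completed == 0)
        return 0.0;
    return execution_time / (double)c->comms_completed;
}
```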

3. Workload

Workload is a measure of the message sizes and traffic load over time
and may be defined per communicator group.

Workloads are a function of message traffic over time. To develop
metrics for workload, the following information may be collected per
communicator:

message sizes = (large small)
frequency large = number of large messages
frequency small = number of small messages

Time is determined by the start and end time in the snapshot created by
MPIRT_START_MONITOR and MPIRT_END_MONITOR.
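
The workload information above could be gathered per communicator along
these lines. The 1024-byte cut-off between small and large messages, the
structure and the function names are assumptions made for illustration; the
proposal does not fix a threshold or a rate definition. Times are assumed to
be the snapshot's start and end in seconds.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical cut-off between "small" and "large"; not fixed by the proposal. */
#define SKETCH_LARGE_THRESHOLD_BYTES 1024

/* Hypothetical per-communicator workload record. */
typedef struct {
    unsigned long frequency_large;   /* number of large messages */
    unsigned long frequency_small;   /* number of small messages */
    unsigned long total_bytes;       /* sum of all message sizes */
} sketch_workload;

/* Classify one message by size and add it to the record. */
void sketch_workload_record(sketch_workload *w, size_t message_bytes)
{
    assert(w != NULL);
    if (message_bytes >= SKETCH_LARGE_THRESHOLD_BYTES)
        w->frequency_large++;
    else
        w->frequency_small++;
    w->total_bytes += (unsigned long)message_bytes;
}

/* Traffic load as bytes per second over the snapshot interval. */
double sketch_workload_rate(const sketch_workload *w,
                            double start_time, double end_time)
{
    assert(w != NULL);
    assert(end_time > start_time);
    return (double)w->total_bytes / (end_time - start_time);
}
```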