A few assumptions are made:
1. Collective operations are always a function of point to point
communication (convenient for a first draft).
2. Blocking and non-blocking point to point communications
are grouped together.
3. Performance metrics generated by this monitoring are
applicable to an implementation on a particular architecture
and are not necessarily comparable across implementations.
Real Time Instrumentation Proposal
-------------------------------------
Section 8.3
1. Delete the paragraph starting with "We defer specification
of how data is ....."
2. Delete Events:
Replace with the following:
Runtime instruments provide information about MPI RT-specific calls.
The information of interest for MPI performance measurement includes:
1. Execution times
Execution times are computed from a start time and an end time. Listed below
are the general categories of MPI Real Time execution times
and the items to be reported.
Execution time of point to point communication
report total execution time
Execution time for a collective operation
report total execution time
The total execution time for a collective operation can be
analyzed by type of collective operation against the number
of processes involved.
Execution time for a communicator group
report total execution time
The total execution time for a communicator group is used
to determine the group's usage over time.
2. Counts
The number of processes, the number of completed communications,
and the number of timeouts can be correlated with execution times
to measure performance.
3. Workload
Workload is a measure of the message sizes and traffic load over time
for a specific communication group.
8.3.1 MPI RT Monitoring
Runtime instruments are delimited by a start statement and an end statement
around a section of code, which allows a snapshot of any size. Some
inaccuracy occurs if monitoring is turned off while some communications
have not yet completed. The output format is decoupled from the MPI binding
and left to the implementor.
MPI_START_MONITOR(request, monitorhandle, f)
For each option set in the monitorhandle, a flag is set and the
corresponding MPI function begins collecting the appropriate
information.
MPI_END_MONITOR(request, monitorhandle)
All flags set by MPI_START_MONITOR are reset to the defaults.
8.3.1.1 MPI Monitoring Parameters
Each parameter is set to 1 to enable or 0 to disable the corresponding
monitoring.
monitorhandle.profile  = (1 0)  monitor
monitorhandle.other    = (1 0)  monitor other software services
monitorhandle.pttopt   = (1 0)  monitor point to point communication
monitorhandle.coll     = (1 0)  monitor collective operations
monitorhandle.comm     = (1 0)  monitor communication for a communicator
monitorhandle.msgsize  = (1 0)  monitor message sizes
monitorhandle.traffic  = (1 0)  monitor message traffic load
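The start/end semantics of MPI_START_MONITOR and MPI_END_MONITOR can be
sketched in C as follows. This is a minimal illustration under stated
assumptions: the struct monitor_handle, the functions start_monitor,
end_monitor, monitoring_pttopt and monitoring_coll, and the global flag set
are all hypothetical names, not part of the proposal or of any MPI library.
The request and f arguments of the proposed calls are omitted because their
meaning is not specified here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical C rendering of the monitorhandle parameters.
 * Each field is 1 (monitor) or 0 (do not monitor). */
typedef struct {
    int profile;  /* monitor */
    int other;    /* other software services */
    int pttopt;   /* point to point communication */
    int coll;     /* collective operations */
    int comm;     /* communication for a communicator */
    int msgsize;  /* message sizes */
    int traffic;  /* message traffic load */
} monitor_handle;

/* Defaults: every option off. */
static const monitor_handle monitor_defaults = {0, 0, 0, 0, 0, 0, 0};

/* Hypothetical flags consulted by the instrumented MPI functions. */
static monitor_handle active_flags = {0, 0, 0, 0, 0, 0, 0};

static bool is_flag(int v) { return v == 0 || v == 1; }

/* Sketch of MPI_START_MONITOR: raise a flag for each option set in the
 * handle so the corresponding MPI functions begin collecting information. */
void start_monitor(const monitor_handle *h)
{
    assert(h != NULL);
    assert(is_flag(h->profile) && is_flag(h->other) && is_flag(h->pttopt) &&
           is_flag(h->coll) && is_flag(h->comm) && is_flag(h->msgsize) &&
           is_flag(h->traffic));
    active_flags = *h;
}

/* Sketch of MPI_END_MONITOR: reset all flags to their defaults. */
void end_monitor(void)
{
    active_flags = monitor_defaults;
}

/* Queries an instrumented call might make before recording data. */
bool monitoring_pttopt(void) { return active_flags.pttopt == 1; }
bool monitoring_coll(void) { return active_flags.coll == 1; }
```

Copying the whole handle is equivalent to setting one flag per enabled
option here, because every flag starts from the all-off defaults.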
8.3.2 Instruments for MPI Monitoring
The two major instruments for monitoring are execution times and
workloads. Counts provide supporting information for performance
measurement.
8.3.2.1 Monitoring MPI Execution Times
Total MPI execution time is time spent executing services for MPI from
a defined start point to a defined end point.
execution time = end time - start time
For real time systems, the following rules apply to start and end times.
Start Times
-----------
Point to point communication
Event Models
start time = MPI event is woken up
Priority Models
start time = MPI priority task is woken up
Time Models
start time = actual start time within the
bounds of timehandle.Start_Time
and timehandle.Timeout.
Collective operations
start time = start time of the first MPI point to point
communication*
Communicator Group
start time = start time of the first MPI communication*
If the task is time driven and has a priority, then the rules that apply
to time models take precedence.
* A point to point send operation (which can be related to a collective
operation).
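The start time rules for a point to point communication, including the
precedence of the time model, can be sketched in C. This is a hypothetical
illustration: the rt_task struct and p2p_start_time function are invented
names, window_start stands in for timehandle.Start_Time, and window_end
assumes timehandle.Timeout is an absolute upper bound, which the proposal
does not state.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical description of a real time task as seen by the monitor. */
typedef struct {
    bool time_driven;        /* task follows the time model */
    bool has_priority;       /* task follows the priority model */
    double event_wakeup;     /* time the MPI event was woken up */
    double priority_wakeup;  /* time the MPI priority task was woken up */
    double actual_start;     /* actual start of a time-driven call */
    double window_start;     /* stands in for timehandle.Start_Time */
    double window_end;       /* ASSUMED absolute bound from timehandle.Timeout */
} rt_task;

/* Start time of a point to point communication under the three models. */
double p2p_start_time(const rt_task *t)
{
    if (t->time_driven) {
        /* Time model rules take precedence, even when the task also has a
         * priority. This sketch treats a start outside the window as an
         * invariant violation. */
        assert(t->actual_start >= t->window_start &&
               t->actual_start <= t->window_end);
        return t->actual_start;
    }
    if (t->has_priority)
        return t->priority_wakeup;  /* priority model */
    return t->event_wakeup;         /* event model */
}
```

A task that is both time driven and prioritized therefore reports its actual
start time rather than its priority wake-up time.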
End Times
---------
Point to point communication
end time = return from call that completes the communication
Collective operations
end time = end time of the last MPI point to point
communication
Communicator Group
end time = end time of the last MPI point to point
communication
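Combining these rules, the execution time of a collective operation or a
communicator group runs from the start of its first point to point
communication to the end of its last one. A minimal C sketch, assuming each
monitored point to point communication is recorded as a hypothetical
p2p_record with start and end times already determined by the rules above:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical record for one monitored point to point communication. */
typedef struct {
    double start;  /* start time per the point to point start rules */
    double end;    /* return from the call that completes it */
} p2p_record;

/* execution time = end time - start time */
double execution_time(double start, double end)
{
    assert(end >= start);
    return end - start;
}

/* Execution time of a collective operation or communicator group, taken
 * from the earliest start and the latest end of its point to point records. */
double aggregate_execution_time(const p2p_record *recs, size_t n)
{
    assert(recs != NULL && n > 0);
    double first_start = recs[0].start;
    double last_end = recs[0].end;
    for (size_t i = 1; i < n; i++) {
        if (recs[i].start < first_start)
            first_start = recs[i].start;
        if (recs[i].end > last_end)
            last_end = recs[i].end;
    }
    return execution_time(first_start, last_end);
}
```

For records (2, 5), (1, 3) and (4, 9), the aggregate execution time is
9 - 1 = 8, regardless of the order in which the records are stored.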
8.3.2.1.1 Counts
1. Timeout Counts
These are limited to timeouts of point to point communications,
but can be extended to collective operations and communicator groups.
2. Collective operations counts
number of processes
number of point to point MPI communications completed
3. Communicator group counts
number of point to point MPI communications completed
total number of point to point MPI communication timeouts
number of MPI collective operations completed
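The communicator group counts, and one way of correlating them with execution
times, can be sketched in C. The group_counts struct and all functions below
are hypothetical; the two derived metrics (mean time per completed
communication and timeout ratio) are illustrative examples of such a
correlation, not measures defined by the proposal. This sketch also assumes a
timed-out communication is counted as a timeout rather than as completed.

```c
#include <assert.h>

/* Hypothetical counters for one communicator group within a snapshot. */
typedef struct {
    unsigned long p2p_completed;   /* point to point communications completed */
    unsigned long p2p_timeouts;    /* point to point communication timeouts */
    unsigned long coll_completed;  /* collective operations completed */
} group_counts;

/* Record the outcome of one point to point communication. */
void count_p2p(group_counts *c, int timed_out)
{
    if (timed_out)
        c->p2p_timeouts++;
    else
        c->p2p_completed++;
}

/* Record one completed collective operation. */
void count_collective(group_counts *c)
{
    c->coll_completed++;
}

/* Illustrative metric: average execution time per completed communication. */
double mean_time_per_completion(double total_exec_time, const group_counts *c)
{
    assert(c->p2p_completed > 0);
    return total_exec_time / (double)c->p2p_completed;
}

/* Illustrative metric: fraction of point to point attempts that timed out. */
double timeout_ratio(const group_counts *c)
{
    unsigned long attempts = c->p2p_completed + c->p2p_timeouts;
    assert(attempts > 0);
    return (double)c->p2p_timeouts / (double)attempts;
}
```

Three completions and one timeout give a timeout ratio of 0.25; with a total
execution time of 6 time units, the mean time per completion is 2 units.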
8.3.2.2 Monitoring MPI Workloads
Workloads are a function of message traffic over time. To develop
metrics for workload, the following information is collected per
communicator:
message sizes = (large small)
frequency large = number of large messages
frequency small = number of small messages
Time is determined by the start and end times of the snapshot created by
MPI_START_MONITOR and MPI_END_MONITOR.
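The per-communicator workload information can be sketched in C as follows.
The proposal does not say where "large" ends and "small" begins, so the
LARGE_MSG_THRESHOLD value of 4096 bytes is purely an assumption, as are the
workload struct and the record_message and traffic_load functions. Traffic
load is computed here as total bytes divided by the snapshot duration, one
plausible reading of "traffic load over time".

```c
#include <assert.h>
#include <stddef.h>

/* ASSUMED cut-off between small and large messages, in bytes. */
#define LARGE_MSG_THRESHOLD 4096u

/* Hypothetical workload counters for one communicator. */
typedef struct {
    unsigned long freq_large;        /* number of large messages */
    unsigned long freq_small;        /* number of small messages */
    unsigned long long total_bytes;  /* bytes sent during the snapshot */
} workload;

/* Classify one message by size and accumulate its bytes. */
void record_message(workload *w, size_t bytes)
{
    if (bytes >= LARGE_MSG_THRESHOLD)
        w->freq_large++;
    else
        w->freq_small++;
    w->total_bytes += bytes;
}

/* Average traffic load (bytes per time unit) between the snapshot's
 * start and end times. */
double traffic_load(const workload *w, double snap_start, double snap_end)
{
    assert(snap_end > snap_start);
    return (double)w->total_bytes / (snap_end - snap_start);
}
```

For example, messages of 100, 8192 and 4096 bytes yield two large messages,
one small message and 12388 bytes in total; over a snapshot of 2 time units
the traffic load is 6194 bytes per unit.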
8.3.3 Monitoring Other Software Services
In addition to the MPI instruments described in this section, an option
for distinguishing other services related to MPI is available.
MPI calls are supported by software services. For each MPI implementation,
there is a software layer between the hardware and the MPI application layer
bindings. This layer may differ for each architecture and may consist of
operating system services, vendor-specific MPI services, and/or other
services. The overhead that these software services add to an MPI call
may be of interest to the user. Some systems are not capable of accurately
collecting timing information for these services; implementations that
cannot support this option return a warning message.
This option can add a finer level of granularity to the MPI instruments
defined in this section. Typical information derived from monitoring other
services might be the time spent in operating system calls, confined to
those calls that support MPI. Refer to the advice to implementors.