MPIRT monitoring

Dennis Cottel (dennis@rats-b.nosc.mil)
Thu, 12 Sep 96 08:34:38 PDT

This note is in response to the instrumentation proposal in the current
MPIRT document. We discussed it generally at a recent MPIRT meeting
but I would like to make my comments more specific.

First, it is not clear to me whether a standard should specify this
kind of functionality. Normally, I would think not, but since
performance is presumably such an integral part of real-time programs, I
imagine you could argue that any real-time program will have calls to
similar functionality in its code, so standardizing the
interface will improve program portability. I will leave this issue to
be debated later.

I believe that the current approach is too specific, and therefore
inflexible and difficult to extend. Since we cannot hope to anticipate
all possible things that users or implementors will want to monitor, the
interface must be defined in a way that is easily extensible. The
interface should specify *when* the characteristics to be monitored are
identified, maybe *how* they are processed, but not *what* they are.
And having implemented such a capability, we should also allow users to
use and extend it for their own measurements.

I will illustrate what I mean with the example of the monitoring
interface for our communication library, which has worked well for us for
a couple of years. When users decide to monitor a portion of their code,
they bracket the code with Monitor_on/Monitor_off calls tagged with a
string of their own choosing. Statistics are gathered whenever this
section of code (or any other section with the same tag) is executed.
Similarly, as library writers we can monitor any internal code we wish
simply by adding monitor calls with a unique tag. Therefore, this
single interface is extensible for both users and implementors. When
the program terminates, all monitor statistics which were gathered are
printed out, labeled with their tag. [I will include a sample monitor
value printout and the manual pages for the entry points at the end of
this message for anyone who might be interested.]

Our interface doesn't include some other features that have been
discussed. (For instance, Arkady suggests that users should be able to
specify the number of bins in which to sort the measurements.) But my
point is to illustrate that a usable interface can be made very general,
flexible, and extensible.

Dennis Cottel, dennis@nosc.mil, (619) 553-1645
NCCOSC, RDT&E Division (NRaD), San Diego, CA

=========================================================================
An example monitor output from our communications library. Note that we
calculate the number of ops per second if the user specifies an
operations count in the Monitor_off call:

Monitor report (times in seconds):
section          times called    min    max    avg    tot     %   Mx/s
-----------------------------------------------------------------------
fft                        23  0.092  0.137  0.101  2.319  24.2   34.9
recv(in_cmplx)             23  0.011  1.603  0.095  2.182  22.7      -
send(out_cmplx)            22  0.048  3.183  0.231  5.092  53.1      -
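
The avg and % columns above are consistent with avg = tot / times called
and % = tot / (sum of tot over all sections). The following is a rough
sketch of that arithmetic, not the library's actual report code; the
names row_t, row_avg, row_percent, and row_ops_per_sec are hypothetical.
The scaling of the Mx/s column is not stated in this note, so the rate
helper returns plain operations per second.

```c
#include <assert.h>

/* One line of the report; the field names are assumptions. */
typedef struct {
    const char *name;   /* section tag */
    long        calls;  /* "times called" */
    double      tot;    /* accumulated seconds */
    long        ops;    /* accumulated num_ops, 0 if never supplied */
} row_t;

/* Average time per call: tot / times called. */
double row_avg(const row_t *r)
{
    assert(r->calls > 0);
    return r->tot / (double)r->calls;
}

/* Share of the summed section time, in percent. */
double row_percent(const row_t *rows, int n, int i)
{
    double sum = 0.0;
    for (int k = 0; k < n; k++)
        sum += rows[k].tot;
    assert(sum > 0.0);
    return 100.0 * rows[i].tot / sum;
}

/* Operations per second, or -1.0 when no operation count was given
   (shown as "-" in the report). */
double row_ops_per_sec(const row_t *r)
{
    if (r->ops <= 0 || r->tot <= 0.0)
        return -1.0;
    return (double)r->ops / r->tot;
}
```

Feeding in the three rows of the sample report (23, 23, and 22 calls with
totals of 2.319, 2.182, and 5.092 seconds) reproduces the printed avg and
% values to the displayed precision.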

=========================================================================
MONITOR_OFF()

void monitor_off(
const char *section_name,
long num_ops);

Parameters

section_name -- is the name of the section being monitored. Must be
the same as used in the corresponding monitor_on() call for the
section being monitored. section_name must be 31 characters or fewer.

num_ops -- is the number of operations executed in the section of
code being monitored. The user provides this value based on the
user's knowledge of the algorithms performed in the monitored section
of code. The num_ops argument can be set to zero if the user does
not care about the operations per second statistic for this section
of code.

Description

See the entry for monitor_on() for a description of how this routine
is used.

Errors

The library will terminate and produce an error message if
monitor_on() and monitor_off() are not called in order for a given
section of code.
=========================================================================
MONITOR_ON()

void monitor_on(
const char *section_name);

Parameters

section_name -- is the name of the section being monitored. Must be
the same as used in the corresponding monitor_off() call for the
section being monitored. section_name must be 31 characters or fewer.

Description

The performance monitoring routines are placed around sections of
code for which the programmer wants performance statistics.
monitor_on() and monitor_off() are placed, respectively, at the
beginning and end of a section of code. The same section_name string
must be supplied to both. When the monitor_on() routine is called,
the time on the hardware clock is recorded for the section of code
that will be monitored. When the corresponding monitor_off() routine
is called (with the same section_name), the hardware clock is read,
and the elapsed time since the monitor_on() routine was called is
computed. The elapsed time is added to a variable keeping track of
accumulated time, and compared to other variables keeping track of
minimum and maximum values. Also recorded are the number of times
each section of code is entered and the number of accumulated
operations performed by each.
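
The bookkeeping described above might look roughly like the following.
This is a minimal sketch, not the library's source: the section_t
record, the fixed-size table, the find_section() and die() helpers, and
the use of the standard C clock() in place of the hardware clock are all
assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

#define MAX_SECTIONS 64   /* hypothetical table size */
#define MAX_NAME     31   /* name limit given under Parameters */

typedef struct {
    char   name[MAX_NAME + 1];
    int    active;          /* 1 between monitor_on() and monitor_off() */
    double start;           /* clock reading taken by monitor_on() */
    long   calls;           /* times the section was entered */
    long   ops;             /* accumulated num_ops */
    double min, max, total; /* elapsed-time statistics, in seconds */
} section_t;

static section_t sections[MAX_SECTIONS];
static int num_sections = 0;

/* Stand-in for the hardware clock (assumption): CPU time via clock(). */
static double read_clock(void)
{
    return (double)clock() / CLOCKS_PER_SEC;
}

/* Print an error message and terminate, as the Errors entry describes. */
static void die(const char *msg, const char *name)
{
    fprintf(stderr, "monitor: %s: %s\n", msg, name);
    exit(EXIT_FAILURE);
}

/* Look a section up by tag; optionally create it on first use. */
static section_t *find_section(const char *name, int create)
{
    assert(name != NULL);
    for (int i = 0; i < num_sections; i++)
        if (strcmp(sections[i].name, name) == 0)
            return &sections[i];
    if (!create)
        return NULL;
    if (strlen(name) > MAX_NAME)
        die("section name too long", name);
    if (num_sections == MAX_SECTIONS)
        die("too many sections", name);
    section_t *s = &sections[num_sections++];
    memset(s, 0, sizeof *s);
    strcpy(s->name, name);
    return s;
}

void monitor_on(const char *section_name)
{
    section_t *s = find_section(section_name, 1);
    if (s->active)
        die("monitor_on called twice", section_name);
    s->active = 1;
    s->start = read_clock();
}

void monitor_off(const char *section_name, long num_ops)
{
    double now = read_clock();
    section_t *s = find_section(section_name, 0);
    if (s == NULL || !s->active)
        die("monitor_off without monitor_on", section_name);
    double elapsed = now - s->start;
    s->active = 0;
    if (s->calls == 0 || elapsed < s->min) s->min = elapsed;
    if (s->calls == 0 || elapsed > s->max) s->max = elapsed;
    s->total += elapsed;
    s->ops += num_ops;
    s->calls++;
}
```

Because each tag keeps its own start time, differently named sections
can nest, and reusing a tag in several places simply accumulates into
the same record.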

The user can view the data recorded by the monitor routines either
interactively during the run, or at the end of the run by turning on
the "show_monitors" report variable for the programs and instances of
interest.

Sections of code surrounded by the monitor_on() and monitor_off()
routines can be embedded within other sections of code being
monitored. Also, different sections of code can use the same
section_name, thus grouping the statistics for those sections.

Errors

The library will terminate and produce an error message if
monitor_on() and monitor_off() are not called in order for a given
section of code.

Example

...
monitor_on("both");

/* 128-pt Forward FFT */
monitor_on("fft");
cfft(buf, 128, 1);
monitor_off("fft",4480); /* 5*n*log2(n) = 5*128*7 = 4480 */

/* 128-pt Inverse FFT */
monitor_on("ifft");
cfft(buf, 128, -1);
monitor_off("ifft",4480); /* same operation count as the forward FFT */

monitor_off("both",0);
...
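
The num_ops value of 4480 in the example follows from the estimate
5*n*log2(n) with n = 128. A small helper along the following lines could
compute it instead of hard-coding the constant; fft_op_count is a
hypothetical name and is not part of the library.

```c
#include <assert.h>

/* Estimated operation count for an n-point complex FFT, using the
   5*n*log2(n) rule from the example. n must be a power of two. */
long fft_op_count(long n)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    long log2n = 0;
    for (long m = n; m > 1; m >>= 1)
        log2n++;
    return 5 * n * log2n;
}
```

With this helper the call in the example could be written as
monitor_off("fft", fft_op_count(128)).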
=========================================================================