Suggestion for MPI_Print

John M May (johnmay@coral.llnl.gov)
Wed, 19 Jun 1996 09:26:25 -0700 (PDT)

Many programs use print statements for debugging or for reporting results.
Just as MPI's datatypes are useful for describing data in message passing
and file I/O operations, they could also be useful for rudimentary
formatting of output data. Furthermore, MPI's communicators can be used
to define which processes participate in an output function. This would
allow the system to combine output from multiple nodes into a single line.
Therefore, I would like to propose an MPI_Print function. Here are some
ideas on how it might work:

MPI_PRINT( comm, buf, datatype, count, status )
  IN   comm      [SAME]  Set of processes to participate in output
  IN   buf               Address of buffer to be written
  IN   datatype  [SAME]  Type of output data
  IN   count     [SAME]  Repetition count of type over buffer
  OUT  status            Status information

The communicator not only defines which nodes will participate in the
output but also how data will be combined. The output from each
node will be preceded by its global rank (i.e., its rank in
MPI_COMM_WORLD). However, if multiple nodes in the given communicator
output identical data, the data will be merged into a single line, and
the line will be preceded by a specification of the nodes that sent
that data (see example below). The buf parameter specifies the
output buffer in the usual way, and the datatype tells the system
not only where to find the data in the buffer but also what its type
is, much as format specifications like %d and %f do in a C printf
function. The count allows a datatype to be repeated multiple times
over the buffer, and the status will tell the caller how many items
were printed.
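
To make the "datatype as format specification" idea concrete, here is a
minimal sketch in plain C. It does not use MPI at all: basic_type,
type_entry, and format_buffer are hypothetical names invented for this
illustration, and a real implementation would walk the type map of an
MPI_Datatype instead. I am also assuming that a run of chars prints as one
string that stops at the first null.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for MPI basic datatypes (not real MPI names). */
typedef enum { T_INT, T_FLOAT, T_CHAR } basic_type;

/* One block of a simplified type map: count elements of one basic type. */
typedef struct {
    basic_type type;
    size_t     offset;  /* byte displacement from the start of the buffer */
    int        count;   /* number of consecutive elements */
} type_entry;

/* Append len bytes of text, preceded by a space if sep is nonzero.
 * Text that does not fit is dropped so the buffer never overflows. */
static size_t put(char *out, size_t cap, size_t pos,
                  const char *text, size_t len, int sep)
{
    size_t need = len + (sep ? 1 : 0);
    if (pos + need + 1 > cap)
        return pos;
    if (sep)
        out[pos++] = ' ';
    memcpy(out + pos, text, len);
    pos += len;
    out[pos] = '\0';
    return pos;
}

/* Format buf according to sig; returns the length of the text in out. */
size_t format_buffer(const void *buf, const type_entry *sig, int nentries,
                     char *out, size_t cap)
{
    const char *base = buf;
    size_t pos = 0;

    assert(out != NULL && cap > 0);
    out[0] = '\0';

    for (int e = 0; e < nentries; e++) {
        const char *p = base + sig[e].offset;

        if (sig[e].type == T_CHAR) {
            /* Assumption: a run of chars is one string, cut at a null. */
            size_t len = 0;
            while (len < (size_t)sig[e].count && p[len] != '\0')
                len++;
            pos = put(out, cap, pos, p, len, pos > 0);
            continue;
        }
        for (int i = 0; i < sig[e].count; i++) {
            char tmp[32];
            int n;
            if (sig[e].type == T_INT) {
                int v;
                memcpy(&v, p + (size_t)i * sizeof v, sizeof v);
                n = snprintf(tmp, sizeof tmp, "%d", v);
            } else {
                float v;
                memcpy(&v, p + (size_t)i * sizeof v, sizeof v);
                n = snprintf(tmp, sizeof tmp, "%g", v);
            }
            pos = put(out, cap, pos, tmp, (size_t)n, pos > 0);
        }
    }
    return pos;
}
```

With a signature of one int followed by BUFSIZE chars, a struct holding 3
and "myhost" would come out as "3 myhost"; the rank prefix and the trailing
newline would be added by a separate step.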

Some examples:

#define BUFSIZE 80
char string[] = "Hello world";
float floats[6] = { 1, 2, .03, 400, 5e5, .00000006 };
struct fancy { int i;
               char string[BUFSIZE]; };
struct fancy mystruct;
MPI_Datatype fancy_type;
MPI_Status status;

/* Initialize fancy_type to have the type signature
* { MPI_INT, MPI_CHAR, ... , MPI_CHAR }
*/

MPI_Comm_rank( MPI_COMM_WORLD, &mystruct.i );
gethostname( mystruct.string, BUFSIZE );

/* Assume running on 4 nodes */
/* Nodes independently print first element in arbitrary order */
MPI_Print( MPI_COMM_SELF, floats, MPI_FLOAT, 1, &status );
/* Output:
0: 1
3: 1
2: 1
1: 1
*/

/* Nodes print the same array element collectively */
MPI_Print( MPI_COMM_WORLD, floats, MPI_FLOAT, 1, &status );
/* Output:
0..3: 1
*/

/* Make one array element different on node zero */
if( mystruct.i == 0 ) {
    floats[1] = 0;
}

/* Nodes print the entire array collectively; they're the same
* on all nodes except for node 0.
*/
MPI_Print( MPI_COMM_WORLD, floats, MPI_FLOAT, 6, &status );
/* Output:
0: 1 0 0.03 400 500000 6e-08
1..3: 1 2 0.03 400 500000 6e-08
*/

/* Printing a C-string works as you'd hope; characters after a null
* are ignored. How should this work in Fortran?
*/
MPI_Print( MPI_COMM_WORLD, string, MPI_CHAR, sizeof(string), &status );
/* Output:
0..3: Hello world
*/

/* Print a complex type that's different on every node. Lines in a
* collective print always appear in rank order.
*/
MPI_Print( MPI_COMM_WORLD, &mystruct, fancy_type, 1, &status );
/* Output:
0: 0 myhost
1: 1 myhost
2: 2 myhost
3: 3 myhost
*/

Notes

We can argue about the exact output format; here I've chosen a
pretty common format for naming nodes and groups of nodes, followed by
a colon, a space, and then each element described by the datatype
separated by a space. The floats come out in the form you'd get with
%g in printf, and a newline is automatically added at the end of each
line. I'm certainly open to suggestion on this format, but I would
argue against making it too complex or highly configurable, since I
expect the main use of this function will be for quick debugging.

Merging multiple lines is likely to be expensive, but I think it's
a reasonable cost in this context because the amount of data to
merge will be relatively small, and print statements are not usually
time-critical operations anyway. Merging the data saves the user
from the error-prone task of comparing multiple output lines by eye.
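
As a rough illustration of the merging step, here is a sketch in plain C
that assumes each rank's text has already been formatted and collected in
one place (lines[r] is the text from rank r). The name merge_lines, the
MAX_RANKS limit, and the comma-separated label for non-contiguous ranks
(such as "0,2..3") are my own assumptions, not part of the proposal; groups
are ordered by their lowest rank.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_RANKS 64   /* arbitrary limit chosen for this sketch */

/* Append text to out; text that does not fit is dropped. */
static size_t emit(char *out, size_t cap, size_t pos, const char *text)
{
    size_t len = strlen(text);
    if (pos + len + 1 > cap)
        return pos;
    memcpy(out + pos, text, len + 1);
    return pos + len;
}

/* Hypothetical helper: merge identical per-rank lines and label each
 * merged line with the ranks that produced it. Returns the text length. */
size_t merge_lines(const char *const lines[], int nranks,
                   char *out, size_t cap)
{
    int done[MAX_RANKS] = { 0 };
    size_t pos = 0;
    char label[32];

    assert(out != NULL && cap > 0 && nranks <= MAX_RANKS);
    out[0] = '\0';

    for (int r = 0; r < nranks; r++) {
        if (done[r])
            continue;              /* already reported with a lower rank */

        int run_start = -1;        /* first rank of the current run */
        int printed = 0;           /* has a range been written yet? */

        /* Scan one past the end so the final run gets flushed. */
        for (int s = r; s <= nranks; s++) {
            int same = s < nranks && strcmp(lines[s], lines[r]) == 0;
            if (same) {
                done[s] = 1;
                if (run_start < 0)
                    run_start = s;
            } else if (run_start >= 0) {
                if (run_start == s - 1)
                    snprintf(label, sizeof label, "%s%d",
                             printed ? "," : "", run_start);
                else
                    snprintf(label, sizeof label, "%s%d..%d",
                             printed ? "," : "", run_start, s - 1);
                pos = emit(out, cap, pos, label);
                printed = 1;
                run_start = -1;
            }
        }
        pos = emit(out, cap, pos, ": ");
        pos = emit(out, cap, pos, lines[r]);
        pos = emit(out, cap, pos, "\n");
    }
    return pos;
}
```

This compares every line against every group leader, so it is quadratic in
the number of ranks in the worst case; that seems acceptable for short
debugging output, though an implementation could hash the lines or merge
them in a tree instead.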

One could also argue for letting the user send the output to stderr
or some other location; again, I like keeping the number of parameters
small given the intended use of the function.