7.2. Error handling

Up: MPI Environmental Management Next: Error codes and classes Previous: Clock synchronization

An MPI implementation cannot or may choose not to handle some errors that occur during MPI calls. These can include errors that generate exceptions or traps, such as floating point errors or access violations. The set of errors that are handled by MPI is implementation-dependent. Each such error generates an MPI exception.

The above text takes precedence over any text on error handling within this document. Specifically, text that states that errors will be handled should be read as may be handled.

A user can associate an error handler with a communicator. The specified error handling routine will be used for any MPI exception that occurs during a call to MPI for a communication with this communicator. MPI calls that are not related to any communicator are considered to be attached to the communicator MPI_COMM_WORLD. The attachment of error handlers to communicators is purely local: different processes may attach different error handlers to the same communicator.

A newly created communicator inherits the error handler that is associated with the ``parent'' communicator. In particular, the user can specify a ``global'' error handler for all communicators by associating this handler with the communicator MPI_COMM_WORLD immediately after initialization.

Several predefined error handlers are available in MPI:

The handler, when called, causes the program to abort on all executing processes. This has the same effect as if MPI_ABORT was called by the process that invoked the handler.
The handler has no effect other than returning the error code to the user.

Implementations may provide additional predefined error handlers and programmers can code their own error handlers.

The error handler MPI_ERRORS_ARE_FATAL is associated by default with MPI_COMM_WORLDafter initialization. Thus, if the user chooses not to control error handling, every error that MPI handles is treated as fatal. Since (almost) all MPI calls return an error code, a user may choose to handle errors in its main code, by testing the return code of MPI calls and executing a suitable recovery code when the call was not successful. In this case, the error handler MPI_ERRORS_RETURN will be used. Usually it is more convenient and more efficient not to test for errors after each MPI call, and have such error handled by a non trivial MPI error handler.

After an error is detected, the state of MPI is undefined. That is, using a user-defined error handler, or MPI_ERRORS_RETURN, does not necessarily allow the user to continue to use MPI after an error is detected. The purpose of these error handlers is to allow a user to issue user-defined error messages and to take actions unrelated to MPI (such as flushing I/O buffers) before a program exits. An MPI implementation is free to allow MPI to continue after an error but is not required to do so.

[] Advice to implementors.

A good quality implementation will, to the greatest possible, extent, circumscribe the impact of an error, so that normal processing can continue after an error handler was invoked. The implementation documentation will provide information on the possible effect of each class of errors. ( End of advice to implementors.)
An MPI error handler is an opaque object, which is accessed by a handle. MPI calls are provided to create new error handlers, to associate error handlers with communicators, and to test which error handler is associated with a communicator.

MPI_ERRHANDLER_CREATE( function, errhandler )
[ IN function] user defined error handling procedure
[ OUT errhandler] MPI error handler (handle)

int MPI_Errhandler_create(MPI_Handler_function *function, MPI_Errhandler *errhandler)

Register the user routine function for use as an MPI exception handler. Returns in errhandler a handle to the registered exception handler.

In the C language, the user routine should be a C function of type MPI_Handler_function, which is defined as:

typedef void (MPI_Handler_function)(MPI_Comm *, int *, ...); 
The first argument is the communicator in use. The second is the error code to be returned by the MPI routine that raised the error. If the routine would have returned MPI_ERR_IN_STATUS, it is the error code returned in the status for the request that caused the error handler to be invoked. The remaining arguments are ``stdargs'' arguments whose number and meaning is implementation-dependent. An implementation should clearly document these arguments. Addresses are used so that the handler may be written in Fortran.

[] Rationale.

The variable argument list is provided because it provides an ANSI-standard hook for providing additional information to the error handler; without this hook, ANSI C prohibits additional arguments. ( End of rationale.)
MPI_ERRHANDLER_SET( comm, errhandler )
[ IN comm] communicator to set the error handler for (handle)
[ IN errhandler] new MPI error handler for communicator (handle)

int MPI_Errhandler_set(MPI_Comm comm, MPI_Errhandler errhandler)


Associates the new error handler errorhandler with communicator comm at the calling process. Note that an error handler is always associated with the communicator.

MPI_ERRHANDLER_GET( comm, errhandler )
[ IN comm] communicator to get the error handler from (handle)
[ OUT errhandler] MPI error handler currently associated with communicator (handle)

int MPI_Errhandler_get(MPI_Comm comm, MPI_Errhandler *errhandler)


Returns in errhandler (a handle to) the error handler that is currently associated with communicator comm.

Example: A library function may register at its entry point the current error handler for a communicator, set its own private error handler for this communicator, and restore before exiting the previous error handler.

[ IN errhandler] MPI error handler (handle)

int MPI_Errhandler_free(MPI_Errhandler *errhandler)

Marks the error handler associated with errhandler for deallocation and sets errhandler to MPI_ERRHANDLER_NULL. The error handler will be deallocated after all communicators associated with it have been deallocated.

MPI_ERROR_STRING( errorcode, string, resultlen )
[ IN errorcode] Error code returned by an MPI routine
[ OUT string] Text that corresponds to the errorcode
[ OUT resultlen] Length (in printable characters) of the result returned in string

int MPI_Error_string(int errorcode, char *string, int *resultlen)


Returns the error string associated with an error code or class. The argument string must represent storage that is at least MPI_MAX_ERROR_STRING characters long.

The number of characters actually written is returned in the output argument, resultlen.

[] Rationale.

The form of this function was chosen to make the Fortran and C bindings similar. A version that returns a pointer to a string has two difficulties. First, the return string must be statically allocated and different for each error message (allowing the pointers returned by successive calls to MPI_ERROR_STRING to point to the correct message). Second, in Fortran, a function declared as returning CHARACTER*(*) can not be referenced in, for example, a PRINT statement. ( End of rationale.)

Up: MPI Environmental Management Next: Error codes and classes Previous: Clock synchronization

Return to MPI 1.1 Standard Index
Return to MPI-2 Standard Index
Return to MPI Forum Home Page

MPI-1.1 of June 12, 1995
HTML Generated on August 6, 1997