I/O error handling proposal

John M May (johnmay@coral.llnl.gov)
Fri, 31 Jan 1997 13:01:37 -0800

Here is an alternative proposal for handling I/O errors that I discussed
with a few people at the meeting last week. Despite the length of this
message, the proposal is pretty simple:

MPI-2 will have a new built-in communicator. For purposes of
discussion, I'll call it MPI_FILE_WORLD, even though this is
arguably the wrong name (see below). MPI_FILE_WORLD is defined
to be a dup of MPI_COMM_WORLD with the error handler set
to MPI_ERRORS_RETURN.

Users opening files may call MPI_Open using this communicator if
they want to get the less severe error handling that
MPI_ERRORS_RETURN specifies. They may also derive other communicators
from it in the usual ways if they want to open files with less than
the full set of nodes. (We may wish to add MPI_FILE_SELF for
convenience.)

Whenever an I/O function that takes a file handle as an input
parameter raises an error, it will pass this file handle as
a third parameter to the error handler function.
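
As a rough illustration of the definition above, here is a toy model in
plain C rather than real MPI code. Every toy_ name is made up for this
sketch, and it assumes that a communicator derived from another one
inherits its parent's error handler.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for MPI_ERRORS_ARE_FATAL and MPI_ERRORS_RETURN. */
typedef enum { TOY_ERRORS_ARE_FATAL, TOY_ERRORS_RETURN } toy_policy;

/* A toy communicator carries only its error-handling policy. */
typedef struct { toy_policy policy; } toy_comm;

/* Model of MPI_COMM_WORLD: errors are fatal by default. */
static const toy_comm TOY_COMM_WORLD = { TOY_ERRORS_ARE_FATAL };

/* Assumption: a dup copies the parent's error-handling policy. */
static toy_comm toy_comm_dup(const toy_comm *parent)
{
    assert(parent != NULL);
    toy_comm child = *parent;
    return child;
}

/* Model of the proposed built-in: a dup of the world communicator
   whose policy is then switched to errors-return. */
static toy_comm toy_file_world(void)
{
    toy_comm c = toy_comm_dup(&TOY_COMM_WORLD);
    c.policy = TOY_ERRORS_RETURN;
    return c;
}
```

In this model a file opened on toy_file_world(), or on any communicator
dup'd from it, would see errors-return behavior, while TOY_COMM_WORLD
itself keeps the fatal default.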

This proposal has the following advantages:

1) It gives users a way to specify errors-return semantics when
opening a file without requiring them to define a new
communicator.

2) It requires no changes to existing MPI objects or paradigms.

It has the following disadvantage:

1) Users who create new communicators for specific groups and
who want to use these same groups for I/O will have to
create separate communicators if they want different
error handlers for I/O and communication.

Jim raised some questions in his last message which
must also be addressed for this proposal:

1) When is the I/O error handler referenced?
Is it saved at the time the file is opened, or does changing it on
the communicator after the file was opened affect errors raised by
a file that was opened with that communicator?

The handler is saved when the file is opened. Since the user is
allowed to free the communicator that was used to open the file
as soon as the open call returns, it makes sense that other changes
to the communicator won't change the file handle's behavior. As
was discussed at the meeting, some implementations will dup the
communicator when the file is opened; others will instead cache
all the important information (such as the group and the error
handler) without creating a dup.
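
The "saved at open time" rule can be sketched with a toy model in plain
C. This is not MPI code and all toy_ names are hypothetical. It follows
the caching strategy described above: the open call copies the group
information and the error-handling policy into the file handle, so later
changes to the communicator have no effect on the file.

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-ins for MPI_ERRORS_ARE_FATAL and MPI_ERRORS_RETURN. */
typedef enum { TOY_ERRORS_ARE_FATAL, TOY_ERRORS_RETURN } toy_policy;

/* A toy communicator: an error policy plus a group size. */
typedef struct { toy_policy policy; int group_size; } toy_comm;

/* A toy file handle keeps its own copies of both fields. */
typedef struct { toy_policy policy; int group_size; } toy_file;

/* Open: snapshot the communicator's state into the file handle.
   No reference to the communicator is kept. */
static toy_file toy_open(const toy_comm *comm)
{
    assert(comm != NULL);
    toy_file fh = { comm->policy, comm->group_size };
    return fh;
}

/* Changing the communicator afterwards touches only the communicator. */
static void toy_comm_set_policy(toy_comm *comm, toy_policy p)
{
    assert(comm != NULL);
    comm->policy = p;
}
```

Because toy_open keeps no pointer back to the communicator, the caller is
free to modify or discard the communicator as soon as the open returns.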

2) What happens if the communicator on which a file was opened is
freed with MPI_Comm_free before the file is closed?
Do you say anything about this for other reasons?

See above. I believe the intent has been to allow the user to
free the comm, although I'm not sure if the chapter says this
explicitly.

3) What are the arguments passed to the I/O error handler?
In particular, is it passed
a) the communicator on which the file was opened,
b) a dup of that communicator,
c) COMM_WORLD,
d) COMM_SELF,
e) no communicator?

Since these are I/O errors it may make sense to pass in the
FILEHANDLE and no communicator. (This is OK, it doesn't have to
have the same binding as a normal error handler).

I agree that e) would be a clean solution if we go with Jim's
proposal. If we go with mine, then something has to be passed
in for the comm argument. My preference would be to pass in
MPI_COMM_NULL, since the dup'd communicator (if it exists) may not
mean anything to the user. For either proposal, MPI_Open should
pass the communicator that the user passed in.
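
One way to picture the handler call under the proposal in this message is
the toy model below, in plain C with hypothetical toy_ names. It assumes
the handler keeps the MPI-1 shape (communicator pointer, error code
pointer, then variable arguments), that the communicator argument is a
null communicator, and that the file handle arrives as the first variadic
argument, which makes it the third parameter overall.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>

typedef struct toy_comm toy_comm;        /* opaque; only pointers are used */
typedef struct { int id; } toy_file;     /* toy file handle */

/* Same shape as an MPI-1 error handler function. */
typedef void toy_handler_fn(toy_comm **comm, int *errcode, ...);

/* What the example handler observed on its last call. */
static int last_file_id = -1;
static int last_comm_was_null = 0;

/* Example handler: records its arguments and leaves the error code
   unchanged, so the I/O call returns it to the caller. */
static void recording_handler(toy_comm **comm, int *errcode, ...)
{
    va_list ap;
    va_start(ap, errcode);
    toy_file *fh = va_arg(ap, toy_file *);  /* third parameter */
    va_end(ap);

    last_comm_was_null = (*comm == NULL);
    last_file_id = fh->id;
}

/* How a toy I/O routine would raise an error on a file handle. */
static int toy_raise_io_error(toy_handler_fn *handler, toy_file *fh,
                              int errcode)
{
    toy_comm *null_comm = NULL;  /* stand-in for MPI_COMM_NULL */
    assert(handler != NULL && fh != NULL);
    handler(&null_comm, &errcode, fh);
    return errcode;
}
```

A handler written this way can tell which file failed without needing
any meaningful communicator.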

4) As John is pointing out, there may (depending on how you answer
question 1) be no way to alter the error handler associated with a
file after open time.

Yes, that's what I meant. So perhaps we do need the get/set
functions still, with either my proposal or Jim's.

Now about naming: MPI_FILE_WORLD is not a good name because it
implies that the communicator can only be used for opening files,
when in fact it's an ordinary communicator that can be used for
any purpose. The name should really reflect the fact that this
communicator just has the less severe error handler associated
with it (Linda Stanberry suggested MPI_KINDER_GENTLER_WORLD).
So maybe it should be MPI_COMM_ERRORS_RETURN? I don't know.

John