Threads and I/O calls

John M May (johnmay@coral.llnl.gov)
Fri, 6 Dec 1996 11:32:52 -0800 (PST)

We are trying to sort out what I/O operations should be allowed to
proceed concurrently in a multithreaded program. Here are what seem
to be the relevant passages from the standards:

MPI: The Complete Reference, p 198 (Sec 4.13):
"...in multithreaded implementations, one can have more than one,
concurrently executing, collective communication calls at a process.
In these situations, it is the user's responsibility to ensure that
the same communicator is not used concurrently by two different
collective communication calls at the same process."

MPI-2 draft, p 158 (Sec 7.9.1):

"It is the user's responsibility to prevent races when threads
within the same application post conflicting communication calls.
The user can make sure that two threads in the same process will not
issue conflicting communication calls by using distinct communicators
at each thread....

"All MPI calls are thread-safe. I.e., two concurrently running
threads may make MPI calls and the outcome will be as if the calls
executed in some order, even if their execution is interleaved."

MPI-2 draft, p 210 (Sec 10.2.3):

"The user is responsible for ensuring that all outstanding requests
associated with fh [the file handle being closed] have completed
before calling MPI_CLOSE."

OK, first, I think it's reasonable to extend the rules for
communicators to file handles as well, and to treat I/O
operations like the communication calls mentioned in these passages.
Specifically, I assume that collective I/O on the same file
handle from different threads is not allowed. What about
concurrent nonblocking collective I/O calls issued from the same
thread? It would be nice to make all this explicit.

Now, here's the real question we're worried about: What happens
if an MPI_Read call and an MPI_Close call are issued concurrently
for the same file handle on different threads? This is obviously
a bad idea, but what are the consequences? I can think of two
answers, based on different interpretations of the above rules.

1) Since the calls have to behave as if they "executed in some
order," then either the read completes correctly and then the
file is closed, or else the file is closed and the read returns
a nonfatal MPI_ERR_FILE.

2) Since the user has failed to prevent races, the calls may be
interleaved, resulting in a hard-to-reproduce, but perfectly legal
core dump when the read function tries to use a file table entry
that the close function has just deallocated.

Obviously, it would be nice to avoid case 2 where possible, but
that requires some fairly careful (and possibly costly) manipulation
of locks.
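To make the locking cost concrete, here is a minimal sketch, in C with POSIX threads rather than MPI, of one way an implementation could guarantee interpretation 1. Everything in it is hypothetical: `file_entry`, `entry_init`, `sketch_read`, `sketch_close`, and the `SKETCH_*` codes stand in for an implementation's file-table entry, its read and close routines, and MPI_ERR_FILE. Each entry carries a mutex that outlives the handle, and both read and close hold it for the whole call, so the two behave as if they executed in some order.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical codes standing in for MPI_SUCCESS and MPI_ERR_FILE. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_FILE = 1 };

/* Hypothetical file-table entry (not any real MPI implementation's).
   The entry and its mutex are never freed, so they stay valid even
   after the handle is closed; only the per-handle state is released. */
typedef struct {
    pthread_mutex_t lock;
    int open;          /* nonzero while the handle is valid */
    char *buffer;      /* per-handle state that close frees */
    size_t size;
} file_entry;

/* Set up an open entry whose "file contents" are a copy of data.
   Returns 0 on success, -1 if allocation fails. */
int entry_init(file_entry *fe, const char *data) {
    pthread_mutex_init(&fe->lock, NULL);
    fe->size = strlen(data);
    fe->buffer = malloc(fe->size + 1);
    if (fe->buffer == NULL) {
        fe->open = 0;
        return -1;
    }
    memcpy(fe->buffer, data, fe->size + 1);
    fe->open = 1;
    return 0;
}

/* Read up to n bytes into out. Because the lock is held for the whole
   call, a concurrent close happens either entirely before (we see
   open == 0 and return an error) or entirely after (we finish using
   state that is still allocated). */
int sketch_read(file_entry *fe, char *out, size_t n, size_t *got) {
    int rc = SKETCH_SUCCESS;
    pthread_mutex_lock(&fe->lock);
    if (!fe->open) {
        rc = SKETCH_ERR_FILE;
        *got = 0;
    } else {
        size_t k = n < fe->size ? n : fe->size;
        memcpy(out, fe->buffer, k);
        *got = k;
    }
    pthread_mutex_unlock(&fe->lock);
    return rc;
}

/* Close the handle: free its state under the same lock, so no reader
   can be in the middle of using it. Closing twice is an error. */
int sketch_close(file_entry *fe) {
    int rc = SKETCH_SUCCESS;
    pthread_mutex_lock(&fe->lock);
    if (!fe->open) {
        rc = SKETCH_ERR_FILE;
    } else {
        free(fe->buffer);
        fe->buffer = NULL;
        fe->size = 0;
        fe->open = 0;
    }
    pthread_mutex_unlock(&fe->lock);
    return rc;
}
```

The price is a lock acquisition on every read, and holding the lock across a slow blocking read would stall every other thread using the same handle. It also assumes file-table entries are never deallocated or reused while any thread might still hold the handle, which is exactly the kind of bookkeeping that makes case 1 expensive to guarantee.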

Also, is it correct to interpret the passage from page 210 as
supporting the view that case 2 is legal, or do the "outstanding
requests" it mentions refer specifically to pending nonblocking
operations, as opposed to blocking calls in progress on other threads?

John