re: MPI1 clarification -- error handling

Al Geist (geist@msr.EPM.ORNL.GOV)
Sat, 22 Jul 1995 20:07:08 -0400

>If we want MPI_ERRORS_ARE_FATAL to abort the entire
>computation, then we should say that it behaves as if
>MPI_ABORT(MPI_COMM_WORLD, somecodenumber) was called. I would suggest the
>second interpretation.

Been there with PVM, and we found aborting the "world" to be a bad thing.
I suggest that "at most" just the processes in COMM are aborted.
For MPI-1, picking either interpretation is OK,
but looking ahead to MPI-2 with client-server applications,
I wouldn't want a failed client to cause the server to be killed.
(This is what we ran into and had to fix in PVM.)
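
To make the difference between the two interpretations concrete, here is a
toy model in plain C. It is only a sketch and it is not MPI code: toy_comm,
toy_abort and survivors_after_fault are hypothetical names invented for this
illustration, and the four-process layout (rank 0 as the server, ranks 1..3
as clients, with the fault raised on a communicator holding only ranks 2 and
3) is an assumption.

```c
#include <stdbool.h>

#define NPROCS 4  /* assumed layout: rank 0 is the server, ranks 1..3 are clients */

/* Toy stand-in for a communicator: a membership flag per process. */
typedef struct { bool member[NPROCS]; } toy_comm;

/* Kill every live member of comm; return how many processes were killed. */
static int toy_abort(const toy_comm *comm, bool alive[NPROCS])
{
    int killed = 0;
    for (int i = 0; i < NPROCS; i++) {
        if (comm->member[i] && alive[i]) {
            alive[i] = false;
            killed++;
        }
    }
    return killed;
}

/* A fault is raised on a communicator containing only clients 2 and 3.
 * abort_world == true  models "abort everything in the world";
 * abort_world == false models "abort at most the processes in COMM".
 * Returns the number of processes still alive afterwards. */
int survivors_after_fault(bool abort_world)
{
    bool alive[NPROCS] = { true, true, true, true };
    const toy_comm world     = { { true,  true,  true, true } };
    const toy_comm clients23 = { { false, false, true, true } };

    toy_abort(abort_world ? &world : &clients23, alive);

    int n = 0;
    for (int i = 0; i < NPROCS; i++) {
        if (alive[i]) {
            n++;
        }
    }
    return n;
}
```

In this model the world-wide abort leaves no survivors, so the server goes
down with the failed clients; the COMM-scoped abort leaves the server (rank 0)
and the uninvolved client (rank 1) running.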

Looking even further ahead (MPI-3?), if we incorporate some fault tolerance,
then we don't want choices made in MPI-1 to hinder that work.

Al Geist