
MPI_FINALIZE



A few points:

1. Implementing the change in MPI_FINALIZE to make it collective over "the
union of all processes that have been and continue to be connected" is a
non-trivial distributed algorithm, since it is essentially a barrier over
potentially unrelated and not-directly-connected processes.

2. Is there a difference between "have been and continue to be connected"
and "are connected"?

3. This change could drastically alter the semantics of currently-valid
MPI programs.

As one example: currently-valid "task-farm" programs may unintentionally
cause a lot of "zombied" MPI processes that are simply waiting for an
MPI_FINALIZE from their ancestor(s).  Consider what happens if a root
process continually spawns short-lived MPI processes to perform some task
in a "fire and forget" kind of model.  The short-lived child processes
could previously invoke MPI_FINALIZE and die.  With the proposed change,
the short-lived processes will now block waiting for the parent to invoke
MPI_FINALIZE as well.

This program can be fixed by having the root and child processes invoke
MPI_COMM_DISCONNECT right after spawning (or as soon as the last message
between the root and children has completed) so that each child can call
MPI_FINALIZE by itself, and then die.
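
A minimal sketch of that fix, using the MPI-2 C bindings.  The task count
(NUM_TASKS), the single integer "task" message, and re-spawning the same
executable via argv[0] are assumptions made for illustration only; a real
task farm would have its own work protocol.

```c
#include <mpi.h>
#include <stdio.h>

#define NUM_TASKS 4   /* hypothetical number of fire-and-forget tasks */

int main(int argc, char *argv[])
{
    MPI_Comm parent;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Root: spawn one short-lived child per task. */
        for (int task = 0; task < NUM_TASKS; ++task) {
            MPI_Comm child;

            /* Assumption: the child runs this same binary (argv[0]). */
            MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                           0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);

            /* Last message between root and child: hand over the task. */
            MPI_Send(&task, 1, MPI_INT, 0, 0, child);

            /* Collective over the intercommunicator; once it returns,
               root and child are no longer connected. */
            MPI_Comm_disconnect(&child);
        }
    } else {
        /* Child: receive the task, then sever the link to the root. */
        int task;

        MPI_Recv(&task, 1, MPI_INT, 0, 0, parent, MPI_STATUS_IGNORE);
        MPI_Comm_disconnect(&parent);

        printf("child finished task %d\n", task);
    }

    /* After the disconnect, each child is connected only to itself,
       so it should not need to wait for the root here. */
    MPI_Finalize();
    return 0;
}
```

Note that MPI_COMM_DISCONNECT is itself collective over the
intercommunicator, so both sides must call it; the root pays that cost once
per child rather than leaving every child blocked until the root finalizes.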

But my concern is backwards compatibility: we have no idea how many
programs exist that rely on MPI_FINALIZE being collective over just
MPI_COMM_WORLD.  Changing the spec now could cause unintended side-effects in
currently-valid MPI programs.

{+} Jeff Squyres
{+} jsquyres@lam-mpi.org
{+} http://www.lam-mpi.org/