MPI_FINALIZE
A few points:
1. Implementing the change in MPI_FINALIZE to make it collective over "the
union of all processes that have been and continue to be connected"
requires a non-trivial distributed algorithm, since it is essentially a
barrier over potentially unrelated processes that are not directly
connected to each other.
2. Is there a difference between "have been and continue to be connected"
and "are connected"?
3. This change could drastically alter the semantics of currently-valid
MPI programs.
As one example: currently-valid "task-farm" programs may unintentionally
cause a lot of "zombied" MPI processes that are simply waiting for an
MPI_FINALIZE from their ancestor(s). Consider what happens if a root
process continually spawns short-lived MPI processes to perform some task
in a "fire and forget" kind of model. The short-lived child processes
could previously invoke MPI_FINALIZE and die. With the proposed change,
the short-lived processes will instead block in MPI_FINALIZE until the
parent invokes MPI_FINALIZE as well.
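To make the blocking scenario concrete, here is a small model in plain C
(not MPI code). It assumes that "connected" is transitive, so the set a
process would have to synchronize with under the proposed change is its
whole connected component. The names proc_graph, graph_connect,
graph_disconnect, and finalize_set are hypothetical and exist only for this
sketch.

```c
#include <assert.h>
#include <string.h>

#define MAX_PROCS 16

/* Hypothetical model, not an MPI API: edge[i][j] != 0 means processes
 * i and j share a communicator that has not been disconnected. */
typedef struct {
    int nprocs;
    int edge[MAX_PROCS][MAX_PROCS];
} proc_graph;

static void graph_init(proc_graph *g, int nprocs)
{
    assert(nprocs > 0 && nprocs <= MAX_PROCS);
    memset(g, 0, sizeof *g);
    g->nprocs = nprocs;
}

/* Models a spawn/connect/accept that links processes a and b. */
static void graph_connect(proc_graph *g, int a, int b)
{
    assert(a >= 0 && a < g->nprocs && b >= 0 && b < g->nprocs);
    g->edge[a][b] = g->edge[b][a] = 1;
}

/* Models MPI_COMM_DISCONNECT on the communicator linking a and b. */
static void graph_disconnect(proc_graph *g, int a, int b)
{
    assert(a >= 0 && a < g->nprocs && b >= 0 && b < g->nprocs);
    g->edge[a][b] = g->edge[b][a] = 0;
}

/* Marks in member[] every process reachable from start (including start
 * itself) and returns how many there are. Under the proposed semantics,
 * this is the set that start's MPI_FINALIZE would have to wait for. */
static int finalize_set(const proc_graph *g, int start, int member[MAX_PROCS])
{
    int stack[MAX_PROCS];
    int top = 0, count = 0;

    memset(member, 0, MAX_PROCS * sizeof member[0]);
    member[start] = 1;
    stack[top++] = start;

    /* Depth-first walk; each process is pushed at most once. */
    while (top > 0) {
        int p = stack[--top];
        count++;
        for (int q = 0; q < g->nprocs; q++) {
            if (g->edge[p][q] && !member[q]) {
                member[q] = 1;
                stack[top++] = q;
            }
        }
    }
    return count;
}
```

With a root (process 0) connected to three spawned children, finalize_set
returns 4 for any child, even though the children never communicate with
each other. After graph_disconnect(&g, 0, 1), it returns 1 for child 1 and
3 for the root. In a real implementation no single process holds this graph,
which is what makes point 1 hard.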
This program can be fixed by having the root and child processes invoke
MPI_COMM_DISCONNECT right after spawning (or as soon as the last message
between the root and children has completed) so that the child can
MPI_FINALIZE by itself, and then die.
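A minimal sketch of that fix using the C bindings follows. It assumes the
worker is the same executable as the root (spawned via argv[0], which must
be resolvable by the MPI runtime), that a single int is the whole task
description, and that one worker is enough to illustrate the pattern. All
of these are illustrative choices, not part of the proposal.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm parent;

    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Root: spawn one worker, hand it a task, then disconnect. */
        MPI_Comm child;
        int task = 42;  /* placeholder task description */

        MPI_Comm_spawn(argv[0], MPI_ARGV_NULL, 1, MPI_INFO_NULL,
                       0, MPI_COMM_SELF, &child, MPI_ERRCODES_IGNORE);
        MPI_Send(&task, 1, MPI_INT, 0, 0, child);

        /* Collective over the intercommunicator; returns once pending
         * communication on it has completed. */
        MPI_Comm_disconnect(&child);

        /* The root could keep spawning further workers here. */
    } else {
        /* Worker: receive the task, then disconnect from the parent. */
        int task;

        MPI_Recv(&task, 1, MPI_INT, 0, 0, parent, MPI_STATUS_IGNORE);
        MPI_Comm_disconnect(&parent);

        printf("worker received task %d\n", task);
        /* Perform the short-lived work here. */
    }

    /* With the intercommunicator disconnected, root and worker are no
     * longer connected, so each can finalize without the other. */
    MPI_Finalize();
    return 0;
}
```

The worker disconnects before doing its work, so its MPI_FINALIZE does not
depend on the root under either the current or the proposed semantics. The
cost is that existing task-farm programs would have to be modified to add
these calls.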
But my concern is backwards compatibility: we have no idea how many
programs exist that rely on MPI_FINALIZE being collective over just
MPI_COMM_WORLD. Changing the spec now could cause unintended side-effects
in currently-valid MPI programs.
{+} Jeff Squyres
{+} jsquyres@lam-mpi.org
{+} http://www.lam-mpi.org/