Re: MPI_FINALIZE
Jeff Squyres wrote:
> A few points:
>
> 1. Implementing the change in MPI_FINALIZE to make it collective over "the
> union of all processes that have been and continue to be connected" is a
> non-trivial distributed algorithm, since it is essentially a barrier over
> potentially unrelated and not-directly-connected processes.
>
> 2. Is there a difference between "have been and continue to be connected"
> and "are connected"?
>
> 3. This change can potentially drastically change the semantics of
> currently-valid MPI programs.
>
> As one example: currently-valid "task-farm" programs may unintentionally
> cause a lot of "zombied" MPI processes that are simply waiting for an
> MPI_FINALIZE from their ancestor(s). Consider what happens if a root
> process continually spawns short-lived MPI processes to perform some task
> in a "fire and forget" kind of model. The short-lived child processes
> could previously invoke MPI_FINALIZE and die. With the proposed change,
> the short-lived processes will now block waiting for the parent to invoke
> MPI_FINALIZE as well.
>
> This program can be fixed by having the root and child processes invoke
> MPI_COMM_DISCONNECT right after spawning (or whenever the last
> message between the root and children has completed) so that the child
> can invoke MPI_FINALIZE by itself and then die.
As I understand the MPI-2 standard (page 106, lines 11-41),
this is exactly the procedure the standard requires.
Thus, programs that do not disconnect before calling MPI_Finalize
must expect to wait in MPI_Finalize.
The programs you mentioned, which expect to exit before all connected
processes have called MPI_Finalize, are not ``valid'' MPI programs.
We have implemented MPI_Finalize for NEC SX systems according
to page 106, i.e., as a barrier over all MPI processes
that are connected.
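The finalize semantics described above (MPI_Finalize acting as a barrier over every process the caller is connected to, directly or transitively) can be modeled with a small reachability check. This is a hypothetical sketch, not MPI code: `proc_graph`, `graph_connect`, `graph_disconnect` and `must_finalize_together` are invented names. Each edge stands for a communicator two processes still share (for example the intercommunicator created by a spawn), and only a disconnect by both sides is assumed to remove it. The MPI-2 definition of "connected" also covers processes sharing MPI_COMM_WORLD and, as I read it, communicators released with MPI_COMM_FREE rather than MPI_COMM_DISCONNECT; the model leaves adding those edges to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MAX_PROCS 32

/* Hypothetical model (not an MPI API): link[a][b] is true while processes
   a and b still share a communicator that has not been disconnected. */
typedef struct {
    int n;
    bool link[MAX_PROCS][MAX_PROCS];
} proc_graph;

void graph_init(proc_graph *g, int n) {
    assert(n > 0 && n <= MAX_PROCS);
    g->n = n;
    memset(g->link, 0, sizeof g->link);
}

/* Models a spawn (or connect/accept) giving a and b a shared communicator. */
void graph_connect(proc_graph *g, int a, int b) {
    assert(a >= 0 && a < g->n && b >= 0 && b < g->n);
    g->link[a][b] = g->link[b][a] = true;
}

/* Models both sides calling MPI_Comm_disconnect on that communicator. */
void graph_disconnect(proc_graph *g, int a, int b) {
    assert(a >= 0 && a < g->n && b >= 0 && b < g->n);
    g->link[a][b] = g->link[b][a] = false;
}

/* True if b is reachable from a through live connections, i.e. a's
   MPI_Finalize cannot complete until b has also called MPI_Finalize.
   Connectivity is transitive, so this is a depth-first search over the
   whole connected component rather than a check of a single edge. */
bool must_finalize_together(const proc_graph *g, int a, int b) {
    bool seen[MAX_PROCS] = {false};
    int stack[MAX_PROCS];
    int top = 0;

    assert(a >= 0 && a < g->n && b >= 0 && b < g->n);
    stack[top++] = a;
    seen[a] = true;
    while (top > 0) {
        int p = stack[--top];
        if (p == b)
            return true;
        for (int q = 0; q < g->n; q++) {
            /* Push each unvisited neighbour once, so top never exceeds n. */
            if (g->link[p][q] && !seen[q]) {
                seen[q] = true;
                stack[top++] = q;
            }
        }
    }
    return false;
}
```

For a task farm in which the root (process 0) has spawned children 1 and 2 and nobody disconnects, `must_finalize_together(&g, 1, 2)` is true: child 1 is held in MPI_Finalize until the root and child 2 arrive, which is the ``zombie'' effect. Once the root and child 1 disconnect (`graph_disconnect(&g, 0, 1)`), child 1 forms a group of its own and can finalize and exit.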
> But my concern is backwards compatibility: we have no idea how many
> programs exist that rely on MPI_FINALIZEing over just MPI_COMM_WORLD.
> Changing the spec now could cause unintended side-effects in
> currently-valid MPI programs.
I don't see a backwards compatibility problem here; such programs
are not standard-conforming, and they may still run (though possibly
leaving ``zombie'' MPI processes that waste resources).
Best regards
Hubert
--
______________________________________________________________________________
Hubert Ritzdorf
NEC Europe Ltd.
C&C Research Laboratories
Rathausallee 10
D-53757 Sankt Augustin
______________________________________________________________________________