
Re: MPI_FINALIZE



Jeff Squyres wrote:

> A few points:
>
> 1. Implementing the change in MPI_FINALIZE to make it collective over "the
> union of all processes that have been and continue to be connected" is a
> non-trivial distributed algorithm, since it is essentially a barrier over
> potentially unrelated and not-directly-connected processes.
>
> 2. Is there a difference between "have been and continue to be connected"
> and "are connected"?
>
> 3. This change can potentially drastically change the semantics of
> currently-valid MPI programs.
>
> As one example: currently-valid "task-farm" programs may unintentionally
> cause a lot of "zombied" MPI processes that are simply waiting for an
> MPI_FINALIZE from their ancestor(s).  Consider what happens if a root
> process continually spawns short-lived MPI processes to perform some task
> in a "fire and forget" kind of model.  The short-lived child processes
> could previously invoke MPI_FINALIZE and die.  With the proposed change,
> the short-lived processes will now block waiting for the parent to invoke
> MPI_FINALIZE as well.
>
> This program can be fixed by having the root and child processes invoke
> MPI_COMM_DISCONNECT right after spawning (or after whenever the last
> message between the root and children finishes) so that the child can
> MPI_FINALIZE by itself, and then die.

  As I understand the MPI-2 standard (page 106, lines 11-41),
  this is exactly the procedure that the standard requires.
  Programs that do not disconnect before calling MPI_Finalize
  must therefore expect to wait in MPI_Finalize.
  The programs you mention, which expect to exit before all connected
  processes have called MPI_Finalize, are not "valid" MPI programs.

  We have implemented MPI_Finalize for NEC SX systems according
  to page 106, that is, as a barrier over all MPI processes
  that are connected.
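
  To make "a barrier over all MPI processes which are connected" concrete,
  here is a minimal sketch in plain C. It is a toy model, not MPI code and
  not the NEC implementation. It assumes connectivity can be approximated
  by pairwise links between process ids, whereas in MPI the links come
  from shared communicators. The names model_reset, model_connect,
  model_disconnect and finalize_group are hypothetical and exist only for
  this illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_PROCS 8

/* Hypothetical model: links[a][b] counts the communicators shared by
 * processes a and b. This is an illustration only, not MPI state. */
static int links[MAX_PROCS][MAX_PROCS];

/* Forget all links, as if every process had just started on its own. */
void model_reset(void)
{
    memset(links, 0, sizeof links);
}

/* Models a spawn/connect/accept that creates a communicator
 * containing both a and b. */
void model_connect(int a, int b)
{
    assert(a >= 0 && a < MAX_PROCS && b >= 0 && b < MAX_PROCS && a != b);
    links[a][b]++;
    links[b][a]++;
}

/* Models both sides calling MPI_COMM_DISCONNECT on one shared communicator. */
void model_disconnect(int a, int b)
{
    assert(a >= 0 && a < MAX_PROCS && b >= 0 && b < MAX_PROCS);
    assert(links[a][b] > 0);
    links[a][b]--;
    links[b][a]--;
}

/* Marks in in_group[0..n-1] every process that is connected to p, directly
 * or through other processes, and returns how many there are (p included).
 * In this model, these are the processes the finalize barrier would span. */
int finalize_group(int p, int n, int in_group[])
{
    int stack[MAX_PROCS];   /* each process is pushed at most once */
    int top = 0;
    int count = 0;
    int i;

    assert(n > 0 && n <= MAX_PROCS);
    assert(p >= 0 && p < n);

    for (i = 0; i < n; i++)
        in_group[i] = 0;

    in_group[p] = 1;
    stack[top++] = p;

    while (top > 0) {
        int cur = stack[--top];
        count++;
        for (i = 0; i < n; i++) {
            if (links[cur][i] > 0 && !in_group[i]) {
                in_group[i] = 1;
                stack[top++] = i;
            }
        }
    }
    return count;
}
```

  In the task-farm case, take process 0 as the root and 1 and 2 as its
  children. After model_connect(0, 1) and model_connect(0, 2), the group
  for child 1 contains all three processes, so in this model child 1 would
  wait for the root and for its sibling. After model_disconnect(0, 1), the
  group for child 1 contains only itself, which corresponds to the child
  being able to finalize and exit alone.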


> But my concern is backwards compatibility: we have no idea how many
> programs exist that rely on MPI_FINALIZEing over just MPI_COMM_WORLD.
> Changing the spec now could cause unintended side-effects in
> currently-valid MPI programs.

  I don't see this backwards-compatibility problem: such programs
  do not conform to the standard, and they may still run (leaving "zombie"
  MPI processes that may waste resources).

  Best regards

  Hubert

--
______________________________________________________________________________

   Hubert Ritzdorf

   NEC Europe Ltd.
   C&C Research Laboratories
   Rathausallee 10
   D-53757 Sankt Augustin

______________________________________________________________________________