Re: dynamic process chapter - part I

William C. Saphir (wcs@nas.nasa.gov)
Fri, 5 Apr 1996 13:03:03 -0800

On Apr 5, 8:02am, Parkson Wong wrote:
> 2) MPI_SPAWN_MULTIPLE as defined cannot be a natural extension of MPI_Spawn.
> Given min_procs and maxprocs, the spawned "process" has no idea of
> how many nodes the program is running on, and what rank it is inside
> the comm group.
This is an issue I've thought about, and I've come to the
conclusion that MPI-2 is no worse off than MPI-1 in this area.
For MPI_SPAWN_MULTIPLE, the parents and children establish
communication, so the parent can inform the children who they
are. The parent knows how many children of each type there
are through the array_of_errorcodes.
For instance, if the parent tries to spawn 5 of appA and 8
of appB, it will get an array_of_errorcodes of length 13.
If only 3 of appA and 4 of appB are successfully spawned,
then 3 of the elements [0...4] of the array will be MPI_SUCCESS,
and 4 of the elements [5...12] will be MPI_SUCCESS (this is
spelled out in the current draft).

The case of MPI_SPAWN_MULTIPLE_INDEPENDENT will be worse,
since the parent can't directly tell the children.
However, the situation is no worse than MPI-1. For instance,
if you start up an MPI-1 program with 3 of appA and 4 of appB,
the application knows only that there are 7 processes. It
doesn't automatically know how many of each type there
are. At NAS, we have this MPIRUN library in which the
application calls MPIRUN_INIT to find out how many
processes of each type there are. MPIRUN_INIT must coordinate
with the process launching mechanism (in our case, the NAS
version of "mpirun"). This is all handled externally,
where mpirun creates a file and MPIRUN_INIT reads it.
There is no reason why this same approach can't be
taken in MPI-2. A library MPIRUN_SPAWN could be a
wrapper around MPI_SPAWN_INDEPENDENT that gets the information to
the children. In fact, the easiest way to implement
it would be to call MPI_SPAWN under the covers, send
information to the children through the intercommunicator,
and free the intercommunicator (though you'd really like
MPI_PARENT to subsequently return MPI_COMM_NULL...).

> 4) If we go back to the old definition, without the min and max, there
> is still no way for the program spawned to know what is the size of
> the program. It only knows the size of all the programs together since
> there is only one MPI_COMM_WORLD. Some outside help will be needed.
> One way is to pass this information back in argv when the program calls
> MPI_Init. Whatever mechanism is chosen needs to be defined.
Agreed.
I think the problem here really exists for MPI-1 applications
as well. There is no standard way for the application
to find out about logical groups of processes (where groups
may correspond to different binaries, different disciplines,
different grids, etc) that are determined outside of the
application. The method described above will work, but
it can get a bit clumsy.

Bill