Re: dynamic counter-proposal

Dick Treumann (treumann@kgn.ibm.com)
Tue, 14 May 1996 16:58:53 -0400

The IBM SP implementation of dynamic process spawn would suffer under
the new proposal. I am not happy with the large number of functions in
the current proposal, but our implementation needs the information these
functions provide if it is to make the decisions required by each flavor
of spawn.

Our current version of MPI uses what we call a "partition table." This
table is created at the time a parallel job is started and remains
stable for the life of the job. Communication over the switch depends
upon information in this table. Every node in the job must have the
same table.

For MPI-2 we will be changing this to a dynamic table which can be
expanded as tasks are added to a communication universe. It will still
be essential for each task in a communication universe to have its copy
of the table synchronized with copies at other tasks. There will be an
upper bound on the size of a table.

If there is a call to the new MPI_Spawn with the MPI_MPI flag, there are
three possibilities.

1) The new process group will be part of the current universe.

2) The new group will be independent, both from the current universe and
from the universe created by any other MPI_Spawn.

3) The new group will be independent of the current universe but will join
other groups created in separate MPI_Spawn calls to yield a single
universe.

In case 1, I would expand the current universe and, if the number of new
tasks requested exceeds the remaining capacity, I would spawn what I
could and return that number. Distinguishing case 1 from 2/3 would
require the INDEP/NOINDEP flag.
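
The case 1 behavior can be sketched roughly as follows. This is only an
illustration, not the SP implementation: the struct, the function name
table_expand, and the capacity of 8 are all hypothetical, and the real
table carries switch addressing information rather than just a count.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on table size; the real limit is
 * implementation-defined and not stated here. */
#define TABLE_CAPACITY 8

/* Hypothetical stand-in for a partition table. Only the task count
 * matters for this sketch. */
struct partition_table {
    size_t ntasks;   /* tasks currently in the universe */
};

/* Case 1: expand the current universe by up to `requested` tasks.
 * Returns the number of tasks actually added, which may be smaller
 * than the number requested if the table is nearly full. */
size_t table_expand(struct partition_table *t, size_t requested)
{
    assert(t->ntasks <= TABLE_CAPACITY);
    size_t room = TABLE_CAPACITY - t->ntasks;
    size_t granted = requested < room ? requested : room;
    t->ntasks += granted;
    return granted;
}
```

The point of the sketch is that the shortfall is reported at spawn time,
while the caller can still react to receiving fewer tasks than it asked
for.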

Cases 2 and 3 are not distinguished in the new proposal. For case 2 I
would like to create a new partition table for each MPI_Spawn. For case
3 I would like to put all related calls into a single new table. I
cannot do that with info available in the new MPI_Spawn. So, instead I
create a new hypothetical table for each MPI_Spawn and plan to merge
them into one real table at the MPI_Child_detach. What do I do at the
MPI_Child_detach if the combined size of the groups exceeds the capacity
of a single universe? My chance to deliver a smaller number of tasks was
gone once I returned from MPI_Spawn. Each MPI_Spawn by itself asked for
a number of tasks which would fit a partition so it got them.
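
The case 3 failure can be sketched as a check made at detach time. Again
this is only an illustration under assumptions: merge_fits and the
capacity of 8 are hypothetical names and values, and each array entry
stands for the task count already granted by one MPI_Spawn.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical upper bound on the size of one real table. */
#define UNIVERSE_CAPACITY 8

/* Check whether the per-spawn hypothetical tables can be merged into
 * one real table. Stores the combined task count in *total and
 * returns 1 if it fits, 0 if it exceeds the capacity. */
int merge_fits(const size_t *spawned, size_t nspawns, size_t *total)
{
    size_t sum = 0;
    for (size_t i = 0; i < nspawns; i++) {
        /* Each spawn fit a partition on its own, so it was granted. */
        assert(spawned[i] <= UNIVERSE_CAPACITY);
        sum += spawned[i];
    }
    *total = sum;
    return sum <= UNIVERSE_CAPACITY;
}
```

With two spawns of 5 and 4 tasks, each call succeeds by itself, yet the
merge reports 9 tasks against a capacity of 8. By then the tasks already
exist, so there is no clean way to hand back a smaller number.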

This is a specific example of the kind of thing Bill G and Rusty L have
been describing in more general terms.

Dick Treumann

-- 
Dick Treumann                               POWER Parallel Systems
(Internet) treumann@kgn.ibm.com             IBM  -- Poughkeepsie, NY
(VNET)     TREUMANN at KGNVMC               Tel: (914) 433-7846
(internal) treumann@windsurf.kgn.ibm.com    Fax: (914) 433-8363