Re: dynamic counter-proposal

William Gropp (gropp@mcs.anl.gov)
Fri, 10 May 1996 14:37:37 -0500

I am concerned about the implementation complexity of the spawn/attach
approach in the case of an MPP with an interconnect that must be reconfigured
in order to provide high-performance connections. I am ignoring the
user-hostile approach of only permitting "spawn" into a preallocated
partition. This interconnect might be a physical device or the way an OS
keeps track of separate executable objects (I'm avoiding threads/processes
deliberately).

In this case, the interconnect will need to be configured twice: once when the
new jobs are spawned, and once when the final collection of members is known
(unless the user is lucky and the implementation's expected case matched
theirs). Now, this isn't a scalability problem since it can at worst
(roughly) double the time to spawn. But it does add complexity to the MPP
case, which is the case that MPI should be targeting (there ARE better ways
to do this stuff for workstation clusters if that's all you're interested in;
just to start with, there are the fault-tolerance issues). It may add enough
complexity that only the user-hostile approach is implemented. This would be
a major loss for the user community that MPI should be serving.

I'll make three other points:

1) When considering implementation issues, the concern must not be just "can we
do it with current products" but "do we think that reasonable future
implementations of MPI will be able to do this?" This is of course much harder,
but it is important in developing a standard. I am concerned that
some high-performance implementations may find this approach awkward.

2) Other decisions are also likely to depend on whether the spawn is creating
processes that are an extension of an existing set or a new set.
For example, the user may want to gang or co-schedule the new processes with
the existing set.

3) An interesting model that is popping up on some shared memory machines is
"threads with private memory". To the MPI user, this looks like an MPI
process (since it has its own memory space); to the implementation, it offers
major advantages in performance (both with shared memory and intelligent
scheduling). If you are doing a spawn of the same executable for the purposes
of expanding your communicator, you can create a new such "thread"; if you are
spawning to create a new independent process, you want to fork/exec. It is
too late to do this when the attach/detach occurs. Technically, it is
possible to change the operating system to allow threads to be stripped off
into another process, but I can't see why it should be done just to support
MPI. Also note that in both cases the user is likely to use the NULL info
argument, making it impossible to know what to do until it is too late.
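
To make point 3 concrete, here is a minimal sketch in C of the decision an
implementation would have to make at spawn time. Every name in it
(spawn_intent, spawn_info, choose_mechanism) is hypothetical and is not part
of any MPI proposal; it only assumes that the intent could be passed in
through something like the info argument.

```c
#include <stddef.h>

/* Hypothetical: what the caller plans to do with the spawned processes. */
typedef enum {
    SPAWN_INTENT_EXTEND,      /* will join the existing set of processes */
    SPAWN_INTENT_INDEPENDENT  /* will run as a new, independent set */
} spawn_intent;

/* Hypothetical stand-in for the contents of an info argument. */
typedef struct {
    spawn_intent intent;
} spawn_info;

/* Hypothetical: the mechanism the implementation uses to create a process. */
typedef enum {
    MECH_PRIVATE_THREAD,  /* a "thread with private memory" */
    MECH_FORK_EXEC        /* a full fork/exec */
} spawn_mechanism;

/*
 * Choose the creation mechanism at spawn time. A private-memory thread is
 * only safe when the implementation knows the new processes run the same
 * executable and will extend the existing set. With a NULL info argument
 * the intent is unknown, so the conservative fork/exec path is the only
 * choice, and the thread optimization is lost.
 */
spawn_mechanism choose_mechanism(const spawn_info *info, int same_executable)
{
    if (info != NULL && same_executable &&
        info->intent == SPAWN_INTENT_EXTEND) {
        return MECH_PRIVATE_THREAD;
    }
    return MECH_FORK_EXEC;
}
```

With a NULL info argument, choose_mechanism(NULL, 1) returns MECH_FORK_EXEC
even when the user later attaches the new processes to the existing set, which
is exactly the lost opportunity described above.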

Bill