new dynamic proposal

Rusty Lusk (lusk@mcs.anl.gov)
Thu, 09 May 1996 04:06:14 -0500

First, a procedural issue. Our normal way of doing things is that at each
meeting we have discussions and straw votes that guide the chapter custodian
on changes to be made for the next meeting, as a way of converging on final
votes. At the last meeting a few small changes were recommended by straw
votes, which otherwise approved the functions we have been working on for a
year. I believe that we must have the opportunity to complete this process,
which appeared very close to completion. Those straw votes should be
considered as binding guidance; one can't just ignore them.

That doesn't mean, of course, that radical new counter-proposals cannot be
considered. There is nothing wrong with bringing such a proposal to the
next meeting, but it has to be considered in the context of the existing
draft as it has evolved, and been voted on, so far.

All that said, I think the new proposal is a step backward. The key issue
is its separation of spawning from the creation of the communicator.

1. Combining spawning with communicator formation in a collective operation
was done at an early stage in order to obtain scalability and
implementability, especially on MPPs, where switch allocation and similar
resources may have to be set up at spawn time.

2. Scalability concerns say that one should not unnecessarily revisit spawned
processes a second time, just to set up communication, when that could have
been done at spawn time. Think thousands of processes.

3. It seems especially strange to require independent processes to wait
just to be detached, instead of proceeding: being scheduled and running,
doing useful work, submitting themselves to the job scheduler, or
whatever. This is a bad, non-scalable, and entirely unnecessary
synchronization.

4. We have tried in the past not to constrain implementations. This proposal
constrains implementations severely and prevents them from being scalable
and robust.

5. Starting one process and having it start others is not the "MPI way" to
start a set of processes. Surely what is expected is that one will use spawn
to expand an already existing communicator. Then it is critical that this
operation be collective. In the new proposal it is not.

More later...

Rusty