1) Associate a 'tag' argument with each spawn call, which would be an IN arg.
Multiple spawns that were to be associated as part of the same MPI_COMM_WORLD
would be given identical tags by the user and MPI would then know enough to do
the Right Thing.
2) Defer the actual spawning of the processes until the attach/detach call,
much as Heidi suggested. This would be very strange, though, because we would
be getting back groups from MPI which did not correspond to any actual
processes. Soft failures now become very ugly, so then maybe we would want to
start talking about replacing the groups with some sort of opaque placeholder,
and things get very ugly very quickly. If we have to start adding a dozen new
support functions just to get back to where we were then the whole reason for
creating the new proposal in the first place just vanishes.
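
To make option 1 a bit more concrete, here is a rough sketch in Python of the bookkeeping an implementation might do with the tags. It is only an illustration and calls no real MPI functions: the names group_spawns_by_tag and world_sizes, the (tag, command, nprocs) request layout, and the sample commands are all assumptions made up for this example.

```python
from collections import defaultdict

def group_spawns_by_tag(spawn_requests):
    """Group hypothetical spawn requests by their user-supplied tag.

    Each request is a (tag, command, nprocs) tuple. Requests sharing a tag
    are meant to end up in the same MPI_COMM_WORLD.
    """
    worlds = defaultdict(list)
    for tag, command, nprocs in spawn_requests:
        worlds[tag].append((command, nprocs))
    return dict(worlds)

def world_sizes(worlds):
    """Total number of processes in each tagged world."""
    return {tag: sum(n for _, n in members) for tag, members in worlds.items()}

# Three spawn calls: the first two share tag 7, the third stands alone.
requests = [(7, "ocean", 4), (7, "atmosphere", 2), (9, "viz", 1)]
worlds = group_spawns_by_tag(requests)
sizes = world_sizes(worlds)
```

With the tags known at spawn time, the total size of each intended world is also known up front, which is exactly the information that the untagged spawn calls lack.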
I shall now attempt to argue that perhaps the limitations in the
MULTIPLE_INDEPENDENT case are acceptable:
First, it is not clear to me that the case of the 'threads with private memory'
is a real problem. One restriction of such a model is that you usually
(always?) have to be running the same executable image. In the case where the
children are supposed to communicate with the parent, how is MPI to determine
that the children may correctly be added to the group of threads? I'm inclined
to conclude that in general it can't. In the case of a single group of
independent processes, there is no issue because a completely new group can be
created. And in the case of multiple independent groups, one would imagine that
there was a good reason for making multiple spawn calls in the first place
which would also disallow this type of process initialization.
I think that it would be completely acceptable (and within the spirit of MPI)
to say that for best performance, processes should be spawned together.
Sometimes this will not be possible, and of course we should do what we can to
make these cases work well, but in this *particular* case I am unconvinced.
Ditto for machines such as the T3D/E. Unless I am wrong (and I very well could
be), these "partition" machines are often limited to running only a single
executable per partition. So, in the case of a SPAWN_MULTIPLE, you are always
going to be creating separate partitions anyway, regardless of whether they
will eventually be communicating with each other.
Now, in the (much more interesting and troubling) case of the IBM SP, it looks
like we really do have a case in which providing the complete information at
spawn time would make a significant difference. Dick gives a good example of a
failure which in the new proposal would not be detectable until the detach
call. But (playing Devil's Advocate for a minute) it could perhaps be argued
that a failure in the detach call is *exactly* what should happen! After all,
the spawns do succeed. The problem arises when attempting to build the united
MPI_COMM_WORLD for the children, which should perhaps be viewed as a
fundamentally different sort of error.
I can think of several ways of recovering from this error. For example, you
might just try to merge a subset of the created processes together and keep
shrinking the set until you succeed. Another possibility might be to just go
with the "user hostile" approach of specifying a maximum partition size at job
init time, and requiring the user to stay under the cutoff point when spawning.
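
The "keep shrinking" recovery could look something like the Python sketch below. This is a simulation under stated assumptions, not anything from the proposal: merge_with_shrinking, make_partition_limited_merge, and the rule that a merge fails whenever the subset exceeds a fixed partition size are all hypothetical stand-ins for whatever the real detach call would do.

```python
def merge_with_shrinking(processes, try_merge):
    """Retry a merge on ever-smaller subsets until one succeeds.

    try_merge is a hypothetical callback that returns True when the given
    subset can be united into one MPI_COMM_WORLD. Returns the subset that
    was merged, or an empty list if even a single process fails.
    """
    candidate = list(processes)
    while candidate:
        if try_merge(candidate):
            return candidate
        candidate.pop()  # drop the most recently spawned process and retry
    return []

def make_partition_limited_merge(max_size):
    """Assumed failure model: a merge succeeds only within one partition."""
    def try_merge(subset):
        return len(subset) <= max_size
    return try_merge

spawned = [f"child{i}" for i in range(6)]
merged = merge_with_shrinking(spawned, make_partition_limited_merge(4))
```

The "user hostile" alternative amounts to running the same size check before spawning instead of after, so the merge never has to be retried.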
I'm not saying that I agree with every one of the above arguments; I just
wanted to suggest possible ways of thinking about the issues...
--
Eric Salo       Silicon Graphics Inc.            "Do you know what the
(415)933-2998   2011 N. Shoreline Blvd, 7L-802    last Xon said, just
salo@sgi.com    Mountain View, CA 94043-1389      before he died?"