This is true. But MPI2 will specify the behavior when a process is
started under MPI_Spawn, so implementations will be required to
distinguish between spawned and non-spawned processes. Distinguishing
between run-as-part-of-group and run-as-singleton is quite similar, I
think (it requires very little implementation effort if you're already
handling spawn).
> On Paragon, the only things required to create an MPI program, as
> opposed to an "NX" program, are to link with the MPI library and call
> MPI_Init().
>
> Yes, on the Paragon MPI_Init() is currently a collective operation
> which assumes we want to initiate some optimized global operations
> based on the number and position of participating processes.
The question here is, collective over what? MPI_Init() cannot be
collective *in the MPI sense*, since there is no MPI group existing at
the time MPI_Init() is called. In the Paragon (from what I understand
Joel to say) MPI_Init() is collective in a resource-manager sense:
i.e. it's collective over all processes which were started "at the
same time" (by whatever process-startup method the Paragon uses).
> From this I infer that the only modification to the standard being
> suggested by Gary's and others' discussion of creating singleton MPI
> processes without mpirun is to specify that MPI_Init must be a local
> operation only, not a collective operation.
I certainly don't intend that MPI_Init() should *always* create a
singleton MPI application (i.e. always be local)! But I do expect
implementations, at least those that support Spawn, to be able to tell
(by command-line arguments, environment variables, or whatever) how
the process was started. If it is started without the framework it
needs to know whether it has any siblings or parents, it should just
start up by itself in its own group.
I hope that phrasing satisfies even Joel, since on the Paragon a
process which links with MPI and calls MPI_Init() is *always* started
with info about its siblings and how to hook up with them, so this
issue wouldn't come up in that implementation.
Joel, will the Paragon support Spawn() in the future? If so, then you
might have to face this issue in that implementation anyway.
And a quick response to Steve Huss-Lederman's questions:
> Date: Tue, 30 Apr 1996 10:52:34 -0500
> From: Steve Huss-Lederman <lederman@cs.wisc.edu>
>
> I have no problem with the requested functionality to start MPI on
> limited processes. However, I am not sure how this would work.
> Aside from the comments by others, I don't understand how MPI will
> know which processes to run on without changing MPI_Init. Here is
> what is bothering me:
>
> Right now MPI_Init involves all the processes. I presume MPI_Init
> (or some previous magic) decides on ranks, etc. for the processes
> participating. If arbitrary numbers of processes can call MPI_Init,
> how does it know the number involved? For example, if you started
> your job on 10 nodes, how can the MPI implementation tell whether it
> should wait until all 10 nodes call MPI_Init, or initialize MPI as
> soon as one node calls MPI_Init?
How does it know now (in MPI1) how many processes are involved? Or to
ask it another way, what do you mean above by "all the processes"?
Whatever your answer to that, all I'm asking is that if the process is
started *without* that info, it should start up by itself. On some
machines it'll always have that info (it's built into the partition
size, or whatever). On clusters and some MPPs, that info is commonly
passed from mpirun or MPI_Spawn() via command-line arguments and/or
environment variables, and will be missing if the process is just
started up "naked" from the command line, or fork/exec'ed by another
process.
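As a rough illustration of that decision (a sketch only, not any real
implementation's code), the startup logic reduces to inspecting
whatever the launcher passed. The environment-variable names below
(HYPO_MPI_RANK, HYPO_MPI_SIZE, HYPO_MPI_JOB_ID, HYPO_MPI_PARENT) and
the function classify_startup are hypothetical; real implementations
use their own mechanisms, and command-line arguments would serve just
as well.

```python
from dataclasses import dataclass
from typing import Mapping, Optional

# HYPOTHETICAL variable names a launcher (mpirun) or spawner might export.
# Real MPI implementations use their own, implementation-specific names.
RANK_VAR = "HYPO_MPI_RANK"
SIZE_VAR = "HYPO_MPI_SIZE"
JOB_VAR = "HYPO_MPI_JOB_ID"
PARENT_VAR = "HYPO_MPI_PARENT"  # set only for processes created by a spawn call


@dataclass(frozen=True)
class StartupInfo:
    mode: str               # "group", "spawned", or "singleton"
    rank: int
    size: int
    job_id: Optional[str]
    parent: Optional[str]


def classify_startup(env: Mapping[str, str]) -> StartupInfo:
    """Decide how this process should initialize, from launcher-provided info."""
    rank = env.get(RANK_VAR)
    size = env.get(SIZE_VAR)
    job = env.get(JOB_VAR)
    if rank is None or size is None or job is None:
        # No (complete) launcher info: the process was started "naked" from a
        # shell or via fork/exec, so it forms its own world of one process.
        return StartupInfo("singleton", 0, 1, None, None)
    r, n = int(rank), int(size)
    if not 0 <= r < n:
        raise ValueError(f"inconsistent launcher info: rank={r}, size={n}")
    parent = env.get(PARENT_VAR)
    mode = "spawned" if parent is not None else "group"
    return StartupInfo(mode, r, n, job, parent)


# A process started by hand from the shell: no launcher variables at all.
print(classify_startup({}))
# One of 10 processes started together by a launcher.
print(classify_startup({RANK_VAR: "3", SIZE_VAR: "10", JOB_VAR: "job42"}))
```

Note that timing plays no role in this sketch: processes started by the
same launcher carry the same job information and end up in one world
no matter when each calls MPI_Init(), while processes started
independently each come up as a singleton of size 1.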
> If one calls MPI_Init and then another calls MPI_Init an hour later
> does it join the first node (i.e., MPI_COMM_WORLD becomes size 2)
> or does it start its own MPI world? I don't see how to tell these
> apart without adding info to MPI_Init.
I disagree. I don't want to change the current behavior, which is
well-defined in this case. If two processes are started via mpirun
and one waits a long time before calling MPI_Init() they should be in
the same group, as they are now. If two processes are started
*disjointly* (not via mpirun or any other common scheduler/resource
manager, not on a machine which only permits a single MPI job per
partition, etc.) then they should become two separate MPI groups, with
no connection between them (unless/until they use MPI_accept and
MPI_connect). Does that make sense?
--
Gary Oberbrunner               garyo@avs.com
Advanced Visual Systems, Inc.  http://www.avs.com/~garyo
300 Fifth Avenue               (617)890-8192 x2133 TEL
Waltham, MA 02154              (617)890-8287 FAX