> - In 3.3.2, MPI_SPAWN, description of "command-line" argument
> I propose to append (page 8, after line 41):
>
> If the "command-line" argument is omitted (NULL in C or an empty
> string in Fortran) further copies of the calling program are
> started.
>
> Reason:
> An application that uses MPI_UNIVERSE_SIZE can be written
> very portably. It need not analyse the calling sequence
> to find its own name.
So this sounds at first like a reasonable thing, but it looks
as if the intent is to facilitate a bad thing, which is
a host-node bootstrap approach to SPMD programs, i.e.,
you want 8 copies of "foo" in your application, so you
spawn a copy of foo which spawns the other 7. It is
really much more appropriate in MPI to spawn all 8 at once,
and the 1+7 approach seems like a PVM (PARMACS?) relic.
This might be reasonable for
true host-node programs, because the master process may
not know what to spawn until it has started, and because
degraded master-slave communication is not likely to be
too much of a problem, but I don't see a justification
for the SPMD case. Is this what you're targeting?
> - In 3.3.2, MPI_SPAWN, paragraph about MPI_SPAWN_SOFT,
> I propose to append (page 9, after line 16):
> Advice to users. The number of spawned processes can be
> inquired with MPI_COMM_REMOTE_SIZE(intercomm, size).
> (End of advice to users.)
> Reason:
> It clarifies how to test "an empty intercommunicator is returned"
> (page 9, line 16).
This looks like a good clarification. We still have a problem
for MPI_Spawn_multiple() - suggestions welcomed.
> - In 3.3.4, proposal 2, MPI_UNIVERSE_SIZE(size)
> I propose to append (page 13, after line 3):
>
> Advice to users. Because it is not guaranteed that the returned
> number of processes can be started in a subsequent MPI_SPAWN,
> it is recommended to use MPI_SPAWN_SOFT in the "flag" argument
> there. (End of advice to users.)
If MPI_Universe_size is to be at all useful, I think it has
to be very reliable in the usual case, which is probably something
like:
mpirun -ntotal 10 -np 1 master
So I would hesitate to add this.
However, this brings up an interesting point, which is
that MPI_SPAWN_SOFT may return fewer processes than
requested either because of resource limitations,
which are in some sense expected, or because of "hard"
errors, such as a missing .rhosts file or executable.
It might be good to be able to distinguish these two cases
in whatever we come up with for MPI_Spawn error reporting.
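To make the distinction concrete, here is a minimal sketch in C of what
a caller might do if a soft spawn reported one status per requested
process. Everything in it is an assumption for illustration only: the
enum values, the struct, and summarize_spawn() are hypothetical names
and are not part of the current proposal.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-process outcome codes; not taken from any MPI draft. */
enum spawn_status {
    SPAWN_OK = 0,        /* the process was started */
    SPAWN_NO_RESOURCES,  /* "soft" shortfall: no free slot or node */
    SPAWN_HARD_ERROR     /* "hard" failure, e.g. missing executable or .rhosts */
};

/* Tally of outcomes that a caller could inspect after a soft spawn. */
struct spawn_summary {
    int started;
    int soft_failures;
    int hard_failures;
};

/* Count the per-process codes so that an expected resource shortfall
 * can be told apart from a configuration error. Unknown codes are
 * treated as hard failures. */
struct spawn_summary summarize_spawn(const enum spawn_status *codes, size_t n)
{
    struct spawn_summary s = {0, 0, 0};
    assert(codes != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        switch (codes[i]) {
        case SPAWN_OK:           s.started++;       break;
        case SPAWN_NO_RESOURCES: s.soft_failures++; break;
        default:                 s.hard_failures++; break;
        }
    }
    return s;
}
```

With such a summary, an application could carry on with fewer workers
when only soft_failures is nonzero, and abort with a diagnostic when
hard_failures is nonzero.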
> - In 3.3.4, further proposal for MPI_UNIVERSE_SIZE(size):
>
> MPI provides the following function:
>
> MPI_SPAWNABLE (where, n)
>
> IN where A string telling the runtime system where and/or
> how to start the processes as described in
> MPI_SPAWN
> OUT n Number of processes that can be usefully spawned.
>
> This function returns the number of processes that can be
> usefully started with a subsequent MPI_SPAWN or MPI_SPAWN_... .
> In MPI implementations that are tightly integrated
> . ... (same text as in proposal 2, page 9, lines 1-3)
It seems to me that this involves too much interaction with
the runtime system. In the current proposal, this would
be accomplished by direct interaction with the runtime system,
for instance, with pvm_config() (with a PVM runtime system).
There's a portability argument for putting this into
MPI. Any others?
A brief history of the MPI_UNIVERSE_SIZE thing is the following.
We originally had lots of interaction with the runtime system.
Reacting to the horror of this Pandora's box, we eliminated
all interaction with the runtime system, saying that an application
should query the runtime system directly. However, what about
the case above:
mpirun -ntotal 10 -np 1 master
Followed perhaps by
mpirun -ntotal 8 -np 1 master
mpirun -ntotal 12 -np 1 master
The assumption is that this is probably the most common
type of "dynamic" application.
The problem is how to communicate the "10", "8" and "12" to
the program. MPI_UNIVERSE_SIZE was the solution. The reasoning
was that there isn't really any interaction with the
runtime system - it is just a message from the user to him/herself.
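As a rough sketch of how a master might use that number, assuming it
has already obtained the universe size (the "-ntotal" value) and the
count of processes already running (the "-np" value) as plain integers:
the function name workers_to_spawn() is hypothetical, and how the two
values are obtained is deliberately left out.

```c
#include <assert.h>

/* universe_size: the total the user asked for (the "-ntotal" value).
 * running:       processes already started (the "-np" value).
 * Returns how many further processes the master should spawn,
 * never a negative number. */
int workers_to_spawn(int universe_size, int running)
{
    assert(running >= 1);  /* at least the master itself is running */
    int n = universe_size - running;
    return n > 0 ? n : 0;
}
```

For the three mpirun lines above, this gives 9, 7 and 11 workers
respectively, with no query to the runtime system at that point.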
It was recognized that this was a slippery slope, and we
have seen the slide happening: first with the proposal
for MPI_Universe_size(), which seductively promised
to make things much easier for a large class of scavenger
applications but, despite the apparently simple change,
added a key element - interaction with the runtime
system after MPI_Init(). The proposals above
are just a bit further down the slope. So I'm
advocating that we hold the line at a static universe size,
and I worry, based on the straw poll at the last meeting,
that the whole thing will be voted out if it goes any further.
> - In 3.4.1 Registration and Connection (page 15-17)
> The proposal should separate clearly the functionality of
> [text deleted]
I agree this needs to be clarified in the way you suggest.
> My criticism:
> The word REGISTER in the function name "MPI_REGISTER_NAME" and
> its argument "name" are normally used in name service APIs.
>
> Comparison with DCE RPC string binding:
> The necessary functionality "inquire the port address" is
> realized with two calls:
> [code deleted]
> MPI-2's "given-name" is the same as the vector of
> string_bindings of the DCE RPC with the restriction of
> vector_length == 1.
>
> DCE uses a vector of bindings because a computer can have
> more than one network interface and each binding represents
> a port of the application on each of these network interfaces.
>
> In high performance computing one case of using MPI_CONNECT
> can be that the user wants to use a high speed network
> instead of the "default Ethernet".
> [more text deleted]
This is an interesting point.
I think there is a way around it, since the port used by
accept() is not the same port as will eventually be used
for communication. It is only used to bootstrap the
communication. Thus it's ok if accept() uses a slow
interface. Or are there cases where there is no single
(slow) interface that can be reached from anywhere, so
that multiple "ports" are required?
What do you think?
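To illustrate the question, here is a small C sketch of the two
separate choices: one interface for the bootstrap accept(), another for
the real traffic. The struct and both function names are hypothetical,
and the sketch simplifies by assuming the fastest interface is usable
by the client once the connection has been set up.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical description of one network interface of the server. */
struct iface {
    const char *name;
    int reachable_from_anywhere;  /* nonzero if any client can connect to it */
    double mbit_per_s;            /* nominal bandwidth */
};

/* Index of an interface usable for the initial accept(), or -1 if no
 * single interface is reachable from everywhere (the case in which
 * multiple "ports" would be required). */
int pick_bootstrap(const struct iface *ifs, size_t n)
{
    assert(ifs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (ifs[i].reachable_from_anywhere)
            return (int)i;
    return -1;
}

/* Index of the fastest interface, for use after the bootstrap step;
 * -1 if the list is empty. */
int pick_data(const struct iface *ifs, size_t n)
{
    int best = -1;
    assert(ifs != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (best < 0 || ifs[i].mbit_per_s > ifs[best].mbit_per_s)
            best = (int)i;
    return best;
}
```

If pick_bootstrap() can return -1 on real systems, a single port name
is not enough and something like the DCE vector would be needed.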
> I did not find in the index of MPI-1 and MPI-2 the routines
> MPI_Group_init and MPI_Group_start_independent.
These are indeed from a previous draft. The examples will
be updated. group_init/group_merge/group_start_independent
is equivalent to spawn_multiple_independent.
Bill