According to the MPI-2 schedule at the beginning of
the meeting, this was an attempt at a "first reading,"
en route to a fairly solid chapter and a first formal vote
at the next meeting. We'll attempt to incorporate the
discussion of the meeting and subsequent comments
into a draft by sometime next week.
- "where" argument in MPI_Spawn
After a suggestion that this should be a (void *) pointer
to an arbitrary object, the consensus was that this should
remain a string. A typical form might be "file=filename"
if complex information needs to be specified.
It was also noted that this argument may
specify "how" as well as "where," so perhaps
its name should be changed. "runtimeinfo" was
suggested.
There is still some confusion about how
"where" might work. We agreed to put some
examples in the next draft.
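As a rough illustration of the "file=filename" form mentioned above, an
implementation could pick a value out of the string with a helper like the
one below. This is only a sketch: the helper name info_value() and the file
name "hosts.txt" are made up for the example, and nothing here was agreed
at the meeting.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not part of any draft): if `info` has the form
 * "key=value" for the given key, return a pointer to the value part
 * inside `info`; otherwise return NULL.
 */
const char *info_value(const char *info, const char *key)
{
    size_t n = strlen(key);

    /* The key must match exactly and be followed by '='. */
    if (strncmp(info, key, n) == 0 && info[n] == '=')
        return info + n + 1;
    return NULL;
}
```

For example, info_value("file=hosts.txt", "file") points at "hosts.txt",
while a string that does not start with "file=" gives NULL.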
- "command-line" argument parsing.
We need to decide how this should be parsed. The original
suggestion of sh or csh parsing is probably too complex. It
was suggested in the meeting and agreed that the right
thing to do was whatever the corresponding C library
"exec" function does. Unfortunately there does not
appear to be (as far as I can tell) a variant of
exec() that takes a "command-line" and splits up the
arguments. In a previous meeting, we specifically
rejected having an argv[]-based interface.
So the question is now open again. Comments?
Parsing can be quite straightforward. The only
issue I'm aware of is how to quote whitespace (which
separates arguments) if you want it inside an argument.
It was suggested that the command line could
be "mpirun -np 4 app" to spawn an MPI process.
What happens in this case (N copies of "mpirun" are
spawned, each of which may start a 4-process app, which
the spawning program would not be aware of)
should be explicitly explained.
Of course "foo > bar" will spawn "foo" with
arguments ">" and "bar".
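To show how small the parsing problem is, here is a sketch of a splitter
that separates arguments on blanks and tabs. The quoting rule is an
assumption of mine, since the meeting did not pick one: a double-quoted
span keeps its whitespace and the quotes are dropped. The function name
split_command_line() is hypothetical, and no shell syntax is interpreted.

```c
#include <stddef.h>

/*
 * Split `cmd` in place into whitespace-separated arguments.
 * Assumed quoting rule (not decided by the Forum): text between double
 * quotes keeps its whitespace, and the quote characters are removed.
 * Returns the number of arguments stored in argv, or -1 if there are
 * more than max_args arguments or a quote is left unterminated.
 */
int split_command_line(char *cmd, char **argv, int max_args)
{
    int argc = 0;
    char *src = cmd;

    for (;;) {
        /* Skip the whitespace between arguments. */
        while (*src == ' ' || *src == '\t')
            src++;
        if (*src == '\0')
            break;
        if (argc == max_args)
            return -1;

        char *dst = src;            /* argument is compacted in place */
        int in_quotes = 0;
        argv[argc++] = dst;

        while (*src != '\0' &&
               (in_quotes || (*src != ' ' && *src != '\t'))) {
            if (*src == '"')
                in_quotes = !in_quotes;   /* toggle and drop the quote */
            else
                *dst++ = *src;
            src++;
        }
        if (in_quotes)
            return -1;              /* unterminated quote */

        int at_end = (*src == '\0');
        *dst = '\0';                /* terminate this argument */
        if (at_end)
            break;
        src++;                      /* step past the separator */
    }
    return argc;
}
```

With this rule, "foo > bar" splits into the three arguments "foo", ">"
and "bar", as noted above, and the caller passes a writable copy of the
command line because the split is done in place.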
- error handling for process spawning.
The problem is that spawned processes may fail for different
reasons, and we have no way to return different error
codes.
There were two suggestions.
1. add an array_of_status argument to the spawn call
(and array_of_array_of_status to spawn_multiple).
2. add an opaque out arg, analogous to "where", which
would explain errors in an implementation-dependent way.
Neither of these was particularly well received.
We will come up with something for the next draft.
- for spawn_multiple, how do you find out how many
of each app was spawned, since all processes
are in the same group?
(for instance, if you specify 5xA + 5xB and get
back an intercommunicator with a remote group size
of 8, that could be 5+3, 4+4, or 3+5).
No suggestions. We will come up with something for
the next draft.
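To make the ambiguity concrete, the sketch below lists the splits that are
consistent with a given remote group size. It is illustration only, not a
proposed interface; the function name enumerate_splits() and its arguments
are hypothetical.

```c
/*
 * List every way `total` spawned processes could be divided between two
 * applications when at most req_a copies of A and req_b copies of B were
 * requested. Stores up to max_out pairs in a_out/b_out and returns the
 * number of possible splits found.
 */
int enumerate_splits(int req_a, int req_b, int total,
                     int *a_out, int *b_out, int max_out)
{
    int n = 0;

    for (int a = req_a; a >= 0; a--) {
        int b = total - a;
        if (b < 0 || b > req_b)
            continue;               /* B count out of range */
        if (n < max_out) {
            a_out[n] = a;
            b_out[n] = b;
        }
        n++;
    }
    return n;
}
```

Called with req_a = 5, req_b = 5 and total = 8, it finds the three splits
from the example above, so the group size alone cannot tell the spawning
program which one it got.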
- proposal to rename mpi_comm_parent() since the name looks
similar to MPI_COMM_WORLD and MPI_COMM_SELF which are
"constants".
MPI_Parent()?
MPI_Get_parent()?
There was also a bit of discussion about whether it should be a function
or a constant. Sentiment was for function.
- Discussion about mpi_universe_size
The original question was whether it should be a constant or
a function with a possibly changing value.
This started a discussion of whether we should have it
at all, after which there was a straw vote:
should mpi_universe_size exist in some form?
yes 15
no 6
abstain 3
But we didn't resolve what it should be.
- there was a suggestion to make the simple call (spawn)
blocking, and the rest nonblocking, in order to reduce
the number of functions. This was tabled.
- It was proposed to replace MPI_Signal with MPI_Kill.
There was some discussion on whether non-kill signals
were portable or meaningful.
Straw vote on whether signal should be replaced:
yes 10
no 7
abstain 8
A clearly undecided vote. We will come up with two alternate proposals.
- Suggestion to remove the sentence "There is no guarantee on delivery
of signals," with general agreement that it makes signals meaningless.
Replace it with some advice to users that signals are not queued and
that killing a single MPI process may bring down a whole application.
Also that some MPI implementations may "steal" signals, so that using
them will have unpredictable results.
- discussion about MPI_Notify.
There was general agreement that MPI_Notify should always
work - for any processes, not just children, and notification
should be guaranteed.
Suggestion that "event" be a set of flags.
There was also discussion of whether the requests
are persistent.
Can MPI_Notify work with attached MPI processes? What
does that mean for fault tolerance?
There needs to be a lot more clarification. Will
be put in the next draft.
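For the suggestion that "event" be a set of flags, one possible shape is a
bit mask that the caller ORs together. The flag names and the helper
event_matches() below are invented for this sketch; the meeting did not
discuss which events would exist.

```c
/*
 * Hypothetical event bits. None of these names come from the draft;
 * they only show how several events could share one argument.
 */
enum {
    EX_EVENT_EXIT   = 1 << 0,   /* process exited normally */
    EX_EVENT_ABORT  = 1 << 1,   /* process aborted */
    EX_EVENT_SIGNAL = 1 << 2    /* process received a signal */
};

/* Return nonzero if any event the caller asked about has occurred. */
int event_matches(unsigned requested, unsigned occurred)
{
    return (requested & occurred) != 0;
}
```

A caller interested in either kind of termination would pass
EX_EVENT_EXIT | EX_EVENT_ABORT as the requested mask.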
- attaching to independent processes
The feeling here is that there should be some discussion
of existing mechanisms and how they might fit in.
We'll put 2 or 3 specific scenarios into the next draft.
- It was noted that some of the examples use old syntax.
- Issues from other discussions that need to be resolved
for the dynamic processes chapter:
1. we need to work out the nonblocking collective semantics - in
particular, whether a cancel can work. My personal feeling
is that a cancel is pretty ridiculous for a spawn, or a
connect, but actually makes a fair amount of sense for
MPI_Accept(). A server would always leave an Accept() lying
around for new connections. I would therefore propose
that you can cancel a nonblocking accept() and that
the cancel is a collective operation in the processes
that called accept() (noting, though, that accept() itself is
normally collective among accepting and connecting
processes). Technically, cancelling a spawn() or
a connect() would be much harder, and I see no
motivation for allowing it. [This is similar
to the point-to-point case, where it makes
sense to cancel a receive(), but cancelling
a send() is much harder (and not done correctly
in current implementations) and unnecessary.]
2. We need to come up with accessor functions for
mpi_status for the new request types. Will do
this for the next draft. Depends on how we handle
spawn errors and MPI_Notify().
======
I may have left out some issues or gotten the details
wrong. Please send corrections/comments to the list.
Bill