We have an ambitious goal of taking a binding vote
or two on the dynamic chapter for the next meeting.
In order to facilitate this, we decided at the last
meeting to divide the chapter into three parts
(Omnia MPI2 chapter 2 in tres partes divisa est) -
part 1:
    SPAWN
    SPAWN_MULTIPLE
    PARENT
    UNIVERSE_SIZE
    ISPAWN
    ISPAWN_MULTIPLE
part 2:
    SPAWN_INDEPENDENT
    SPAWN_MULTIPLE_INDEPENDENT
    SIGNAL
    SIGNAL_GROUP
    NOTIFY
    ISPAWN_*_INDEPENDENT
part 3:
    client server.
The reasoning was that there seems to be pretty good agreement on part
1, some disagreement on part 2, and part 3 is still settling down,
with a couple of fairly different directions that we might take. Over
the next few days, we expect (hope) to put out versions of all three
sections for discussion, based on the consensus (or sometimes lack
thereof) at the last meeting.
In the meantime, here is a list of some issues from last Friday. Some
of these we voted (straw) on, and this list is (hopefully) just a
record of that. Others, we did not vote on, or even have time to
discuss, so please contribute your opinion. This will shape what
actually gets put into the document for discussion. There is one item
(9) with some undiscussed material.
1. (void *) vs. (char *) for info arg in spawn
We voted for (char *) by a 14-5-7 vote. It is unclear
to me whether people mean by this that they want
the info arg restricted to a null-terminated character
string, or whether casting an arbitrary structure to (char *)
should be allowed.
2. Quote mechanism for the command line.
Looks like the vote was
20: we should have command-line + quote mechanism
1: we should have command-line + no quote mechanism
4: we should have some sort of argv mechanism instead
(I don't have the number of abstentions written down, and
this appears to have been a three-way beauty contest).
There was some discussion then about quote mechanisms.
The character-based quote mechanism (described in the
current document) seems basically fine except that people were
unhappy with the special hack that allows you
to get an empty string (stripping off a single
leading space). For the empty string, a string-based
(quotes on either side) mechanism seemed more
natural, but this complicated the situation.
As for what character to use to quote, there
was a vote of 12 for '%', 4 for '\' and 0 for '@',
with 9 abstentions. (Yes, I am aware of the
irony in the previous sentence).
Discussion?
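To make the character-based option concrete, here is a minimal sketch
of how such a splitter could behave. It is an illustration only, not
the mechanism in the current document: the function name
split_command_line and the rule "the quote character makes the next
character literal" are assumptions, and the leading-space hack for
empty strings is deliberately left out.

```python
def split_command_line(line, quote="%"):
    """Split a command line on unquoted spaces.

    Hypothetical rule for illustration: the quote character makes the
    character that follows it literal (including a space or the quote
    character itself). A trailing quote character is kept as-is.
    """
    args = []
    current = []
    in_arg = False
    i = 0
    while i < len(line):
        ch = line[i]
        if ch == quote and i + 1 < len(line):
            # Quoted character: take the next character literally.
            current.append(line[i + 1])
            in_arg = True
            i += 2
            continue
        if ch == " ":
            # An unquoted space ends the current argument, if there is one.
            if in_arg:
                args.append("".join(current))
                current = []
                in_arg = False
        else:
            current.append(ch)
            in_arg = True
        i += 1
    if in_arg:
        args.append("".join(current))
    return args


# Example: a quoted space and a quoted percent sign.
print(split_command_line("prog my% file 100%%"))
```

Under this rule there is no way to write an empty argument, which is
the gap that either the leading-space hack or a string-based mechanism
would have to fill.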
3. Fortran equivalent for the NULL that can currently
be specified in the C binding for array-of-errcodes
(MPI_NULL or MPI_BOTTOM). The vote was
6: do something 8: do nothing 9: abstain.
This issue was deferred to the miscellaneous
group.
4. "Flags" argument for MPI_SPAWN.
In the current draft this is (supposed to be) an integer
whose bits represent boolean flags. We only have
one flag - MPI_SPAWN_SOFT.
There was a lot of discussion about this, covering things such as:
a. should this argument exist at all or should it be put into info
b. should we specify instead a minimum number of processes
The first vote was 25-3 in favor of doing "something" instead
of deleting the argument and putting its functionality into
info.
The second vote was:
some kind of "soft" flag: 7
minimum procs arg instead: 11
neither: 1
abstain: 8
I'd like to have some discussion here. I'm currently planning
to include two different proposals. One with a "minimum procs"
argument and one with a SOFT flag. Strong arguments why
we should not do one or the other are encouraged.
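For comparing the two proposals, the following toy model may help. It
is a sketch under assumptions, not text from the draft: the function
spawn_outcome and its parameters are hypothetical, and "soft" is taken
to mean that any number of started processes up to the requested count
is acceptable.

```python
def spawn_outcome(requested, started, soft=False, min_procs=None):
    """Return True if a spawn counts as successful in this toy model.

    soft and min_procs stand for the two competing proposals; a caller
    would use one or the other, not both.
    """
    if min_procs is not None:
        # "Minimum procs" proposal: succeed if enough processes started.
        return started >= min_procs
    if soft:
        # "Soft" flag proposal (assumed meaning): any count up to the
        # requested number is acceptable.
        return 0 <= started <= requested
    # Neither: every requested process must start.
    return started == requested
```

In this model a request for 8 processes of which 5 start fails by
default, succeeds with soft=True, succeeds with min_procs=4, and fails
with min_procs=6. The difference between the proposals is that a
minimum lets the caller reject a result that a bare soft flag would
accept.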
5. order of errorcodes
An issue people seemed to agree about. Currently the
document specifies that if you have a soft request
for N, M of which succeed, the first M error codes are
MPI_SUCCESS, and the rest indicate an error. The proposal,
accepted by 17-3-5, was to say that MPI does not specify
the order (within the procs associated with a single command line).
The implementation can specify the order, allowing
the user to make use of ordering information that
might have been contained inside the info argument.
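A practical consequence for user code, sketched below, is that a
caller can no longer assume the successes come first and has to scan
the whole array. The helper name successful_slots is hypothetical, and
the numeric values used for MPI_SUCCESS and for the error code are
placeholders for this sketch.

```python
MPI_SUCCESS = 0  # placeholder value for this sketch


def successful_slots(errcodes):
    """Return the indices whose error code reports a started process."""
    return [i for i, code in enumerate(errcodes) if code == MPI_SUCCESS]


# Soft request for 4 processes, 2 of which started, in no particular order.
SOME_ERROR = 17  # placeholder error code
print(successful_slots([MPI_SUCCESS, SOME_ERROR, MPI_SUCCESS, SOME_ERROR]))
```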
6. Next we had a very long discussion about MPI_SPAWN_INDEPENDENT.
There were a lot of clear binary decisions made, and in the
end we decided on a unique solution, which I will write up.
The votes were:
a.
leave current proposal as is: 3
change it: 19
abstain: 2
b.
require ability to spawn MPI processes with >1 proc in MPI_COMM_WORLD: 17
don't: 2
abstain: 5
c.
retain ability to spawn non-MPI processes: 21
don't: 1
a+b+c narrowed the field down to two choices:
group a:
MPI_SPAWN
MPI_SPAWN_INDEPENDENT + MPI/non-MPI flag
group b:
MPI_SPAWN
MPI_SPAWN_INDEPENDENT
MPI_SPAWN_INDEPENDENT_MPI
group a: 5
group b: 16
So I am currently planning to write up a proposal with group b.
We did not get to discuss the following issues on Friday:
7. What do we do for MPI_UNIVERSE_SIZE?
There are two proposals. The current document has a constant,
MPI_UNIVERSE_SIZE. We also discussed a proposal from Rolf
to use instead MPI_UNIVERSE_SIZE(IN info, OUT size).
Note that the constant is what you get (initially, at least) when
you call MPI_UNIVERSE_SIZE(NULL, ...)
Opinions on which way we should go or whether we should punt completely?
Some of the arguments that have been voiced are:
a. the manager/worker example with prereserved resources is so
common that having one of these will greatly improve the portability
of these common programs.
b. The constant version is much more restrictive than necessary.
The function can allow an implementation to provide more information
if it is able to, but doesn't require more information.
c. The function is not flexible enough. For any info != NULL,
the user should be using the runtime environment API.
d. Neither of these belongs in MPI. They are related to resource
management.
8. Posix signals in a heterogeneous environment.
(Note: we voted 15-3-7 to go with the POSIX signals when possible
instead of KILL only).
What happens if you are in a heterogeneous environment
and send a POSIX signal from a POSIX process to a non-POSIX
process?
a. returns an error - no signals delivered
b. returns an error - signals delivered where possible
c. silently ignored
d. "people who use heterogeneous systems should suffer"
I will write up a) unless I hear otherwise.
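The difference between options a, b and c only shows up when a signal
goes to several processes at once (as with SIGNAL_GROUP). The sketch
below models that case. The function send_signal, its argument shapes,
and the reading of c as "deliver where possible and report no error"
are assumptions made for illustration.

```python
def send_signal(targets, policy):
    """Model delivering one signal to a mixed group of processes.

    targets is a list of (name, is_posix) pairs; policy is "a", "b" or
    "c". Returns (error_returned, names_that_received_the_signal).
    """
    posix = [name for name, is_posix in targets if is_posix]
    non_posix = [name for name, is_posix in targets if not is_posix]
    if not non_posix:
        # Homogeneous POSIX group: nothing to decide.
        return (False, posix)
    if policy == "a":
        # Error, and no signals are delivered at all.
        return (True, [])
    if policy == "b":
        # Error, but POSIX targets still receive the signal.
        return (True, posix)
    if policy == "c":
        # No error; non-POSIX targets are silently skipped.
        return (False, posix)
    raise ValueError("unknown policy: " + repr(policy))
```

With a single non-POSIX target, a and b behave the same way: an error
is returned and nothing is delivered.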
9. MPI_NOTIFY
(Note: we voted 16-2-8 to keep the "what" argument)
(Note: most people agree the name is not right. Suggestions encouraged)
What happens if you post the same event twice?
How do you deal with the race condition between
an event and posting the event?
The current request handler section deals with persistent
requests. Notification requests are not currently persistent.
How should we reconcile these?
What happens if you post the same event twice, but for
different (group,rank) pairs that happen to be the same process?
An idea to solve the first problem:
"There are two classes of events: persistent and
transitory. Process death (MPI_NOTIFY_EXIT) is persistent.
Process suspension, migration, restart are transitory.
If multiple notifications are posted for a single type
of event, all will complete if the event is persistent,
but only one will complete for a single transitory event."
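The quoted rule can be written down as a small model. This is a sketch
only: the names are hypothetical, and the choice of which request
completes for a transitory event (here, the oldest one) is an
assumption, since the quoted text does not say.

```python
# Event classes from the quoted idea; names shortened for the sketch.
PERSISTENT_EVENTS = {"EXIT"}


def completed_requests(event, pending):
    """Return the requests that complete for one occurrence of event.

    pending is the list of requests posted for this event type, oldest
    first.
    """
    if not pending:
        return []
    if event in PERSISTENT_EVENTS:
        # Persistent event: every posted notification completes.
        return list(pending)
    # Transitory event: only one completes (assumed: the oldest).
    return pending[:1]
```

So two notifications posted for process death both complete, while
two posted for a suspension need two suspensions to both complete.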
I have an idea to deal with most of these
in one blow, but it involves a rather radical new idea.
For ease of explanation, assume we have an MPI_PROCESS
object. From the time the process is created, reported events
automatically "accumulate" in a queue inside the process
object. We can retrieve the events in the order they
were reported by pulling them off the queue:
MPI_GET_PROCESS_EVENT(IN process, OUT event)
where event can be
MPI_PROCESS_EXITED
MPI_PROCESS_SUSPENDED
MPI_PROCESS_MIGRATED
MPI_NULL_EVENT (if no more events)
etc.
If we want to be informed asynchronously of an event, we
call
MPI_NOTIFY(IN process, OUT request)
The request completes when there is at least one
reported event on the queue. The request is not persistent.
I believe that there are no race conditions with this scenario.
The only issue of posting twice is what happens
if you call notify twice before pulling any events off
the queue. Both requests will complete with the first
event. But the act of pulling an event notification off the queue
is separate from being notified about the event so
everything is well defined.
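A minimal model of the queue idea, to show why posting twice is well
defined. The class and method names are hypothetical stand-ins for the
MPI_PROCESS object, MPI_GET_PROCESS_EVENT and MPI_NOTIFY; the report
method stands for whatever the runtime does when it learns of an
event, and a completed request is assumed to stay completed.

```python
from collections import deque

MPI_NULL_EVENT = "MPI_NULL_EVENT"


class Process:
    """Stand-in for the MPI_PROCESS object with its event queue."""

    def __init__(self):
        self._events = deque()

    def report(self, event):
        # Hypothetical hook: the runtime reports an event for this process.
        self._events.append(event)

    def has_events(self):
        return len(self._events) > 0

    def get_process_event(self):
        # Model of MPI_GET_PROCESS_EVENT: oldest event first.
        if self._events:
            return self._events.popleft()
        return MPI_NULL_EVENT

    def notify(self):
        # Model of MPI_NOTIFY: a non-persistent request on this queue.
        return NotifyRequest(self)


class NotifyRequest:
    def __init__(self, process):
        self._process = process
        self._done = False

    def test(self):
        # Completes once at least one reported event is on the queue.
        if not self._done and self._process.has_events():
            self._done = True
        return self._done
```

Posting two requests before any event arrives, then reporting one
event, completes both requests; the event itself is still on the
queue until someone pulls it, and a second pull yields MPI_NULL_EVENT.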
Note that MPI_PROCESS_STARTED and MPI_PROCESS_FAILED_TO_START
might also be valid "events", making this more robust.
Another related extension is a routine:
MPI_GET_PROCESS_STATE(IN process, OUT state)
where, for example, state is an integer whose bits are boolean
flags.
Or
MPI_GET_PROCESS_INFO(IN process, IN type, OUT value)
The type of information I'm thinking of is:
was the process actually started?
is the process currently running?
is the process currently suspended?
is the process dead/unreachable?
etc.
This avoids difficulties with the one-time nature of process events.
For instance if a process has exited, we would rather not
have to catch a one-time event.
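As an illustration of the "integer whose bits are boolean flags"
variant, the state query could look roughly like this. The flag names
and values are invented for the sketch and are not proposed constants.

```python
from enum import IntFlag


class ProcessState(IntFlag):
    """Hypothetical state bits, one per question in the list above."""

    STARTED = 1
    RUNNING = 2
    SUSPENDED = 4
    DEAD = 8


def is_dead(state):
    """True if the DEAD bit is set, however long ago the process exited."""
    return bool(state & ProcessState.DEAD)


# A process that was started and has since exited.
state = ProcessState.STARTED | ProcessState.DEAD
print(is_dead(state))
```

Unlike an event, the state can be queried at any time after the
process has exited and still gives the same answer.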
Do people think these ideas are worth pursuing?
Bill