I don't think that this proposal makes much of a difference from an
implementation viewpoint. Progress issues have to be handled in both
cases. We still have, no matter what we choose, the fundamental
implementation problem that, while blocked in an MPI call, a process must
make progress on all enabled communications, including RMA communications.
We still have our major debate on the meaning of progress. Namely,
whether in a code of the form
process 0 process 1
the infinite loop on process 1 also causes process 0 to be stuck.
I still think that this debate is mostly academic (except in the context of
the interaction between MPI and non-MPI communication, but, then, we agreed
not to tackle this issue).
Thus, the only valid arguments, one way or another, are efficiency and ease
of use. There is a loss of efficiency when the four functions are pared
down to two, as less computation/communication overlap can be achieved.
There is a gain in avoiding a function call. There are more ways for the
user to deadlock, since the assertion that complete&wait is collective with
other complete&wait calls introduces spurious dependencies. The result is
more incorrect programs and, accordingly, less burden on the implementor.
On balance, I think it is a bad idea to introduce a complete&wait operation
that is collective. The operation is not collective over a predefined
communicator group: it is collective over the connected components of the
communication graph of this epoch. This is hard to understand -- no easier
than the current design. I believe that the only feasible implementation
is to handle a complete&wait as a complete call, followed by a wait call --
so, no difference whatsoever in implementation effort. Since this
construct will be used only by power programmers, I see no harm in keeping
it in its full generality.
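
To make the "connected components" point concrete, here is a rough sketch in
plain C (no MPI calls) of how one could work out which processes a collective
complete&wait would tie together in a given epoch. Everything in it is my own
illustration, not part of the proposal: the names uf_find, uf_union,
count_components and same_component are hypothetical, and I am assuming that
an epoch can be described as a list of (origin, target) pairs.

```c
#include <assert.h>

#define MAX_PROCS 64

/* Union-find forest: parent[i] == i means process i is a root. */
int parent[MAX_PROCS];

/* Find the representative of x, shortening the path as we go. */
int uf_find(int x) {
    while (parent[x] != x) {
        parent[x] = parent[parent[x]];  /* path halving */
        x = parent[x];
    }
    return x;
}

/* Merge the components containing a and b. */
void uf_union(int a, int b) {
    int ra = uf_find(a);
    int rb = uf_find(b);
    if (ra != rb)
        parent[ra] = rb;
}

/*
 * edges[k][0] is an origin and edges[k][1] a target in this epoch
 * (an assumed encoding). Returns the number of connected components;
 * a process that communicates with nobody forms its own component.
 */
int count_components(int nprocs, int edges[][2], int nedges) {
    int i, k, count = 0;
    assert(nprocs > 0 && nprocs <= MAX_PROCS);
    for (i = 0; i < nprocs; i++)
        parent[i] = i;
    for (k = 0; k < nedges; k++)
        uf_union(edges[k][0], edges[k][1]);
    for (i = 0; i < nprocs; i++)
        if (uf_find(i) == i)
            count++;
    return count;
}

/* True if a and b would take part in the same collective complete&wait. */
int same_component(int a, int b) {
    return uf_find(a) == uf_find(b);
}
```

With six processes and the pairs (0,1), (1,2), (3,4), processes 0 and 2 land
in the same component even though they never communicate directly, so a
collective complete&wait would make each depend on the other. That is the
kind of spurious dependency I have in mind.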
In any case, I don't see this as a major issue.