Lloyd already pointed out that the idea is that the start call
is nonblocking and that only the end call may block
(i.e. no step backwards).
****
My argument is that the 2-phase collective construct is interesting only if
it can be implemented so that both start and end are nonblocking. If, in
real implementations, either the start or the end always blocks, and the
user does not know which one, then there will be no way of using such a
construct more efficiently than a blocking collective.
******
> So, does anyone have a suggestion on how to implement (efficiently) a
> nonblocking reduce, so that this holds? The only way I know how to
> do this is for systems with shared memory and with few processes: ...
I believe implementations also have a good chance of avoiding
unnecessary idle time on virtual shared memory systems.
E.g. in ALLTOALL, the data for each pair of processes can always be
transferred in both directions by direct (virtual) memory access
once the second process of the pair has invoked the start; the end then
has to wait only if not all processes have started yet.
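
A minimal sketch of that ALLTOALL scheme, only to make the idea concrete.
Everything in it is an assumption for illustration: Python threads stand in
for processes, ordinary lists guarded by a lock stand in for the (virtual)
shared memory, and the names SharedAlltoall, start, end and run are
hypothetical, not part of any proposal.

```python
import threading


class SharedAlltoall:
    """Toy two-phase all-to-all; threads play the role of processes."""

    def __init__(self, nprocs):
        self.n = nprocs
        # send[i][j] is the block that rank i wants to deliver to rank j.
        self.send = [None] * nprocs
        # recv[i][j] is the block that rank i received from rank j.
        self.recv = [[None] * nprocs for _ in range(nprocs)]
        self.started = [False] * nprocs
        self.cond = threading.Condition()

    def start(self, rank, sendbuf):
        # Takes a short lock but never waits for a peer to reach its start.
        with self.cond:
            self.send[rank] = list(sendbuf)
            self.started[rank] = True
            # For every peer that has already started, this rank is the
            # second of the pair, so it moves the data in both directions.
            for peer in range(self.n):
                if self.started[peer]:
                    self.recv[rank][peer] = self.send[peer][rank]
                    self.recv[peer][rank] = self.send[rank][peer]
            self.cond.notify_all()

    def end(self, rank):
        # Waits only if some rank has not yet called start.
        with self.cond:
            self.cond.wait_for(lambda: all(self.started))
            return list(self.recv[rank])


def run(nprocs=4):
    coll = SharedAlltoall(nprocs)
    results = [None] * nprocs

    def worker(rank):
        # Rank r sends the tagged block (r, dest) to each destination.
        coll.start(rank, [(rank, dest) for dest in range(nprocs)])
        # Local work that should overlap the exchange would go here.
        results[rank] = coll.end(rank)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

With run(4), entry results[r][s] should hold the block (s, r), i.e. what
rank s addressed to rank r. The sketch covers only the shared-memory case;
it says nothing about how this would look on distributed memory.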
*****
If the only example of a nonblocking implementation is DSVM, then I don't
buy the construct.
Marc
*******