Sorry, I did not intend this kind of "overlap".
I want the application to be able to do useful work instead of idling
while it waits for other processes to join a collective call.
> ... Otherwise, we have replaced a blocking collective by two
> collective calls, one of which is blocking, but we don't know which:
> hardly a progress.
Lloyd already pointed out that the idea is that the start call
is non-blocking and that only the end call may block
(i.e. it is not a step backwards).
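
To make the start/end split concrete, here is a minimal sketch under stated assumptions: threads stand in for processes, and the names SplitReduce, start, end and run_split_reduce are hypothetical, not part of any MPI proposal. The start call only deposits the local contribution and returns; the end call waits only if some participant has not yet called start.

```python
import threading


class SplitReduce:
    """Toy split-phase sum-reduce; threads stand in for processes (assumption)."""

    def __init__(self, n):
        self.n = n
        self.total = 0
        self.arrived = 0
        self.cond = threading.Condition()

    def start(self, value):
        # Never waits for other participants: deposit the value and return.
        # (The short lock only protects the shared counters.)
        with self.cond:
            self.total += value
            self.arrived += 1
            if self.arrived == self.n:
                self.cond.notify_all()

    def end(self):
        # May block, but only until the last participant has called start().
        with self.cond:
            self.cond.wait_for(lambda: self.arrived == self.n)
            return self.total


def run_split_reduce(n=4):
    red = SplitReduce(n)
    results = [None] * n

    def worker(rank):
        red.start(rank + 1)            # contribute rank+1, return at once
        local = sum(range(1000))       # independent work done in between
        results[rank] = (red.end(), local)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Each worker does its own computation between start and end, so a process that arrives early is not forced to idle until the others join.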
> So, does anyone have a suggestion on how to implement (efficiently) a
> nonblocking reduce, so that this holds? The only way I know how to
> do this is for systems with shared memory and with few processes: ...
I believe implementations also have a good chance of avoiding
unnecessary idle time on virtual shared memory systems.
E.g. in ALLTOALL, the data for each pair of processes can always be
transferred in both directions by direct (virtual) memory access
as soon as the second process of the pair has invoked the start call;
the end call has to wait only if not all processes have started yet.
The best optimization may also differ for small and large amounts of data.
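
The ALLTOALL idea can be sketched roughly as follows. This is only an illustration under assumptions: threads and shared Python lists stand in for processes and (virtual) shared memory, all names (SplitAlltoall, start, end, run_split_alltoall) are hypothetical, and a single global lock is used for simplicity where a real implementation would want something finer-grained.

```python
import threading


class SplitAlltoall:
    """Toy split-phase all-to-all over shared lists (illustrative assumption)."""

    def __init__(self, n):
        self.n = n
        self.send = [None] * n                        # send[i][j]: block from i for j
        self.recv = [[None] * n for _ in range(n)]    # recv[j][i]: block j got from i
        self.started = 0
        self.cond = threading.Condition()

    def start(self, rank, blocks):
        with self.cond:
            self.send[rank] = blocks
            self.recv[rank][rank] = blocks[rank]
            # For every partner that has already started, this caller is the
            # second of the pair, so it copies the data in both directions.
            for other in range(self.n):
                if other != rank and self.send[other] is not None:
                    self.recv[rank][other] = self.send[other][rank]
                    self.recv[other][rank] = blocks[other]
            self.started += 1
            if self.started == self.n:
                self.cond.notify_all()

    def end(self, rank):
        # Waits only if some process has not yet called start().
        with self.cond:
            self.cond.wait_for(lambda: self.started == self.n)
            return self.recv[rank]


def run_split_alltoall(n=3):
    a2a = SplitAlltoall(n)
    results = [None] * n

    def worker(rank):
        a2a.start(rank, [f"{rank}->{j}" for j in range(n)])
        results[rank] = a2a.end(rank)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

Once the last process has called start, every pair has already been exchanged by whichever partner arrived second, so end has nothing left to do but return.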
Rolf
Rolf Rabenseifner (Computer Center)
Rechenzentrum Universitaet Stuttgart (University of Stuttgart)
Allmandring 30 Phone: ++49 711 6855530
D-70550 Stuttgart 80 FAX: ++49 711 6787626
Germany rabenseifner@rus.uni-stuttgart.de