Re: 2-phase collective

Rolf Rabenseifner (Rabenseifner@RUS.Uni-Stuttgart.DE)
Mon, 3 Feb 1997 16:29:57 +0100 (MEZ)

Marc wrote:
>
> What is the "overlap" argument pushed here? If the argument is that magic
> hardware does the reduce in the background while the main computation
> proceeds in the foreground, ...

Sorry, I did not intend this kind of "overlap".
I want the application to be able to do useful work instead of idling
while it waits for other processes to join a collective call.

> ... Otherwise, we have replaced a blocking collective by two
> collective calls, one of which is blocking, but we don't know which:
> hardly a progress.

Lloyd already pointed out that the idea is that the start call
is non-blocking and that only the end call may block
(i.e. no step backwards).
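
To make the start/end split concrete, here is a minimal sketch in Python.
It is not MPI and not a proposed binding: threads stand in for processes,
and the names TwoPhaseSum, start, end and run_demo are hypothetical, made
up for this illustration only. The start call just deposits the local
contribution and returns; only the end call waits, and only if some
participant has not yet called start.

```python
import threading


class TwoPhaseSum:
    """Hypothetical split-phase sum reduction; threads stand in for processes."""

    def __init__(self, n):
        self.n = n                      # number of participants
        self.total = 0                  # running sum of contributions
        self.arrived = 0                # how many participants called start()
        self.cond = threading.Condition()

    def start(self, value):
        # Does not wait for other participants: deposit and return.
        with self.cond:
            self.total += value
            self.arrived += 1
            if self.arrived == self.n:
                self.cond.notify_all()

    def end(self):
        # May block, but only while some participant has not started yet.
        with self.cond:
            self.cond.wait_for(lambda: self.arrived == self.n)
            return self.total


def run_demo(n=4):
    red = TwoPhaseSum(n)
    results = [None] * n
    local_work = [0] * n

    def worker(rank):
        red.start(rank + 1)
        # Useful work done here instead of idling until the others join.
        local_work[rank] = sum(range(1000))
        results[rank] = red.end()

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, local_work
```

Calling run_demo(4) has each of the four workers contribute rank + 1, do
some local work, and then pick up the common sum in end().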

> So, does anyone have a suggestion on how to implement (efficiently) a
> nonblocking reduce, so that this holds? The only way I know how to
> do this is for systems with shared memory and with few processes: ...

I believe implementations also have a good chance of avoiding
unnecessary idle time on virtual shared memory systems.
E.g. in ALLTOALL, for each pair of processes the data can always be
transferred in both directions by direct (virtual) memory access
as soon as the second process of the pair has invoked the start; the end
only has to wait if not all processes have started yet.
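
A rough sketch of this pairwise scheme, again in Python with threads
standing in for processes and ordinary lists standing in for (virtual)
shared memory. The names TwoPhaseAllToAll and run_alltoall are hypothetical,
and a single lock serializes the copies for simplicity, which a real
implementation would presumably avoid. Whichever process of a pair starts
second moves the data in both directions; the end call only waits until
every process has started.

```python
import threading


class TwoPhaseAllToAll:
    """Hypothetical split-phase all-to-all over memory visible to all ranks."""

    def __init__(self, n):
        self.n = n
        self.send = [None] * n                       # send[i][j]: block from i for j
        self.recv = [[None] * n for _ in range(n)]   # recv[j][i]: block j got from i
        self.started = 0
        self.cond = threading.Condition()

    def start(self, rank, sendbuf):
        with self.cond:
            self.send[rank] = list(sendbuf)
            self.recv[rank][rank] = self.send[rank][rank]
            # For every peer that has already started, this rank is the
            # second of the pair, so it copies the data in both directions.
            for peer in range(self.n):
                if peer != rank and self.send[peer] is not None:
                    self.recv[peer][rank] = self.send[rank][peer]
                    self.recv[rank][peer] = self.send[peer][rank]
            self.started += 1
            if self.started == self.n:
                self.cond.notify_all()

    def end(self, rank):
        # Waits only if some process has not yet called start().
        with self.cond:
            self.cond.wait_for(lambda: self.started == self.n)
            return list(self.recv[rank])


def run_alltoall(n=3):
    coll = TwoPhaseAllToAll(n)
    out = [None] * n

    def worker(rank):
        # Block (rank, dest) is what `rank` sends to `dest`.
        coll.start(rank, [(rank, dest) for dest in range(n)])
        out[rank] = coll.end(rank)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return out
```

After run_alltoall(n), entry out[j][i] holds the block that rank i
addressed to rank j.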

And the optimization may differ for small and large amounts of data.

Rolf


Rolf Rabenseifner                  (Computer Center)
Rechenzentrum Universitaet Stuttgart (University of Stuttgart)
Allmandring 30                     Phone: ++49 711 6855530
D-70550 Stuttgart 80               FAX:   ++49 711 6787626
Germany                            rabenseifner@rus.uni-stuttgart.de