Here is a simple example:

    loop                  loop                  loop
      PUT                   GET                   PUT
      PUT                   PUT                   GET
      GET                   PUT                   PUT
      WIN_BARRIER_START     WIN_BARRIER_START     WIN_BARRIER_START
      some local ...        some local ...        some local ...
      ... computation       ... computation       ... computation
      WIN_BARRIER_END       WIN_BARRIER_END       WIN_BARRIER_END
      using new data ...    using new data ...    using new data ...
      ... in the window     ... in the window     ... in the window
    end_loop              end_loop              end_loop

This can reduce the execution time of applications whose load is
balanced on average over all iterations, but not within each
individual iteration (the same argument as for the 2-phase ALLREDUCE
and BCAST).
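
To make the timing argument concrete, here is a minimal sketch of a
toy cost model in Python. It is not MPI code. It assumes that
WIN_BARRIER_START returns immediately and that WIN_BARRIER_END blocks
until every process has called WIN_BARRIER_START; those semantics are
still open, so this is only one possible reading. The function name
`simulate` and the timings are hypothetical, and communication cost is
ignored.

```python
def simulate(rma_times, local_time, split):
    """Return the total run time under a toy cost model.

    rma_times[i][p] is the (hypothetical) time process p spends in the
    PUT/GET phase of iteration i; local_time is the time for the local
    computation that does not touch the window.
    """
    nprocs = len(rma_times[0])
    clock = [0.0] * nprocs  # current time of each process
    for step in rma_times:
        # Time at which each process reaches the barrier (or its START part).
        arrive = [clock[p] + step[p] for p in range(nprocs)]
        all_arrived = max(arrive)
        if split:
            # Assumed semantics: START returns at once, the local
            # computation overlaps the wait, END blocks until all arrived.
            clock = [max(arrive[p] + local_time, all_arrived)
                     for p in range(nprocs)]
        else:
            # Ordinary barrier: everyone waits for the slowest process,
            # then does the local computation.
            clock = [all_arrived + local_time for _ in range(nprocs)]
    return max(clock)


# Two processes, balanced on average (4 units of RMA time each over two
# iterations) but not within either iteration.
times = [[1.0, 3.0], [3.0, 1.0]]
print(simulate(times, 2.0, split=False))  # 10.0
print(simulate(times, 2.0, split=True))   # 8.0
```

In this model the split barrier reaches the lower bound of 8 time
units (each process has 4 units of RMA work plus 2 x 2 units of local
computation), because a process that arrives early uses its local
computation to absorb the wait.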
Should we add this?
What are the exact semantics?
Rolf
Rolf Rabenseifner                     (Computer Center)
Rechenzentrum Universitaet Stuttgart  (University of Stuttgart)
Allmandring 30                        Phone: ++49 711 6855530
D-70550 Stuttgart 80                  FAX:   ++49 711 6787626
Germany                               rabenseifner@rus.uni-stuttgart.de