Looking at FE applications, I believe that your proposal is worse
than the current official draft.
For applications that should not be executed in strongly synchronized
steps (i.e. with MPI_BARRIER), I have compared
- PUT & counters with the current official draft, and
- PUT & LOCK/UNLOCK with the radical new draft.
Assuming that the counter is implemented by a write lock,
a get-and-update, and an unlock, and assuming that a fence is
about as expensive as the update of the counter,
the number of operations is nearly identical
(except for the write of the dirty bit and the
window_out inside the local unlock in the 'radical new draft').
The main difference is that in the 'radical new draft' the whole
window is locked during the PUT!!!
If a lot of processes write to different areas of
the target windows, then the 'radical new draft' is worse.
And MPI-2 should address 'a lot of processes'!!!
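The cost of a whole-window lock can be illustrated with a shared-memory
analogy. This is a minimal sketch, not MPI code and not either draft's
API: POSIX threads stand in for processes, one array stands in for the
target window, and the names NPROCS, AREA_LEN, put_area and run_writers
are hypothetical. With one lock for the whole window, every PUT waits for
every other PUT even though the areas are disjoint; with per-area
protection, writers to different areas never wait for each other.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define NPROCS   8     /* hypothetical number of writing processes */
#define AREA_LEN 1024  /* hypothetical size of each writer's target area */

static double window[NPROCS * AREA_LEN];  /* stand-in for the target window */
static pthread_mutex_t whole_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_mutex_t area_lock[NPROCS]; /* one lock per disjoint area */
static int use_whole_lock;                /* set before threads are created */

/* The "PUT": each writer fills only its own, disjoint area. */
static void put_area(int rank) {
    double *dst = window + (size_t)rank * AREA_LEN;
    for (int i = 0; i < AREA_LEN; i++)
        dst[i] = rank + 1.0;
}

static void *writer_main(void *arg) {
    int rank = *(int *)arg;
    /* Whole-window locking serializes all writers; per-area locking
       only excludes writers that target the same area (none here). */
    pthread_mutex_t *m = use_whole_lock ? &whole_lock : &area_lock[rank];
    pthread_mutex_lock(m);
    put_area(rank);
    pthread_mutex_unlock(m);
    return NULL;
}

/* Runs all writers with the chosen locking scheme and returns the sum
   of the window, so both schemes can be checked for the same result. */
double run_writers(int whole_window) {
    pthread_t tid[NPROCS];
    int ranks[NPROCS];
    use_whole_lock = whole_window;
    for (int r = 0; r < NPROCS; r++)
        pthread_mutex_init(&area_lock[r], NULL);
    for (int r = 0; r < NPROCS; r++) {
        ranks[r] = r;
        int rc = pthread_create(&tid[r], NULL, writer_main, &ranks[r]);
        assert(rc == 0);
    }
    for (int r = 0; r < NPROCS; r++)
        pthread_join(tid[r], NULL);
    for (int r = 0; r < NPROCS; r++)
        pthread_mutex_destroy(&area_lock[r]);
    double sum = 0.0;
    for (size_t i = 0; i < (size_t)NPROCS * AREA_LEN; i++)
        sum += window[i];
    return sum;
}
```

Both schemes produce the same window contents; the difference is only in
how long each writer may have to wait, which grows with the number of
writers when the whole window is locked.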
At the moment, all alternative proposals are less efficient because
they block the application more than necessary
(because MPI_BARRIER is the only way, or because the put must
be done in a locked manner).
I understand that, for the vendors, it is more work to examine their
cache model and to work out how MPI_WINDOW_IN/OUT must be implemented
on their hardware than to implement only a less efficient but
simpler model; but MPI-2 should allow efficient
parallel programming.
Rolf
Rolf Rabenseifner (Computer Center )
Rechenzentrum Universitaet Stuttgart (University of Stuttgart)
Allmandring 30 Phone: ++49 711 6855530
D-70550 Stuttgart 80 FAX: ++49 711 6787626
Germany rabenseifner@rus.uni-stuttgart.de