I believe that the risk of low-quality MPI implementations
is only a minor argument.
And I believe that most applications fulfill the "delayed
consistency" model because they do not use "shared memory" style
third-party communication (but this is only a weakly held opinion).
Is there a chance for higher throughput or lower latency
on some hardware platforms if the MPI implementation knows that
the application fulfills the "delayed consistency" model on
a specific window?
On workstation clusters it is probably faster to implement
the target side inside WINDOW_IN/_OUT if the application
transfers only small pieces of data.
In those cases I would prefer an optimization hint saying
that it is enough for a put to become visible to get operations
only after a call to WINDOW_IN, rather than restricting MPI
1-sided to the "delayed consistency" model.
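
As a rough illustration of what such a hint would permit, here is
a toy model in Python. It is only a sketch and not an MPI interface:
the class ToyWindow, its methods, and the flag delayed_visibility_hint
are made-up names, and "visible immediately" in the non-hinted case
is a simplifying assumption of the model.

```python
class ToyWindow:
    """Toy model of a single target window (hypothetical, not an MPI API)."""

    def __init__(self, size, delayed_visibility_hint=False):
        self.memory = [0] * size      # what get operations can see
        self.pending = []             # puts that are not yet visible
        self.delayed = delayed_visibility_hint

    def put(self, offset, value):
        if self.delayed:
            # With the hint, the target may defer the update.
            self.pending.append((offset, value))
        else:
            # Without the hint, the toy model makes the put visible at once.
            self.memory[offset] = value

    def get(self, offset):
        return self.memory[offset]

    def window_in(self):
        # Apply all deferred puts in the order they were issued.
        for offset, value in self.pending:
            self.memory[offset] = value
        self.pending.clear()


hinted = ToyWindow(4, delayed_visibility_hint=True)
hinted.put(0, 7)
before = hinted.get(0)   # put is still deferred
hinted.window_in()
after = hinted.get(0)    # put has been applied

plain = ToyWindow(4)
plain.put(0, 7)
plain_value = plain.get(0)
```

With the hint set, the target side does all its work inside
window_in, which is the case I have in mind for small transfers
on workstation clusters.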
Rolf
Rolf Rabenseifner (Computer Center)
Rechenzentrum Universitaet Stuttgart (University of Stuttgart)
Allmandring 30 Phone: ++49 711 6855530
D-70550 Stuttgart 80 FAX: ++49 711 6787626
Germany rabenseifner@rus.uni-stuttgart.de