Let's say that we change MPI_RMW() to take both a comm and a rank argument,
just like MPI_WINDOW_LOCK() does, and require that all calls to MPI_RMW()
result in a fence to the target window. Would that solve Karl's problem? Now,
instead of having to wait for a lock, the producer could simply increment the
RMW counter on each of the consumer windows, which would then be checked
locally (also with RMW). Intelligent implementations could optimize RMW calls
so that the fence logic is performed only when needed.
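
To make the idea concrete, here is a rough sketch of the counter protocol. It is a single-process, shared-memory analogy using C11 atomics, not MPI code: rmw_fetch_add() is a hypothetical stand-in for the proposed MPI_RMW() with a rank argument (the comm argument is omitted), window_counter[] stands in for one word in each consumer's window, and the default sequentially consistent ordering of the atomic plays the role of the fence. All of these names are assumptions for illustration only.

```c
#include <stdatomic.h>

#define NUM_CONSUMERS 4

/* One counter per consumer, standing in for a word in that consumer's
 * window. Static storage, so every counter starts at zero. */
static atomic_long window_counter[NUM_CONSUMERS];

/* Hypothetical stand-in for the proposed MPI_RMW(comm, rank, ...):
 * atomically add `delta` to the counter in `rank`'s window and return
 * the previous value. The sequentially consistent ordering of
 * atomic_fetch_add plays the role of the fence on the target window. */
static long rmw_fetch_add(int rank, long delta)
{
    return atomic_fetch_add(&window_counter[rank], delta);
}

/* Producer: announce one new item to every consumer without taking
 * a lock on any window. */
static void producer_notify_all(void)
{
    for (int rank = 0; rank < NUM_CONSUMERS; rank++)
        rmw_fetch_add(rank, 1);
}

/* Consumer: check its own counter locally, also with RMW
 * (adding 0 just reads the current value). */
static long consumer_pending(int rank)
{
    return rmw_fetch_add(rank, 0);
}

/* Consumer: claim one item if any is pending.
 * Returns 1 if an item was claimed, 0 if the counter was empty.
 * Safe here because only the owning consumer ever decrements. */
static int consumer_take(int rank)
{
    if (consumer_pending(rank) <= 0)
        return 0;
    rmw_fetch_add(rank, -1);
    return 1;
}
```

In this sketch the producer calls producer_notify_all() once per item, and each consumer polls consumer_take() with its own rank. A real implementation would presumably skip the fence work whenever the target is local or nothing has changed.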
(Marc, would you care to shoot this suggestion down? I don't think that the
current text of the alternate proposal attaches any coherence semantics to
obtaining a lock; it's all defined in terms of releasing locks.)
--
Eric Salo         Silicon Graphics Inc.             "Do you know what the
(415)933-2998     2011 N. Shoreline Blvd, 8U-808     last Xon said, just
salo@sgi.com      Mountain View, CA 94043-1389       before he died?"