--- Your Message
The cost is never basically zero. For the implementor there may be relatively
little cost in providing these irrelevant functions to prevent linker errors.
However, we cannot ignore the costs of documentation and evaluation. These are
significant. The standard as it comes out of the MPI-Forum is poor
documentation, rarely even clearly indicating the possible return values of
functions.
And I have not met an Evaluator yet who believes me when I say a function is
a no-op. Take a function like MPI_Window_in that was first advertised as
basically a no-op on many systems. Some evaluators would interpret (page 94,
line 21) "but the local copy of the window should not be updated locally" to
imply that a high quality implementation should prevent a local update.
**** This interpretation is wrong. If the current text is still misleading,
please let me know.
Also (page 91, line 46,47) "Before using loads and stores on the target process
access data updated by a prior RMA operation, the target process 'must' first
issue a call to MPI_WINDOW_IN", would surely be interpreted by most evaluators
to mean that some kind of error needs to be generated if this is not the case.
**** I have added text making clear that this error is unlikely to be detected.
The same remark occurs at several places in the MPI1 text. In general, erroneous
code leads to undefined behavior, and errors that have to do with the ordering
of events (initiated by different processes) are certainly the kind of errors
that are not normally detected -- if we get closer to shared memory behavior,
then we also get closer to shared memory bugs...
Please let me know if this is sufficient.
And what exactly are the possible return values of MPI_WINDOW_IN???
**** MPI_SUCCESS, of course (unless the comm argument is wrong). There is a
proposal to make these functions inlineable, in which case they would not
return any error code, and would truly be a no-op where feasible. We can
discuss this at the next meeting. On the other hand, I am not sure that a good
Paragon implementation will want to have no-op WINDOW_IN and WINDOW_OUT
functions. These functions could be used to copy data from an area that is
shared by application and communication process to/from the application process.
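
A minimal sketch in C of the two cases just described: an inlineable
WINDOW_IN/WINDOW_OUT pair that returns no error code and does nothing when the
window memory is directly accessible, and that copies through a shared staging
area otherwise. All names here (sketch_window, sketch_window_in,
sketch_window_out, sketch_roundtrip) are hypothetical and are not taken from
the draft; this only illustrates the idea, it is not how any implementation
actually does it.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical window descriptor; not the draft's data structure. */
typedef struct {
    char  *app_buf;    /* memory the application loads from and stores to */
    char  *shared_buf; /* area shared with the communication process, or
                          NULL when app_buf is itself directly accessible */
    size_t len;
} sketch_window;

/* Inlineable and without a return code. With no staging area this does
   nothing; otherwise it copies staged data into application memory. */
static inline void sketch_window_in(sketch_window *w)
{
    if (w->shared_buf != NULL)
        memcpy(w->app_buf, w->shared_buf, w->len);
}

/* Mirror image: publish application memory to the staging area. */
static inline void sketch_window_out(sketch_window *w)
{
    if (w->shared_buf != NULL)
        memcpy(w->shared_buf, w->app_buf, w->len);
}

/* Exercises both cases; returns 1 if they behave as described. */
static int sketch_roundtrip(void)
{
    char app[8]    = "old";
    char shared[8] = "new";
    sketch_window staged = { app, shared, sizeof app };
    sketch_window direct = { app, NULL,   sizeof app };

    sketch_window_in(&direct);              /* no-op case */
    if (strcmp(app, "old") != 0) return 0;

    sketch_window_in(&staged);              /* copy-in case */
    if (strcmp(app, "new") != 0) return 0;

    strcpy(app, "upd");
    sketch_window_out(&staged);             /* copy-out case */
    return strcmp(shared, "upd") == 0;
}
```

In the direct case a compiler that can see shared_buf is NULL may drop the
call entirely, which is the point of the inlining proposal.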
Portable evaluation suite writers might spend a great deal of time trying
to evaluate these meaningfully.
**** Well, DARPA pays Intel to do this, doesn't it?
The implementation notes in the new draft make it clear that the difference in
performance on, say, an SGI machine between put/get in special, shared memory,
and RMA in regular memory is the difference between one memory-to-memory copy
and two memory-to-memory copies. This is feasible because of the use of
WINDOW_IN and WINDOW_OUT, and without compromising function or performance
otherwise. (On a HIPPI-connected cluster an additional memory-to-memory copy
may be needed.)
Also, the implementation strategy becomes much more uniform -- it's always a
memory-to-memory copy. The same remark applies to other partial shared-memory
machines. I think this makes the current design much more pleasant.
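
To make the one-copy versus two-copy comparison concrete, here is a small C
sketch that simply counts memory-to-memory copies on each path. It assumes
(this is my reading, not something the draft text quoted here spells out) that
for a window in regular memory the put lands in a shared staging area and the
target's WINDOW_IN performs the second copy. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

static int sketch_copies; /* number of memory-to-memory copies performed */

static void sketch_counted_copy(void *dst, const void *src, size_t n)
{
    memcpy(dst, src, n);
    sketch_copies++;
}

/* Window lives in special shared memory: origin -> window, one copy. */
static int sketch_put_shared(char *window, const char *origin, size_t n)
{
    sketch_copies = 0;
    sketch_counted_copy(window, origin, n);
    return sketch_copies;
}

/* Window lives in regular memory (assumed path): origin -> shared staging
   area, then the target's WINDOW_IN moves staging -> window. Two copies. */
static int sketch_put_regular(char *window, char *staging,
                              const char *origin, size_t n)
{
    sketch_copies = 0;
    sketch_counted_copy(staging, origin, n); /* the RMA transfer itself */
    sketch_counted_copy(window, staging, n); /* done at WINDOW_IN time  */
    return sketch_copies;
}

/* Returns how many extra copies the regular-memory path costs. */
static int sketch_extra_copies(void)
{
    char origin[4] = "abc";
    char window[4] = "";
    char staging[4] = "";

    int shared  = sketch_put_shared(window, origin, sizeof origin);
    int regular = sketch_put_regular(window, staging, origin, sizeof origin);

    assert(strcmp(window, "abc") == 0); /* data arrives on both paths */
    return regular - shared;
}
```

Both paths are built from the same primitive, which is the uniformity argued
for above; only the number of hops differs.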