Actually, I thought the previous discussion was about what X-bit alignment to
require for puts/gets. Some people didn't want 1-sided at all, but the question
was which alignments should be allowed if 1-sided became required. At least
one person wanted 64-bit alignment only; others wanted byte alignment.
Some said that their hardware wasn't able to perform byte-aligned puts/gets,
and that requiring byte alignment would make their implementations slow
and/or nearly impossible. My position was that byte-aligned puts/gets can be
done on those systems, but for alignments not supported in hardware the
1-sided communication would become a 2-sided operation, very similar to
simply doing an MPI_Send. A restriction that puts/gets are allowed only
on certain boundaries would be fine with me, but since MPI already allows
misaligned messages, it is possible to at least simulate misaligned puts/gets
on systems that don't support certain alignments.
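To make the fallback concrete, here is a rough sketch of the dispatch I have in
mind. It is a toy model in Python, not MPI code: the names Target, hw_put, put
and service are made up for illustration, and the 8-byte ALIGN is only an
assumed hardware limit. Aligned transfers go straight to memory; anything else
is queued as a message that the target has to service, which is exactly the
2-sided behavior described above.

```python
# Toy model only: names and the 8-byte limit are assumptions, not MPI.
ALIGN = 8  # assumed hardware transfer unit in bytes (64-bit)


class Target:
    """Stands in for the remote process and its memory."""

    def __init__(self, size):
        self.mem = bytearray(size)
        self.inbox = []    # pending (offset, data) messages for the fallback path
        self.serviced = 0  # how many times the target had to take part

    def service(self):
        """Target-side half of the fallback: the moral equivalent of a receive."""
        while self.inbox:
            offset, data = self.inbox.pop(0)
            self.mem[offset:offset + len(data)] = data
            self.serviced += 1


def hw_put(target, offset, data):
    """Hardware path: truly 1-sided, but only for aligned whole units."""
    assert offset % ALIGN == 0 and len(data) % ALIGN == 0
    target.mem[offset:offset + len(data)] = data


def put(target, offset, data):
    """Do the put and report which path was used: 'hardware' or 'message'."""
    if offset < 0 or offset + len(data) > len(target.mem):
        raise ValueError("put outside target memory")
    if offset % ALIGN == 0 and len(data) % ALIGN == 0:
        hw_put(target, offset, data)
        return "hardware"
    # Misaligned: the data only lands once the target calls service().
    target.inbox.append((offset, bytes(data)))
    return "message"
```

In this model a misaligned put is not visible in the target's memory until the
target calls service(), so the target is a participant whether the application
programmer sees it or not.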
-] > This brings up the point of "Well, let's just limit it to the worst
-] > case alignment among the platforms." This would imply 64-bit
-] > alignment but wait, there are machines that can't do shared
-] > memory put/get -s at all so the worst case is no 1-sided at all.
-] > This approach would mean that 1-sided should actually not be
-] > in the standard.
-]
-] But by that reasoning, it sounds like 2-sided shouldn't be in the standard,
-] since it often requires unaligned transfers.
I wasn't a part of the original MPI meetings, but I imagine that point was
discussed. However, misaligned transfers are easy to handle in 2-sided
communication, since each side is *supposed* to take an active part in the
transfer. 1-sided communication is *supposed* to require only one participant
(the process doing the actual put or get, and no other process), and it is
much more difficult to handle misaligned transfers using only that process.
Putting the minimum enclosing aligned region to another process is completely
wrong, because it overwrites the target's bytes around the requested range
with whatever the origin happens to hold there. Getting the minimum enclosing
aligned region from the target and then memcpy()ing the desired part isn't
good either. That leaves simulating the operation by message passing behind
the scenes, which is a 2-sided operation again and not a 1-sided operation.
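A small illustration of the two workarounds, again as a toy Python model and
not MPI code. The helper names (enclosing, put_enclosing, get_enclosing) and
the 8-byte ALIGN are assumptions for the example, and the origin buffer is
assumed to mirror the target's layout so the same offsets apply on both sides.

```python
# Toy model only: helper names and the 8-byte limit are assumptions.
ALIGN = 8  # assumed hardware alignment in bytes


def enclosing(offset, length, align=ALIGN):
    """Smallest aligned [start, end) range covering offset..offset+length."""
    start = (offset // align) * align
    end = (offset + length + align - 1) // align * align  # round up
    return start, end


def put_enclosing(origin, target, offset, length):
    """The wrong approach: ship the whole enclosing aligned range."""
    start, end = enclosing(offset, length)
    target[start:end] = origin[start:end]


def get_enclosing(target, offset, length):
    """Fetch the enclosing range into a temporary, then copy out the part wanted.

    Returns the requested bytes and the number of bytes actually transferred.
    """
    start, end = enclosing(offset, length)
    temp = bytes(target[start:end])  # aligned transfer, more than was asked for
    return temp[offset - start:offset - start + length], len(temp)


origin = bytearray(b"o" * 16)
origin[3:6] = b"NEW"
target = bytearray(b"t" * 16)

put_enclosing(origin, target, 3, 3)
# Wanted b"tttNEWtt" in the first word; the neighbors were clobbered instead.
clobbered = bytes(target[:8])  # b"oooNEWoo"

wanted, moved = get_enclosing(target, 3, 3)  # 3 bytes wanted, 8 bytes moved
```

The put is the dangerous one: the target's bytes 0-2 and 6-7 are silently
replaced. The get returns the right data, but only by moving the whole aligned
word through a temporary buffer.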
-] If a remote machine must perform *any* specific command before accessing the
-] results of a "put" (and it may be required to perform such a command unless
-] there is a very simple and fully consistent memory structure), how is that
-] "put" semantically different from a send, and the specific command different
-] from a receive? Apparently, the difference is the ability for the "putter" to
-] specify where the data should appear in the remote machine (or, similarly,
-] for the "getter" to specify where it obtains the data from the remote machine).
-] Of course, you want some kinds of control over legal "wheres". I don't think
-] 1-sided really has anything to do with the hardware utilized, or the speed of
-] the operation.
I agree: a transfer that requires the participation of the receiving process is
not much different from an MPI_xSend/MPI_xRecv type operation. The only
difference is that the operation goes on "behind the scenes", without the
knowledge of the application programmer, whereas an MPI_xSend/MPI_xRecv
pair is written explicitly. I also agree that 1-sided operations can be
"simulated", and that the syntax and semantics of a program using them can be
supported no matter what hardware, or how fast the hardware, it is sitting upon.

That was the point of one of my earlier posts. The standing argument was that
if these operations are going to be slow on almost all machines, why have them
at all, since MPI is supposed to be high performance and these operations
obviously will not be?
-] In MPI, the "where" is specified as a memory address, which (in its current
-] incarnation) might be unaligned. In CDS1, the "where" is specified as a cell
-] id, which is effectively always aligned. (A cell id can be regarded as
-] analogous to a message tag, which can, in some respects, be considered as an
-] abstract address.)
Right. That is one of the reasons why I keep bringing up the MPI_RMA_Malloc
call, so that memory areas are specified by a handle instead of an address. A
handle gives much more control over what is going on, such as validating the
memory that the operation will act on and applying any mutual exclusion that
you wish to perform.
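As a rough sketch of what a handle buys you (a toy Python model, not a proposed
binding): rma_malloc, rma_free, rma_put and rma_get below are hypothetical
names standing in for an MPI_RMA_Malloc-style interface, and the per-region
lock is just one assumed way to get mutual exclusion. Because every access goes
through the handle, the implementation can reject freed handles and
out-of-range offsets before touching memory.

```python
# Toy model only: all names here are hypothetical, not an MPI binding.
import threading

_regions = {}     # handle -> (buffer, lock)
_next_handle = 1


def rma_malloc(size):
    """Allocate a region and return an opaque integer handle for it."""
    global _next_handle
    handle = _next_handle
    _next_handle += 1
    _regions[handle] = (bytearray(size), threading.Lock())
    return handle


def rma_free(handle):
    """Release a region; later accesses through the handle are rejected."""
    del _regions[handle]


def _lookup(handle, offset, length):
    """Validate the handle and the byte range before any access."""
    if handle not in _regions:
        raise KeyError("unknown or freed RMA handle")
    buf, lock = _regions[handle]
    if offset < 0 or length < 0 or offset + length > len(buf):
        raise ValueError("access outside the allocated region")
    return buf, lock


def rma_put(handle, offset, data):
    buf, lock = _lookup(handle, offset, len(data))
    with lock:  # mutual exclusion per region
        buf[offset:offset + len(data)] = data


def rma_get(handle, offset, length):
    buf, lock = _lookup(handle, offset, length)
    with lock:
        return bytes(buf[offset:offset + length])
```

With a raw address none of these checks is possible, since the implementation
has no record of which region the address is supposed to belong to.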