One can improve performance on many systems by overlapping
communication and computation. This is especially true on systems
where communication can be executed autonomously by an intelligent
communication controller. Light-weight threads are one mechanism for
achieving such overlap. An alternative mechanism that often leads to
better performance is to use nonblocking communication. A
nonblocking send start call initiates the send operation, but does not
complete it. The send start call
can
return before the message was copied out of the send buffer.
A separate send complete
call is needed to complete the communication, i.e., to verify that the
data has been copied out of the send buffer. With
suitable hardware, the transfer of data out of the sender memory
may proceed concurrently with computations done at the sender after
the send was initiated and before it completed.
Similarly, a nonblocking receive start call initiates the receive
operation, but does not complete it. The call
can
return before
a message is stored into the receive buffer. A separate receive
complete call
is needed to complete the receive operation and verify that the data has
been received into the receive buffer.
With suitable hardware, the transfer of data into the receiver memory
may proceed concurrently with computations done after the receive was
initiated and before it completed. The use of nonblocking receives may also
avoid system buffering and memory-to-memory copying, as information is provided
early on the location of the receive buffer.
Nonblocking send start calls can use the same four modes as blocking sends:
standard, buffered, synchronous and ready. These carry
the same meaning.
Sends of all modes, ready excepted, can be started whether a matching
receive has been posted or not; a nonblocking ready
send can be started only if
a matching receive is posted. In all cases, the send start call
is local: it returns immediately, irrespective of the
status of other processes.
If the call causes some system resource to be exhausted, then it will
fail and return an error code. Quality
implementations of MPI should ensure that this happens only
in ``pathological'' cases. That is, an MPI implementation
should be able to support a
large number of pending nonblocking operations.
The send-complete call returns when data has been copied out of the
send buffer.
It may carry additional meaning, depending on the send mode.
If the send mode is synchronous, then the
send can complete only if a matching receive has started. That
is, a receive has
been posted, and has been matched with the send. In this case,
the send-complete call is non-local. Note that a synchronous,
nonblocking send may complete, if matched by a nonblocking receive, before
the receive complete call occurs. (It can complete as soon as the sender
``knows'' the transfer will complete, but before the receiver ``knows'' the
transfer will complete.)
If the send mode is buffered then the
message must be buffered if there is no pending receive. In this case,
the send-complete
call is local, and must succeed irrespective of the status of a matching
receive.
If the send mode is standard then the send-complete call may
return before a matching receive
is posted,
if the message is buffered. On the other hand, the
send-complete may not complete until a matching receive
is posted,
and the message was copied into the receive buffer.
Nonblocking sends can be matched with blocking receives, and
vice-versa.
The completion of a send operation may be delayed, for standard mode, and must
be delayed, for synchronous mode, until a matching receive is posted.
The use of nonblocking sends in these two cases allows the sender to proceed
ahead of the receiver, so that the computation is more tolerant
of fluctuations in the speeds of the two processes.
Nonblocking sends in the buffered and ready modes have a more
limited impact. A nonblocking send will return as soon as possible, whereas a blocking
send will return after the data has been copied out of the sender memory.
The use of nonblocking sends is advantageous in these cases only if
data copying
can be concurrent with computation.
The message-passing model implies that communication is initiated by
the sender.
The communication will generally have lower overhead if a receive is
already posted when the sender initiates the communication (data can be moved
directly to the receive buffer, and there is no need to queue a pending send
request). However,
a receive operation can complete only after the matching send has occurred.
The use of nonblocking receives allows one to achieve lower communication overheads
without blocking the receiver while it waits for the send.
( End of advice to users.)
Advice to users.
![]()
![]()
![]()
Up: Contents
Next: Communication Request Objects
Previous: Model Implementation of Buffered Mode
Return to MPI-2.1 Standard Index
Return to MPI Forum Home Page
MPI-2.0 of July 1, 2008
HTML Generated on July 6, 2008