


Nonblocking communication is important both for reasons of correctness and performance. For complex communication patterns, the use of only blocking communication (without buffering) is difficult because the programmer must ensure that each send is matched with a receive in an order that avoids deadlock. For communication patterns that are determined only at run time, this is even more difficult. Nonblocking communication can be used to avoid this problem, allowing programmers to express complex and possibly dynamic communication patterns without needing to ensure that all sends and receives are issued in an order that prevents deadlock (see Section Semantics of Point-to-Point Communication and the discussion of ``safe'' programs). Nonblocking communication also allows for the overlap of communication with different communication operations, e.g., to prevent the unintentional serialization of such operations, and for the overlap of communication with computation. Whether an implementation is able to accomplish an effective (from a performance standpoint) overlap of operations depends on the implementation itself and the system on which the implementation is running. Using nonblocking operations permits an implementation to overlap communication with computation, but does not require it to do so.
A nonblocking send start call initiates the send operation, but does not complete it. The send start call can return before the message has been copied out of the send buffer. A separate send complete call is needed to complete the communication, i.e., to verify that the data has been copied out of the send buffer. With suitable hardware, the transfer of data out of the sender memory may proceed concurrently with computations done at the sender after the send is initiated and before it completes. Similarly, a nonblocking receive start call initiates the receive operation, but does not complete it. The call can return before a message is stored into the receive buffer. A separate receive complete call is needed to complete the receive operation and verify that the data has been received into the receive buffer. With suitable hardware, the transfer of data into the receiver memory may proceed concurrently with computations done after the receive is initiated and before it completes. The use of nonblocking receives may also avoid system buffering and memory-to-memory copying, because the location of the receive buffer is made known early.
Nonblocking send start calls can use the same four modes as blocking sends: standard, buffered, synchronous, and ready. These carry the same meaning. Sends of all modes, ready excepted, can be started whether a matching receive has been started or not; a nonblocking ready send can be started only if the matching receive is already started. In all cases, the send start call is local: it returns immediately, irrespective of the status of other MPI processes. If the call causes some system resource to be exhausted, then it will fail and return an error code. High-quality implementations of MPI should ensure that this happens only in ``pathological'' cases. That is, an MPI implementation should be able to support a large number of pending nonblocking operations.
The send-complete call returns no earlier than when all message data has been copied out of the send buffer. It may carry additional meaning, depending on the send mode.
If the send mode is synchronous, then the send-complete call is nonlocal; the send can complete only if a matching receive has been started and has been matched with the send. Note that a synchronous mode send may complete, if matched by a nonblocking receive, before the receive complete call occurs. (It can complete as soon as the sender ``knows'' the transfer will complete, but before the receiver ``knows'' the transfer will complete.)
If the send mode is buffered, then the send-complete call is local; the send must complete irrespective of the status of a matching receive. If there is no pending receive operation, then the message must be buffered.
If the send mode is standard, then the send-complete call can be either local or nonlocal. If the message is buffered, it is permitted for the send to complete before a matching receive is started. On the other hand, it is permitted for the send not to complete until a matching receive has been started and the message has been copied into the receive buffer.
Nonblocking sends can be matched with blocking receives, and vice versa.
 
 
 
 Advice to users.  
The completion of a send operation may be delayed for standard mode, and must be delayed for synchronous mode, until a matching receive has been started. The use of nonblocking sends in these two cases allows the sender to proceed ahead of the receiver, so that the computation is more tolerant of fluctuations in the speeds of the two MPI processes.
Nonblocking sends in the buffered and ready modes have a more limited impact, e.g., the blocking version of buffered send is capable of completing regardless of when a matching receive call is made. However, separating the start from the completion of these sends still gives some opportunity for optimization within the MPI library. For example, starting a buffered send gives an implementation more flexibility in determining if and how the message is buffered. There are also advantages for both nonblocking buffered and ready modes when data copying can be done concurrently with computation.
 
The message-passing model implies that communication is initiated by the sender. The communication will generally have lower overhead if a receive is already started when the sender initiates the communication (data can be moved directly to the receive buffer, and there is no need to queue a pending send request). However, a receive operation can complete only after the matching send has started. The use of nonblocking receives allows one to achieve lower communication overheads without blocking the receiver while it waits for the send.
 ( End of advice to users.) 
 


