I'll note that there may be no incoming message involved in completing the
request; the trigger may simply be a SIGIO, SIGUSR1, SIGCHLD, or some other
non-MPI event. The only MPI call that the MPI_WAIT implementation can depend on
happening is MPI_REQUEST_MARK_COMPLETED.
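To make the point concrete, here is a minimal sketch in C, assuming a single request that is completed by SIGUSR1 and a wait that spins on a flag. Every name in it (greq_done, greq_mark_completed, greq_arm_on_sigusr1, greq_wait) is hypothetical and only stands in for MPI_REQUEST_MARK_COMPLETED and MPI_WAIT; it is not MPI code.

```c
#include <assert.h>
#include <signal.h>

/* Hypothetical completion flag for one generalized request; not an MPI
   type. volatile sig_atomic_t is the classic type a signal handler may
   safely assign to. */
static volatile sig_atomic_t greq_done = 0;

/* Hypothetical stand-in for MPI_REQUEST_MARK_COMPLETED: it only sets the
   flag, so it is safe to call from a signal handler. */
static void greq_mark_completed(void) { greq_done = 1; }

/* The non-MPI event that completes the request: here, SIGUSR1. */
static void on_sigusr1(int sig) {
    (void)sig;
    greq_mark_completed();
}

/* Arm the request: clear the flag and install the handler.
   Returns 0 on success, -1 if the handler could not be installed. */
static int greq_arm_on_sigusr1(void) {
    greq_done = 0;
    return signal(SIGUSR1, on_sigusr1) == SIG_ERR ? -1 : 0;
}

/* Hypothetical stand-in for MPI_WAIT on this request: no message will
   arrive, so the only thing to watch is the flag. A real implementation
   would also drive its progress engine inside this loop. */
static void greq_wait(void) {
    while (!greq_done) {
        /* spin */
    }
    assert(greq_done); /* postcondition: returns only after completion */
}
```

Nothing in the wait loop is waiting on a message: the request finishes because a non-MPI event set the flag, which is why such a wait cannot simply block inside the MPI message engine.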
Now, I'll agree that an implementation could use some kind of pend-semaphore,
so that MPI_WAIT, once it detects a generalized request, blocks on it. But this
would be complex and expensive, and it would affect the performance of every
nonblocking operation, since the multiple-completion WAIT/TEST routines, which
can mix generalized and ordinary requests in one call, might be in use. The
most obvious implementation, and
the one with the least impact when generalized requests are not used, is to
spin on a completion flag in the generalized request. (There are other
possible implementations, but they are at least as hideous. A select-based
system could, for example, have MPI_REQUEST_MARK_COMPLETED write data to an
internal fd so that the select returns. My concern is that it is hard to
guarantee that this will not hurt the performance of an implementation.)
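The select-based alternative can be sketched the same way, using the self-pipe trick: the completion routine writes one byte to an internal pipe so that a waiter blocked in select() returns. This is a minimal single-request sketch under stated assumptions; wake_pipe, progress_init, greq_mark_completed, and greq_wait_select are hypothetical names, not part of MPI or of any implementation.

```c
#include <assert.h>
#include <errno.h>
#include <signal.h>
#include <stddef.h>
#include <sys/select.h>
#include <unistd.h>

/* Hypothetical progress-engine state: an internal pipe whose only job is
   to wake up select() when a generalized request is marked complete. */
static int wake_pipe[2] = {-1, -1};
static volatile sig_atomic_t greq_completed = 0;

static int progress_init(void) { return pipe(wake_pipe); }

/* Hypothetical stand-in for MPI_REQUEST_MARK_COMPLETED: set the flag,
   then write one byte so that a waiter blocked in select() returns.
   write() is async-signal-safe, so this may also run in a handler. */
static void greq_mark_completed(void) {
    char byte = 1;
    assert(wake_pipe[1] >= 0); /* progress_init() must have run */
    greq_completed = 1;
    ssize_t n = write(wake_pipe[1], &byte, 1);
    (void)n; /* sketch: error handling omitted */
}

/* Hypothetical stand-in for a blocking MPI_WAIT: sleep in select() on the
   internal fd. A real engine would add its network fds to the same set. */
static int greq_wait_select(void) {
    while (!greq_completed) {
        fd_set readfds;
        FD_ZERO(&readfds);
        FD_SET(wake_pipe[0], &readfds);
        if (select(wake_pipe[0] + 1, &readfds, NULL, NULL, NULL) < 0) {
            if (errno == EINTR)
                continue; /* a signal interrupted select(); recheck flag */
            return -1;
        }
        if (FD_ISSET(wake_pipe[0], &readfds)) {
            char buf;
            ssize_t n = read(wake_pipe[0], &buf, 1); /* drain wake-up byte */
            (void)n;
        }
    }
    return 0;
}
```

The cost shows up even when no generalized request exists: the internal fd has to be in every select set the progress engine builds, and each completion pays for a write() and a read() system call.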
If you take the position (not in the standard) that a generalized request
completes in an MPI handler routine, as in the example, then MPI_WAIT can
block, since it must wait for some MPI operation to complete. But if we impose
this restriction, I'm not sure that generalized requests are appropriate for
things like MPI-IO, where the actual operation may not involve MPI at all.
Bill