Arrays with subscript triplets describe Fortran subarrays with or without strides, e.g.,
    REAL a(100,100,100)
    CALL MPI_Send( a(11:17, 12:99:3, 1:100), 7*30*100, MPI_REAL, ...)

The handling of subscript triplets depends on the value of the constant MPI_SUBARRAYS_SUPPORTED:
If MPI_SUBARRAYS_SUPPORTED equals .TRUE.:

Choice buffer arguments are declared as TYPE(*), DIMENSION(..). For example, consider the following code fragment:
    REAL s(100), r(100)
    CALL MPI_Isend(s(1:100:5), 3, MPI_REAL, ..., rq, ierror)
    CALL MPI_Wait(rq, status, ierror)
    CALL MPI_Irecv(r(1:100:5), 3, MPI_REAL, ..., rq, ierror)
    CALL MPI_Wait(rq, status, ierror)

In this case, the individual elements s(1), s(6), and s(11) are sent between the start of MPI_ISEND and the end of MPI_WAIT even though the compiled code will not copy s(1:100:5) to a real contiguous temporary scratch buffer. Instead, the compiled code will pass a descriptor to MPI_ISEND that allows MPI to operate directly on s(1), s(6), s(11), ..., s(96). The called MPI_ISEND routine will take only the first three of these elements due to the type signature ``3, MPI_REAL''.
All nonblocking MPI functions (e.g., MPI_ISEND, MPI_PUT, MPI_FILE_WRITE_ALL_BEGIN) behave as if the user-specified elements of choice buffers are copied to a contiguous scratch buffer in the MPI runtime environment. All datatype descriptions (in the example above, ``3, MPI_REAL'') read and store data from and to this virtual contiguous scratch buffer. Displacements in MPI derived datatypes are relative to the beginning of this virtual contiguous scratch buffer. Upon completion of a nonblocking receive operation (e.g., when MPI_WAIT on a corresponding MPI_Request returns), it is as if the received data has been copied from the virtual contiguous scratch buffer back to the non-contiguous application buffer. In the example above, r(1), r(6), and r(11) are guaranteed to be defined with the received data when MPI_WAIT returns.
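The as-if semantics above can be mimicked outside MPI. The sketch below is plain Python, not MPI code: gather_section and scatter_section are invented stand-ins for the gather into the virtual contiguous scratch buffer at the start of a nonblocking operation and the scatter back on completion, and the count argument plays the role of the datatype description ``3, MPI_REAL''.

```python
# Illustrative sketch (not MPI code) of the virtual contiguous scratch
# buffer: a nonblocking operation behaves as if the section's elements
# were packed contiguously, and a receive as if unpacked on completion.

def gather_section(buf, start, stop, step, count):
    """Pack buf(start:stop:step), then take only as many elements as
    the datatype description specifies (here: 3, MPI_REAL -> 3)."""
    return buf[start:stop:step][:count]

def scatter_section(buf, start, stop, step, data):
    """On completion, write the received data back into the section."""
    for i, v in zip(range(start, stop, step), data):
        buf[i] = v

s = [float(k) for k in range(1, 101)]   # s(k) = k, so s(1) = 1.0, ...
r = [0.0] * 100

scratch = gather_section(s, 0, 100, 5, 3)   # s(1:100:5) with count 3
assert scratch == [1.0, 6.0, 11.0]          # s(1), s(6), s(11)

scatter_section(r, 0, 100, 5, scratch)      # as if MPI_WAIT unpacked it
assert (r[0], r[5], r[10]) == (1.0, 6.0, 11.0)
```

Displacements in derived datatypes would likewise be interpreted relative to the start of scratch, not of s or r.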
Note that the above definition does not supersede restrictions about buffers used with nonblocking operations (e.g., those specified in Section Communication Initiation).
Advice to implementors.
The Fortran descriptor for TYPE(*), DIMENSION(..) arguments contains enough
information that, if desired, the MPI library can make a real contiguous copy of
non-contiguous user buffers when the nonblocking operation is started,
and release this buffer not before the nonblocking communication
has completed (e.g., the MPI_WAIT routine).
Efficient implementations may avoid such additional
memory-to-memory data copying.
( End of advice to implementors.)
Rationale.
If MPI_SUBARRAYS_SUPPORTED equals .TRUE.,
non-contiguous buffers are handled inside the MPI library
instead of by the compiler through argument association conventions.
Therefore, the scope of MPI library scratch buffers can
be from the beginning of a nonblocking operation until the completion of the
operation although beginning and completion are implemented in different routines.
( End of rationale.)
If MPI_SUBARRAYS_SUPPORTED equals .FALSE.:

In this case, the use of Fortran arrays with subscript triplets as actual choice buffer arguments in any nonblocking MPI operation (which also includes persistent requests and split collectives) may cause undefined behavior. They may, however, be used in blocking MPI operations.
Implicit in MPI is the idea of a contiguous chunk of memory accessible through a linear address space. MPI copies data to and from this memory. An MPI program specifies the location of data by providing memory addresses and offsets. In the C language, sequence association rules plus pointers provide all the necessary low-level structure.
In Fortran, array data is not necessarily stored contiguously. For example, the array section A(1:N:2) involves only the elements of A with indices 1, 3, 5, .... The same is true for a pointer array whose target is such a section. Most compilers ensure that an array that is a dummy argument is held in contiguous memory if it is declared with an explicit shape (e.g., B(N)) or is of assumed size (e.g., B(*)). If necessary, they do this by making a copy of the array into contiguous memory.
Because MPI dummy buffer arguments are assumed-size arrays
if MPI_SUBARRAYS_SUPPORTED equals .FALSE.,
this copying leads to a serious problem for a nonblocking call: the compiler
copies the temporary array back on return, but MPI continues to copy data to the
memory that held it. For example, consider the following code fragment:
    real a(100)
    call MPI_IRECV(a(1:100:2), 50, MPI_REAL, ...)

Since the first dummy argument to MPI_IRECV is an assumed-size array (<type> buf(*)), the array section a(1:100:2) is copied to a temporary before being passed to MPI_IRECV, so that it is contiguous in memory. MPI_IRECV returns immediately, and data is copied from the temporary back into the array a. Sometime later, MPI may write to the address of the deallocated temporary. Copying is also a problem for MPI_ISEND since the temporary array may be deallocated before the data has all been sent from it.
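The failure sequence can be traced in plain Python. This is a sketch, not MPI code: copy_in and copy_out are invented stand-ins for the compiler's copy-in/copy-out of the strided section, and irecv/deliver stand in for the MPI library remembering the buffer address and writing through it later.

```python
# Illustrative sketch (not MPI code): copy-in/copy-out of a strided
# section breaks a nonblocking receive, because MPI holds on to the
# temporary while the compiler copies it back and discards it at once.

def copy_in(a, start, stop, step):
    """Compiler-style copy-in: materialize a(start:stop:step) as a temp."""
    return a[start:stop:step]

def copy_out(a, start, stop, step, temp):
    """Compiler-style copy-back on return from the call."""
    a[start:stop:step] = temp

pending = {}

def irecv(buf):
    # The MPI library keeps a reference to buf and returns immediately.
    pending["target"] = buf

def deliver(data):
    # Data arrives later and is written through the saved reference.
    pending["target"][: len(data)] = data

a = [0.0] * 10
temp = copy_in(a, 0, 10, 2)   # temporary contiguous copy of a(1:10:2)
irecv(temp)                   # MPI remembers the temporary, not a
copy_out(a, 0, 10, 2, temp)   # compiler copies the temp back at once
deliver([1.0, 2.0, 3.0, 4.0, 5.0])  # too late: writes go to the temp

# The received data never reaches a: the copy-back already happened.
assert a == [0.0] * 10
assert temp == [1.0, 2.0, 3.0, 4.0, 5.0]
```

In real Fortran the temporary is deallocated on return, so the late write is not merely lost but lands in freed memory.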
Most Fortran 90 compilers do not make a copy if the actual argument is the whole of an explicit-shape or assumed-size array or is a ``simply contiguous'' section such as A(1:N) of such an array. (``Simply contiguous'' is defined in the next paragraph.) Also, many compilers treat allocatable arrays the same as they treat explicit-shape arrays in this regard (though we know of one that does not). However, the same is not true for assumed-shape and pointer arrays; since they may be discontiguous, copying is often done. It is this copying that causes problems for MPI as described in the previous paragraph.
According to the Fortran 2008 Standard, Section 6.5.4, a ``simply contiguous'' array section is
    name ( [:,]... [<subscript>]:[<subscript>] [,<subscript>]... )

That is, there are zero or more dimensions that are selected in full, then one dimension selected without a stride, then zero or more dimensions that are selected with a simple subscript. The compiler can detect from analyzing the source code that the array is contiguous. Examples are
    A(1:N), A(:,N), A(:,1:N,1), A(1:6,N), A(:,:,1:N)

Because of Fortran's column-major ordering, where the first index varies fastest, a ``simply contiguous'' section of a contiguous array will also be contiguous.
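The pattern quoted above can be checked mechanically. The following Python sketch is a hypothetical string-level checker for illustration only (a real compiler works on parse trees, and full Fortran subscript expressions are richer than this); it accepts the examples above and rejects a strided section.

```python
import re

# Hypothetical checker for the "simply contiguous" pattern: zero or more
# dimensions selected in full (":"), then at most one range without a
# stride ("lo:hi", either bound optional), then zero or more simple
# subscripts. String-based sketch only; not a real Fortran parser.

FULL = re.compile(r"^:$")
RANGE = re.compile(r"^[^:,]*:[^:,]*$")   # exactly one ":", no stride part
SIMPLE = re.compile(r"^[^:,]+$")         # plain subscript, no ":"

def simply_contiguous(section):
    dims = [d.strip() for d in section.split(",")]
    i = 0
    while i < len(dims) and FULL.match(dims[i]):      # leading ":" dims
        i += 1
    if i < len(dims) and RANGE.match(dims[i]):        # one stride-free range
        i += 1
    return all(SIMPLE.match(d) for d in dims[i:])     # rest: simple subscripts

# The examples from the text are accepted; a strided section is not.
for ok in ["1:N", ":,N", ":,1:N,1", "1:6,N", ":,:,1:N"]:
    assert simply_contiguous(ok)
assert not simply_contiguous("1:n:2")
```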
The same problem can occur with a scalar argument: a compiler may make a copy of a scalar dummy argument within a called procedure when it is passed on as an actual argument to a choice buffer routine. That this can cause a problem is illustrated by the following code fragment:

    real :: a
    call user1(a,rq)
    call MPI_WAIT(rq,status,ierr)
    write (*,*) a

    subroutine user1(buf,request)
    call MPI_IRECV(buf,...,request,...)
    end

If a is copied, MPI_IRECV will alter the copy when it completes the communication and will not alter a itself.
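The scalar case follows the same mechanism, sketched here in plain Python rather than MPI code: the scalar's storage is modeled as a one-element list, and irecv_scalar/complete are invented stand-ins for the MPI library saving and later writing through the buffer reference.

```python
# Sketch of the scalar-copy hazard (not MPI code): if the compiler passes
# a copy of the scalar dummy, completion writes through the saved
# reference into the copy, never into the caller's variable a.

pending = {}

def irecv_scalar(cell):
    pending["target"] = cell      # MPI remembers this storage

def user1(buf_cell):
    irecv_scalar(buf_cell)        # as in the subroutine user1 above

def complete(value):
    pending["target"][0] = value  # communication completes later

a = [0.0]                         # the caller's scalar a, as storage
copy_of_a = list(a)               # compiler-made copy of the dummy
user1(copy_of_a)                  # MPI saves a reference to the copy
complete(42.0)                    # MPI_WAIT-time delivery

assert a == [0.0]                 # a itself was never altered
assert copy_of_a == [42.0]
```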
Note that copying will almost certainly occur for an argument that is a non-trivial expression (one with at least one operator or function call), a section that does not select a contiguous part of its parent (e.g., A(1:n:2)), a pointer whose target is such a section, or an assumed-shape array that is (directly or indirectly) associated with such a section.
If a compiler option exists that inhibits copying of arguments, in either the calling or called procedure, this must be employed.
If a compiler makes copies in the calling procedure of arguments that are explicit-shape or assumed-size arrays, ``simply contiguous'' array sections of such arrays, or scalars, and if no compiler option exists to inhibit such copying, then the compiler cannot be used for applications that use MPI_GET_ADDRESS or any nonblocking MPI routine. If a compiler copies scalar arguments in the called procedure and there is no compiler option to inhibit this, then this compiler cannot be used for applications that use memory references across subroutine calls as in the example above.