MPI includes a variant of the reduce operations where the result is scattered to all processes in a group on return.
|MPI_REDUCE_SCATTER(sendbuf, recvbuf, recvcounts, datatype, op, comm)| |
|---|---|
|IN sendbuf|starting address of send buffer (choice)|
|OUT recvbuf|starting address of receive buffer (choice)|
|IN recvcounts|non-negative integer array specifying the number of elements in the result distributed to each process; the array must be identical on all calling processes|
|IN datatype|data type of elements of the input buffer (handle)|
|IN op|operation (handle)|
|IN comm|communicator (handle)|
int MPI_Reduce_scatter(void* sendbuf, void* recvbuf, int *recvcounts, MPI_Datatype datatype, MPI_Op op, MPI_Comm comm)
MPI_REDUCE_SCATTER(SENDBUF, RECVBUF, RECVCOUNTS, DATATYPE, OP, COMM, IERROR)
<type> SENDBUF(*), RECVBUF(*)
INTEGER RECVCOUNTS(*), DATATYPE, OP, COMM, IERROR
void MPI::Comm::Reduce_scatter(const void* sendbuf, void* recvbuf, int recvcounts[], const MPI::Datatype& datatype, const MPI::Op& op) const = 0
If comm is an intracommunicator, MPI_REDUCE_SCATTER first performs an element-wise reduction, using the operation op, on the vectors of elements in the send buffers defined by sendbuf, count and datatype, where count is the sum of recvcounts[i] over all processes i. Next, the resulting vector is split into n disjoint segments, where n is the number of members in the group. Segment i contains recvcounts[i] elements. The i-th segment is sent to process i and stored in the receive buffer defined by recvbuf, recvcounts[i] and datatype.
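To make the data movement concrete, the following is a serial C model of the intracommunicator semantics. It is not MPI code: it assumes op is MPI_SUM and datatype is int, and it simulates all n processes in one address space, with sendbufs[p] standing in for process p's send buffer and recvbufs[i] for process i's receive buffer. The function name model_reduce_scatter_sum is hypothetical.

```c
#include <assert.h>

/* Serial model of intracommunicator MPI_REDUCE_SCATTER, NOT MPI code.
 * Assumptions: op = MPI_SUM, datatype = int, and all n processes are simulated
 * in one address space. sendbufs[p] is process p's send buffer of
 * count = recvcounts[0] + ... + recvcounts[n-1] elements; recvbufs[i] is
 * process i's receive buffer of recvcounts[i] elements. Name is hypothetical. */
static void model_reduce_scatter_sum(int n, const int *const *sendbufs,
                                     int *const *recvbufs, const int *recvcounts)
{
    int disp = 0; /* offset of segment i within the reduced vector */
    for (int i = 0; i < n; i++) {
        assert(recvcounts[i] >= 0); /* recvcounts must be non-negative */
        for (int k = 0; k < recvcounts[i]; k++) {
            int acc = 0;
            /* element-wise reduction of element disp + k across all processes */
            for (int p = 0; p < n; p++)
                acc += sendbufs[p][disp + k];
            recvbufs[i][k] = acc; /* segment i goes to process i */
        }
        disp += recvcounts[i];
    }
}
```

For example, with two processes whose send buffers are {1, 2, 3} and {10, 20, 30} and with recvcounts {1, 2}, the reduced vector is {11, 22, 33}; process 0 receives {11} and process 1 receives {22, 33}.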
Advice to implementors.
The MPI_REDUCE_SCATTER routine is functionally equivalent to an MPI_REDUCE collective operation with count equal to the sum of recvcounts[i], followed by MPI_SCATTERV with sendcounts equal to recvcounts. However, a direct implementation may run faster.
(End of advice to implementors.)
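The two-phase equivalence can be sketched with the same kind of serial model. Under the same assumptions (MPI_SUM on int, all processes simulated in one address space, all names hypothetical), the displacements that MPI_SCATTERV needs are the exclusive prefix sums of recvcounts, and root stands in for the root process's receive buffer of the MPI_REDUCE phase.

```c
#include <assert.h>
#include <string.h>

/* Serial model (NOT MPI code) of MPI_REDUCE followed by MPI_SCATTERV.
 * Assumptions: op = MPI_SUM, datatype = int; names are hypothetical. */

/* Exclusive prefix sum: displs[i] = recvcounts[0] + ... + recvcounts[i-1].
 * Returns count, the sum of all recvcounts. */
static int model_displs(int n, const int *recvcounts, int *displs)
{
    int count = 0;
    for (int i = 0; i < n; i++) {
        assert(recvcounts[i] >= 0);
        displs[i] = count;
        count += recvcounts[i];
    }
    return count;
}

/* Phase 1 (reduce): root[j] = sum over p of sendbufs[p][j], for j < count.
 * Phase 2 (scatterv): process i gets root[displs[i] .. displs[i] + recvcounts[i]). */
static void model_reduce_then_scatterv(int n, const int *const *sendbufs,
                                       int *const *recvbufs, const int *recvcounts,
                                       int *displs, int *root)
{
    int count = model_displs(n, recvcounts, displs);
    for (int j = 0; j < count; j++) {
        root[j] = 0;
        for (int p = 0; p < n; p++)
            root[j] += sendbufs[p][j];
    }
    for (int i = 0; i < n; i++)
        memcpy(recvbufs[i], root + displs[i], (size_t)recvcounts[i] * sizeof root[0]);
}
```

The full intermediate vector of count elements held at the root, and the extra pass that scatters it, are one plausible place where a direct implementation can save time.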
The "in place" option for intracommunicators is specified by passing MPI_IN_PLACE in the sendbuf argument. In this case, the input data is taken from the top of the receive buffer.
If comm is an intercommunicator, then the result of the reduction of the data provided by processes in group A is scattered among processes in group B, and vice versa. Within each group, all processes provide the same recvcounts argument, and the sum of the recvcounts entries should be the same for the two groups.
Rationale.
The last restriction is needed so that the length of the send buffer can be determined by the sum of the local recvcounts entries. Otherwise, a communication would be needed to figure out how many elements are reduced.
(End of rationale.)
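The restriction and its rationale can be illustrated with a serial model of one direction of the intercommunicator case, in which group A's data is reduced and scattered over group B. As before, this is not MPI code: it assumes MPI_SUM on int, and the function name is hypothetical. Group A sizes its send buffers from its own recvcounts alone, which is only consistent with group B's segmentation when the two sums agree, so the model rejects inputs where they differ.

```c
#include <assert.h>

/* Serial model (NOT MPI code) of one direction of an intercommunicator
 * MPI_REDUCE_SCATTER: the n_a send buffers of group A are reduced and the
 * result is scattered over the n_b processes of group B by B's recvcounts.
 * Assumptions: op = MPI_SUM, datatype = int; the name is hypothetical.
 * Returns -1 and writes nothing if the two groups' recvcounts sums differ. */
static int model_intercomm_sum(int n_a, const int *const *sendbufs_a,
                               const int *recvcounts_a,
                               int n_b, int *const *recvbufs_b,
                               const int *recvcounts_b)
{
    int count_a = 0, count_b = 0;
    for (int i = 0; i < n_a; i++) {
        assert(recvcounts_a[i] >= 0);
        count_a += recvcounts_a[i]; /* A's send buffer length, from local data only */
    }
    for (int i = 0; i < n_b; i++) {
        assert(recvcounts_b[i] >= 0);
        count_b += recvcounts_b[i];
    }
    if (count_a != count_b)
        return -1; /* the case the restriction rules out */

    int disp = 0; /* offset of B's segment i within the reduced vector */
    for (int i = 0; i < n_b; i++) {
        for (int k = 0; k < recvcounts_b[i]; k++) {
            int acc = 0;
            for (int p = 0; p < n_a; p++)
                acc += sendbufs_a[p][disp + k]; /* reduce over group A */
            recvbufs_b[i][k] = acc;             /* deliver to process i of B */
        }
        disp += recvcounts_b[i];
    }
    return 0;
}
```

For example, three processes in group A with recvcounts {1, 1, 1} and two processes in group B with recvcounts {2, 1} both sum to 3, so the reduced vector of three elements can be split as two elements for B's process 0 and one for B's process 1.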