Thanks for introducing the Allput pattern to the discussion. The
Allput is the key communication pattern in every program generated by
our Fx compiler (Fx is a dialect of HPF that integrates task and data
parallelism. See http://www.cs.cmu.edu/~fx for more info and example
codes). The Allput is the technique we use for compiling array
assignment statements, and thus could be important for any HPF compiler
that targets MPI.
We have measured big performance improvements over sends and receives
using the Allput pattern (we call it direct deposit message passing).
The fundamental reason for the improvement is that the Allput
separates synchronization from data transfer, which lets us replace
p^2 potential synchronizations (e.g., one synch for each message in a
complete exchange induced by assigning a BLOCK distributed HPF array
to a CYCLIC distributed array) with a single log n barrier. The
idea is discussed in more detail (along with measured T3D numbers) in
T. Stricker, J. Stichnoth, D. O'Hallaron, S. Hinrichs, and T. Gross.
Decoupling Synchronization and Data Transfer in Message Passing
Systems of Parallel Computers. In Proc. of the 9th International
Conference on Supercomputing, ACM, Barcelona, Spain, July, 1995.
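
To make the counting argument concrete, here is a small sequential
Python sketch of a BLOCK to CYCLIC array assignment done in the direct
deposit style. It is only an illustration, not Fx compiler output: the
function names, the list-of-lists "memories", and the assumption that
the barrier takes ceil(log2(p)) rounds (as in a tree or dissemination
barrier) are all made up for this example.

```python
def make_block_distribution(n, p):
    """Split global indices 0..n-1 into p BLOCK pieces (illustrative helper)."""
    b = -(-n // p)  # block size = ceil(n / p)
    return [list(range(s * b, min((s + 1) * b, n))) for s in range(p)]


def direct_deposit_block_to_cyclic(a_blocks, n, p):
    """Simulate B(CYCLIC) = A(BLOCK) with sender-computed addresses.

    Each sender works out the destination processor and the slot in the
    destination's local array, then stores the value there directly.
    No per-message handshake with the receiver is modeled.
    """
    b = -(-n // p)
    # Local CYCLIC pieces: processor q owns global indices q, q+p, q+2p, ...
    b_cyclic = [[None] * len(range(q, n, p)) for q in range(p)]
    pairs = set()  # (sender, receiver) pairs that exchange data
    for s in range(p):
        for offset, value in enumerate(a_blocks[s]):
            i = s * b + offset          # global index of this element
            q, slot = i % p, i // p     # CYCLIC owner and its local slot
            b_cyclic[q][slot] = value   # the "deposit"
            pairs.add((s, q))
    return b_cyclic, pairs


def barrier_rounds(p):
    """Rounds of an assumed log-depth barrier: ceil(log2(p))."""
    return (p - 1).bit_length()


n, p = 16, 4
a_blocks = make_block_distribution(n, p)
b_cyclic, pairs = direct_deposit_block_to_cyclic(a_blocks, n, p)

print("sender/receiver pairs:", len(pairs))   # one potential synch each
print("barrier rounds:", barrier_rounds(p))   # paid once, after all deposits
print("processor 1 now holds:", b_cyclic[1])
```

With n = 16 and p = 4 every processor deposits into every processor
(itself included), so send/receive matching would involve 16 = p^2
sender/receiver pairs, while the deposit version needs only the one
barrier at the end to tell everybody that the data has landed.
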
The postscript is available from the Fx Web page at
Dave O'Hallaron