V. Bala and S. Kipnis. Process groups: a mechanism for the coordination of and communication among processes in the Venus collective communication library. Technical report, IBM T. J. Watson Research Center, October 1992. Preprint.
 V. Bala, S. Kipnis, L. Rudolph, and Marc Snir. Designing efficient, scalable, and portable collective communication libraries. Technical report, IBM T. J. Watson Research Center, October 1992. Preprint.
 Purushotham V. Bangalore, Nathan E. Doss, and Anthony Skjellum. MPI++: Issues and Features. In OON-SKI '94, in press, 1994.
 A. Beguelin, J. Dongarra, A. Geist, R. Manchek, and V. Sunderam. Visualization and debugging in a heterogeneous environment. IEEE Computer, 26(6):88--95, June 1993.
 Luc Bomans and Rolf Hempel. The Argonne/GMD macros in FORTRAN for portable parallel programming and their implementation on the Intel iPSC/2. Parallel Computing, 15:119--132, 1990.
 Dan Bonachea and Jason Duell. Problems with using MPI 1.1 and 2.0 as compilation targets for parallel language implementations. IJHPCN, 1(1/2/3):91--99, 2004.
 Rajesh Bordawekar, Juan Miguel del Rosario, and Alok Choudhary. Design and evaluation of primitives for parallel I/O. In Proceedings of Supercomputing '93, pages 452--461, 1993.
 R. Butler and E. Lusk. User's guide to the p4 programming system. Technical Report TM-ANL--92/17, Argonne National Laboratory, 1992.
 Ralph Butler and Ewing Lusk. Monitors, messages, and clusters: The p4 parallel programming system. Parallel Computing, 20(4):547--564, April 1994. Also Argonne National Laboratory Mathematics and Computer Science Division preprint P362-0493.
 Robin Calkin, Rolf Hempel, Hans-Christian Hoppe, and Peter Wypior. Portable programming with the PARMACS message-passing library. Parallel Computing, 20(4):615--632, April 1994.
 S. Chittor and R. J. Enbody. Performance evaluation of mesh-connected wormhole-routed networks for interprocessor communication in multicomputers. In Proceedings of the 1990 Supercomputing Conference, pages 647--656, 1990.
 S. Chittor and R. J. Enbody. Predicting the effect of mapping on the communication performance of large multicomputers. In Proceedings of the 1991 International Conference on Parallel Processing, vol. II (Software), pages II--1 -- II--4, 1991.
 Parasoft Corporation. Express version 1.0: A communication environment for parallel computers, 1988.
 Yiannis Cotronis, Anthony Danalis, Dimitrios S. Nikolopoulos, and Jack Dongarra, editors. Recent Advances in the Message Passing Interface - 18th European MPI Users' Group Meeting, EuroMPI 2011, Santorini, Greece, September 18-21, 2011. Proceedings, volume 6960 of Lecture Notes in Computer Science. Springer, 2011.
 Juan Miguel del Rosario, Rajesh Bordawekar, and Alok Choudhary. Improved parallel I/O via a two-phase run-time access strategy. In IPPS '93 Workshop on Input/Output in Parallel Computer Systems, pages 56--70, 1993. Also published in Computer Architecture News 21(5), December 1993, pages 31--38.
 James Dinan, Sriram Krishnamoorthy, Pavan Balaji, Jeff R. Hammond, Manojkumar Krishnan, Vinod Tipparaju, and Abhinav Vishnu. Noncollective communicator creation in MPI. In Cotronis et al., pages 282--291.
 J. Dongarra, A. Geist, R. Manchek, and V. Sunderam. Integrated PVM framework supports heterogeneous network computing. Computers in Physics, 7(2):166--75, April 1993.
 J. J. Dongarra, R. Hempel, A. J. G. Hey, and D. W. Walker. A proposal for a user-level, message passing interface in a distributed memory environment. Technical Report TM-12231, Oak Ridge National Laboratory, February 1993.
 Edinburgh Parallel Computing Centre, University of Edinburgh. CHIMP Concepts, June 1991.
 Edinburgh Parallel Computing Centre, University of Edinburgh. CHIMP Version 1.0 Interface, May 1992.
 D. Feitelson. Communicators: Object-based multiparty interactions for parallel programming. Technical Report 91-12, Dept. Computer Science, The Hebrew University of Jerusalem, November 1991.
 Message Passing Interface Forum. MPI: A Message-Passing Interface standard. The International Journal of Supercomputer Applications and High Performance Computing, 8, 1994.
 Message Passing Interface Forum. MPI: A Message-Passing Interface standard (version 1.1). Technical report, 1995. http://www.mpi-forum.org.
 Al Geist, Adam Beguelin, Jack Dongarra, Weicheng Jiang, Bob Manchek, and Vaidy Sunderam. PVM: Parallel Virtual Machine---A User's Guide and Tutorial for Network Parallel Computing. MIT Press, 1994.
 G. A. Geist, M. T. Heath, B. W. Peyton, and P. H. Worley. PICL: A portable instrumented communications library, C reference manual. Technical Report TM-11130, Oak Ridge National Laboratory, Oak Ridge, TN, July 1990.
 D. Gregor, T. Hoefler, B. Barrett, and A. Lumsdaine. Fixing probe for multi-threaded MPI applications. Technical Report 674, Indiana University, Jan. 2009.
 William D. Gropp and Barry Smith. Chameleon parallel programming tools users manual. Technical Report ANL-93/23, Argonne National Laboratory, March 1993.
 Michael Hennecke. A Fortran 90 interface to MPI version 1.1. Technical Report Internal Report 63/96, Rechenzentrum, Universität Karlsruhe, D-76128 Karlsruhe, Germany, June 1996. Available via world wide web from http://www.uni-karlsruhe.de/~Michael.Hennecke/Publications/#MPI_F90.
 T. Hoefler, G. Bronevetsky, B. Barrett, B. R. de Supinski, and A. Lumsdaine. Efficient MPI support for advanced hybrid programming models. In Recent Advances in the Message Passing Interface (EuroMPI'10), volume LNCS 6305, pages 50--61. Springer, Sep. 2010.
 T. Hoefler, P. Gottschling, A. Lumsdaine, and W. Rehm. Optimizing a conjugate gradient solver with non-blocking collective operations. Elsevier Journal of Parallel Computing (PARCO), 33(9):624--633, Sep. 2007.
 T. Hoefler, F. Lorenzen, and A. Lumsdaine. Sparse non-blocking collectives in quantum mechanical calculations. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users' Group Meeting, volume LNCS 5205, pages 55--63. Springer, Sep. 2008.
 T. Hoefler and A. Lumsdaine. Message progression in parallel computing --- to thread or not to thread? In Proceedings of the 2008 IEEE International Conference on Cluster Computing. IEEE Computer Society, Oct. 2008.
 T. Hoefler, A. Lumsdaine, and W. Rehm. Implementation and performance analysis of non-blocking collective operations for MPI. In Proceedings of the 2007 International Conference on High Performance Computing, Networking, Storage and Analysis, SC07. IEEE Computer Society/ACM, Nov. 2007.
 T. Hoefler, M. Schellmann, S. Gorlatch, and A. Lumsdaine. Communication optimization for medical image reconstruction algorithms. In Recent Advances in Parallel Virtual Machine and Message Passing Interface, 15th European PVM/MPI Users' Group Meeting, volume LNCS 5205, pages 75--83. Springer, Sep. 2008.
 T. Hoefler and J. L. Traeff. Sparse collective operations for MPI. In Proceedings of the 23rd IEEE International Parallel & Distributed Processing Symposium, HIPS'09 Workshop, May 2009.
 Torsten Hoefler and Marc Snir. Writing parallel libraries with MPI --- common practice, issues, and extensions. In Cotronis et al., pages 345--355.
 Institute of Electrical and Electronics Engineers, New York. IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Standard 754-1985, 1985.
 International Organization for Standardization, Geneva, ISO 8859-1:1987. Information processing --- 8-bit single-byte coded graphic character sets --- Part 1: Latin alphabet No. 1, 1987.
 International Organization for Standardization, Geneva, ISO/IEC 9945-1:1996(E). Information technology --- Portable Operating System Interface (POSIX) --- Part 1: System Application Program Interface (API) [C Language], December 1996.
 International Organization for Standardization, Geneva, ISO/IEC 1539-1:2010. Information technology --- Programming languages --- Fortran --- Part 1: Base language, November 2010.
 International Organization for Standardization, ISO/IEC/SC22/WG5 (Fortran), Geneva, TS 29113. TS on further interoperability with C, 2012. http://www.nag.co.uk/sc22wg5/, successfully balloted DTS at ftp://ftp.nag.co.uk/sc22wg5/N1901-N1950/N1917.pdf.
 Charles H. Koelbel, David B. Loveman, Robert S. Schreiber, Guy L. Steele Jr., and Mary E. Zosel. The High Performance Fortran Handbook. MIT Press, 1993.
 David Kotz. Disk-directed I/O for MIMD multiprocessors. In Proceedings of the 1994 Symposium on Operating Systems Design and Implementation, pages 61--74, November 1994. Updated as Dartmouth TR PCS-TR94-226 on November 8, 1994.
 O. Krämer and H. Mühlenbein. Mapping strategies in message-based multiprocessor systems. Parallel Computing, 9:213--225, 1989.
 S. J. Leffler, R. S. Fabry, W. N. Joy, P. Lapsley, S. Miller, and C. Torek. An advanced 4.4BSD interprocess communication tutorial, Unix programmer's supplementary documents (PSD) 21. Technical report, Computer Systems Research Group, Department of Electrical Engineering and Computer Science, University of California, Berkeley, 1993. Also available at http://www.netbsd.org/Documentation/lite2/psd/.
 nCUBE Corporation. nCUBE 2 Programmers Guide, r2.0, December 1990.
 Bill Nitzberg. Performance of the iPSC/860 Concurrent File System. Technical Report RND-92-020, NAS Systems Division, NASA Ames, December 1992.
 William J. Nitzberg. Collective Parallel I/O. PhD thesis, Department of Computer and Information Science, University of Oregon, December 1995.
 4.4BSD Programmer's Supplementary Documents (PSD). O'Reilly and Associates, 1994.
 Paul Pierce. The NX/2 operating system. In Proceedings of the Third Conference on Hypercube Concurrent Computers and Applications, pages 384--390. ACM Press, 1988.
 Martin Schulz and Bronis R. de Supinski. PNMPI tools: A whole lot greater than the sum of their parts. In ACM/IEEE Supercomputing Conference (SC), pages 1--10. ACM, 2007.
 K. E. Seamons, Y. Chen, P. Jones, J. Jozwiak, and M. Winslett. Server-directed collective I/O in Panda. In Proceedings of Supercomputing '95, December 1995.
 A. Skjellum and A. Leung. Zipcode: a portable multicomputer communication library atop the reactive kernel. In D. W. Walker and Q. F. Stout, editors, Proceedings of the Fifth Distributed Memory Concurrent Computing Conference, pages 767--776. IEEE Press, 1990.
 A. Skjellum, S. Smith, C. Still, A. Leung, and M. Morari. The Zipcode message passing system. Technical report, Lawrence Livermore National Laboratory, September 1992.
 Anthony Skjellum, Nathan E. Doss, and Purushotham V. Bangalore. Writing Libraries in MPI. In Anthony Skjellum and Donna S. Reese, editors, Proceedings of the Scalable Parallel Libraries Conference, pages 166--173. IEEE Computer Society Press, October 1993.
 Anthony Skjellum, Nathan E. Doss, and Kishore Viswanathan. Inter-communicator extensions to MPI in the MPIX (MPI eXtension) Library. Technical Report MSU-940722, Mississippi State University --- Dept. of Computer Science, April 1994. http://www.erc.msstate.edu/mpi/mpix.html.
 Anthony Skjellum, Steven G. Smith, Nathan E. Doss, Alvin P. Leung, and Manfred Morari. The Design and Evolution of Zipcode. Parallel Computing, 20(4):565--596, April 1994.
 Rajeev Thakur and Alok Choudhary. An extended two-phase method for accessing sections of out-of-core arrays. Scientific Programming, 5(4):301--317, Winter 1996.
 The Unicode Standard, Version 2.0. Addison-Wesley, 1996. ISBN 0-201-48345-9.
 D. Walker. Standards for message passing in a distributed memory environment. Technical Report TM-12147, Oak Ridge National Laboratory, August 1992.