During the June 5-7 MPI Forum meeting, an overview of the
proposed I/O chapter was presented to the entire forum,
and the I/O committee met in four break-out sessions.
Each session was attended by approximately 20 people.
Participants were:
Greg Astfalk, HP/Convex
Eric Brunner, Thinking Machines
Margaret Cahir, Cray Research
Pang Chen, Sandia
Ying Chen, University of Illinois
Albert Cheng, NCSA
Lyndon Clarke, University of Edinburgh
Richard Frost, SDSC
Leslie Hart, NOAA/FSL
Steve Huss-Lederman, University of Wisconsin
Arkady Kanevsky
Susan Kraus, NEC
Steve Landherr, HP/Convex
Jarek Nieplocha
Bill Nitzberg, NASA Ames
Ron Oldfield, Sandia
Yoonho Park, University of Houston
Elsie Pierce, LLNL
Jean-Pierre Prost, IBM
Joe Rieken, Los Alamos
Jeff Squyres, Notre Dame
Rajeev Thakur, Argonne
Manuel Ujaldon, University of Maryland
Klaus Wolf, GMD
Parkson Wong, NASA Ames
David Wright, Pratt & Whitney
Decisions:
----------
Note that only items 10 and 11 change the specification,
although the other decisions will likely result in numerous proposals.
1. Keep OPEN/CLOSE/READ/WRITE interface (a la MPI-IO version 0.5)
The committee agreed that the UNIX-like approach was the
right direction, and was not interested in developing
a totally different interface (e.g. memory mapped files,
or persistent objects).
2. No subset
It was suggested that we might be able to have a core subset
completed and approved by SC'96, and work on enhancements
later (perhaps as part of MPI-3). However, after considerable
debate, the committee decided that it is much more important
to do it right than to do it fast, and agreed (unanimously)
that there would be no subset.
3. Goal: Support the I/O needs of MPI programs
The goal of the MPI-IO version 0.5 interface was to support
the needs of parallel scientific programmers. The committee
felt that the goal of the MPI-2 I/O interface should be to
simply support MPI programmers. Further, the I/O chapter will
inherit all of the goals and limitations of MPI-2.
4. Layerability (11/4/4 in favor of ensuring it is possible)
It is the intention of the I/O committee to ensure that it
is possible (although not required) to implement the I/O
routines as a layered library on top of the rest of MPI-2.
However, it is understood that this library may require
threads in order to be portable.
5. I/O Capabilities
In order to gauge the support for the different features and
capabilities in the current proposal, individual features
were listed and reviewed. A show-of-hands vote was taken
for removing each existing feature and adding each new feature
listed. (Negative numbers represent the number of people in
favor of removing an existing feature; positive numbers indicate
the number of people interested in adding a new feature):
-0 Open/close/read/write style interface
-0 Support for UNIX semantics
-0 Non-blocking operations
-0 Collective operations
-0 Independent operations
-0 File control facilities
-0 Explicit offset operations
-0 File pointers (individual)
-6 File pointers (shared)
-0 Data partitioning via MPI datatypes
-0 Memory scatter/gather support for data access routines
-0 Hints
+0 Directives (instead of hints)
-3 Filetype constructors
+2 Third party transfer operations
+11 Support for local & remote I/O services
-6 Data partitioning specified at open time
+7 Data partitioning specified at read/write time
+4 Explicit data partitioning and access routines
6. File Interoperability
There was strong support for adding some level of file
interoperability into the specification. However, a consensus
was not reached on exactly how much interoperability to support,
or even what is meant by interoperability. Expect to see
numerous proposals and significant debate.
7. Shared file pointers (9/9/1 in favor of keeping)
Shared file pointers were called out as a possible feature we
might want to eliminate. We discussed two uses of shared file
pointers: work sharing and log files. In work sharing, the
order of access to a file is unimportant, but each file datum
must be acted on exactly once (e.g. processes independently
read the "next piece of work" or compute and write the next
piece of tagged compressed data).
It was suggested that a UNIX-like APPEND mode would solve the
problem of supporting log files, but would not support work
sharing.
Initially, the committee was in favor of eliminating shared
file pointers (11/6/3), but after some discussion, the vote
changed to undecided (9/9/1).
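The work-sharing use can be sketched as follows, assuming a shared
file pointer behaves like a single counter that every process
advances atomically. This is a minimal C11 model with hypothetical
names (shared_fp, claim_next), not a proposed MPI interface; it shows
why the order of access is unspecified while each piece of work is
still claimed exactly once.

```c
#include <stdatomic.h>

/* Hypothetical model of a shared file pointer: one atomic counter that
 * all processes advance. Not an MPI API; names are illustrative only. */
typedef struct {
    atomic_long offset;   /* next unclaimed byte of work in the file */
    long file_size;       /* total bytes of work in the file */
} shared_fp;

void shared_fp_init(shared_fp *sfp, long file_size) {
    atomic_init(&sfp->offset, 0);
    sfp->file_size = file_size;
}

/* Claims the next `chunk` bytes of work. Because the fetch-and-add is
 * atomic, concurrent callers always receive disjoint ranges, so each
 * byte is claimed exactly once; which caller gets which range is
 * unspecified. Returns the start offset, or -1 once work is exhausted. */
long claim_next(shared_fp *sfp, long chunk) {
    long start = atomic_fetch_add(&sfp->offset, chunk);
    return start < sfp->file_size ? start : -1;
}
```

The final claim may extend past end-of-file; a caller would process
only min(chunk, file_size - start) bytes of it.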
8. Filetype constructors (10/3/7 in favor of keeping)
It was suggested that the pure datatype constructor routines
be moved into the Misc. chapter.
9. MPI_File vs. MPI_Comm (8/1/9 in favor of keeping MPI_File)
10. File consistency semantics: at MPI_CLOSE only
File updates are guaranteed to be viewed as consistent across
applications only after an MPI_CLOSE has been successfully
completed. No consistency semantics are provided for open
files, and it is erroneous for an application to depend on open
file consistency between two unrelated applications.
11. File pointer update at EOF corrected
The existing proposal was contradictory: it stated that file
pointers are always updated before an operation, yet sometimes
account for end-of-file. This has been clarified. The file pointer
is now updated by the amount of data actually accessed for
blocking operations (accounting for EOF), and by the amount of
data requested for non-blocking operations (ignoring EOF).
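The corrected rule for reads can be modeled in a few lines of C. The
function names below are hypothetical and the sketch covers reads
only. One plausible reading of the rationale is that a non-blocking
operation's pointer has to be advanced when the operation is started,
before the amount actually transferred is known.

```c
/* Hypothetical model of the corrected file pointer rule (reads only).
 * Names are illustrative, not MPI-2 API. Units are bytes. */

/* Bytes a read of `count` bytes at offset `fp` actually transfers from
 * a file of `file_size` bytes (0 if fp is at or past EOF). */
long bytes_accessed(long fp, long count, long file_size) {
    long remaining = file_size - fp;
    if (remaining <= 0)
        return 0;
    return count < remaining ? count : remaining;
}

/* New individual file pointer after a read. Blocking reads advance by
 * the bytes actually accessed (accounts for EOF); non-blocking reads
 * advance by the bytes requested (ignores EOF). */
long advance_file_pointer(long fp, long count, long file_size, int blocking) {
    if (blocking)
        return fp + bytes_accessed(fp, count, file_size);
    return fp + count;
}
```

For a 100-byte file with the pointer at offset 90, a 20-byte blocking
read leaves the pointer at 100, while the same non-blocking read
leaves it at 110.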
Indecisions:
------------
1. How do we want to provide portable (> 32 bit) offsets?
The straw vote was split 7/10/3 between using the non-standard
FORTRAN "integer*8" and requiring ugly special access functions.
Other possibilities (e.g. using the mantissa of floating point
numbers) were voted down.
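The size limits behind this question can be checked directly. The C
sketch below uses int64_t as a stand-in for FORTRAN "integer*8" (an
assumption made for illustration; the helper names are hypothetical).
It shows that a signed 32-bit offset stops just under 2 GiB, and that
a double-precision mantissa (53 bits) represents every offset exactly
only up to 2^53.

```c
#include <stdint.h>

/* Returns 1 if a byte offset fits in a signed 32-bit integer
 * (at most 2^31 - 1, just under 2 GiB), else 0. */
int fits_in_int32(int64_t offset) {
    return offset >= 0 && offset <= INT32_MAX;
}

/* Returns 1 if an offset survives a round trip through a double
 * unchanged. A double has a 53-bit significand, so every integer up to
 * 2^53 is exact, but 2^53 + 1 is not. Only meaningful for
 * 0 <= offset < 2^62 (returns 0 outside that range, which keeps the
 * cast back to int64_t in range). */
int exact_as_double(int64_t offset) {
    if (offset < 0 || offset >= ((int64_t)1 << 62))
        return 0;
    double d = (double)offset;
    return (int64_t)d == offset;
}
```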
2. Should pack/unpack of datatypes be idempotent?
The external chapter allows opaque datatypes to be converted
into string representations. Do we need the pack/unpack to
be idempotent (i.e. strcmp(s, pack(unpack(s))) == 0)?
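To make the property concrete, here is a toy C sketch. The
"contig(count,base)" string syntax and these pack/unpack functions are
hypothetical, not the external chapter's actual representation.
Because unpack tolerates extra whitespace while pack always emits one
canonical spelling, the round trip reproduces canonical strings but
not equivalent non-canonical ones.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical toy datatype: `count` contiguous elements of a named
 * base type. Not an MPI structure; used only to illustrate the test. */
struct toy_type {
    int count;
    char base[16];
};

/* unpack: string -> datatype. Tolerates extra whitespace.
 * Returns 0 on success, -1 if the string does not parse. */
int unpack(const char *s, struct toy_type *t) {
    if (sscanf(s, " contig ( %d , %15[a-z] )", &t->count, t->base) != 2)
        return -1;
    return 0;
}

/* pack: datatype -> string, always in one canonical spelling. */
void pack(const struct toy_type *t, char *out, size_t n) {
    snprintf(out, n, "contig(%d,%s)", t->count, t->base);
}

/* Returns 1 if strcmp(s, pack(unpack(s))) == 0, i.e. the round trip
 * reproduces s exactly; 0 otherwise (including parse failure). */
int round_trip_is_identity(const char *s) {
    struct toy_type t;
    char buf[64];
    if (unpack(s, &t) != 0)
        return 0;
    pack(&t, buf, sizeof buf);
    return strcmp(s, buf) == 0;
}
```

In this sketch, strings produced by pack are always reproduced
exactly, so one way to obtain idempotence is to require that the
string representation be canonical.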
Upcoming Proposals for the July draft:
--------------------------------------
1. Move filetypes to READ/WRITE and add persistent operations [Nitzberg]
2. Add ICLOSE() [Frost]
3. Add IFILE_SYNC()
4. Simpler OPEN() interface [Nitzberg]
5. Explicitly state semantics better
6. Access to local and remote I/O services [Frost]
7. "XDR"-like format for data within files [Wright]
8. Import/export MPI I/O files to other systems [Landherr]
9. Version identification [Bruner]
10. Error handling [Frost]
11. FILE_CONTROL() replacement
12. Scheduling hints (e.g. DO_IO_NOW)
13. Merge I/O hints and Dynamic info interfaces [Saphir & Nitzberg]
14. Move filetype constructor section to Misc. Chapter
---------------------------------------------------------------------
Bill Nitzberg nitzberg@nas.nasa.gov
NAS Parallel Systems, MRJ, Inc.
NASA Ames Research Center, M/S 258-6 Tel: (415) 604-4513
Moffett Field, CA 94035-1000 FAX: (415) 966-8669