MPI_FOPEN(comm, filename, amode, info, fh)
IN comm MPI communicator
IN filename Name of file to be opened (string)
IN amode File access mode (integer)
IN info MPI_INFO structure of {key, value} pairs
OUT fh File handle
* Nodes participating in a parallel I/O transaction are
specified by an MPI communicator.
** In contrast to the existing proposal (10.7), the communicator
is NOT cached in the file handle. Instead, it must be passed
to each collective I/O call to demarcate the participants.
** In contrast to the existing proposal (10.2), append mode is
permitted in amode.
* The basic units in a file or I/O stream are MPI_Datatypes
* The default unit for a (file) stream is MPI_BYTE
** In contrast to the existing proposal (10.7), implementation
directives may NOT be given in the filename, but rather
are specified in the "info" argument.
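The three departures above can be sketched with a small Python model
of the open call. This is an illustration, not a binding: fopen,
FileHandle, the MODE_* flag values, and the "access_pattern" info key
are all hypothetical names, and a communicator is stood in for by a
plain list of ranks. The sketch only shows that the handle stores no
communicator, that append can be combined into amode, and that
directives travel in info rather than in the filename.

```python
from dataclasses import dataclass, field

# Hypothetical amode bit flags; the proposal does not define their values.
MODE_READ, MODE_WRITE, MODE_APPEND = 1, 2, 4


@dataclass
class FileHandle:
    """Toy file handle: note that it has no communicator field."""
    filename: str
    amode: int
    info: dict = field(default_factory=dict)
    unit: str = "MPI_BYTE"   # default unit of a (file) stream


def fopen(comm, filename, amode, info=None):
    """Model of MPI_FOPEN: comm names the participants but is not cached."""
    if not comm:
        raise ValueError("an open needs at least one participating node")
    return FileHandle(filename, amode, dict(info or {}))


# Hypothetical call site: directives go in info, not in the filename.
fh = fopen(comm=[0, 1, 2, 3], filename="results.dat",
           amode=MODE_WRITE | MODE_APPEND,
           info={"access_pattern": "sequential"})
```

Because the handle carries no communicator, any later collective call
in this model would have to receive one explicitly.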
MPI_FLAYOUT(fh, etype, filetype, count, info)
IN fh File handle
IN etype Elementary datatype
IN filetype Filetype to use from current fh position
IN count Number of filetype units for which the new
layout is valid
IN info MPI_INFO structure for access pattern, etc.
** In contrast to the existing proposal (10.7), MPI_FLAYOUT
is lightweight and incremental. It is a heads-up to the
implementation that "any read/write/seek operations I
perform from the beginning of this layout through the given
count will be of type {etype, filetype}".
** A count < 0 means backwards from the current layout.
A count > 0 means forwards from the current layout.
A count == 0 means infinite; i.e., this file is homogeneous.
** Initially (on open), a file has no declared layout; etype and
filetype both default to MPI_BYTE. The user's view is by default
|BBBBBBBBBBBBBBBBBBBBBBBBBB ... | (B == MPI_BYTE)
 ^                              ^
 fh                             EOF
Suppose that MPI_FOPEN is immediately followed by
MPI_FLAYOUT(fh, MPI_INTEGER, MPI_INTEGER, 2, whatever)
on some node. The view on that node is now:
|IIBBBBBBBBBBBBBBBB ... | (I == MPI_INTEGER)
 ^                      ^
 fh                     EOF
If two different nodes declare different layouts for the
same file segment (e.g., the first 8 bytes) then future
results are undefined, with no error if the user is in
reckless mode.
Now suppose the MPI_FLAYOUT above is immediately followed by
MPI_FLAYOUT(fh, MPI_REAL, MPI_REAL, 3, whatever)
on the same node. The view on that node is now:
|IIRRRBBBB ... | (R == MPI_REAL)
   ^           ^
   fh          EOF
The file handle now "points" at the beginning of the new
layout. Semantics: no memory of the past layout is expected.
Now suppose the user reads 2 reals with no offset (independently,
so comm is MPI_COMM_NULL):
MPI_FREAD(fh, mybuf, MPI_REAL, 2, 0, MPI_COMM_NULL, status)
The resulting "position" of the file handle is
|IIRRRBBBB ... | (R == MPI_REAL)
     ^         ^
     fh        EOF
Note that MPI_FLAYOUT provides support for both initial
and incremental declarations of file datatype layouts.
As such it supports both the "a priori" and "a posteriori"
(Run Length Encoding) file formatting models.
Advice to users: don't do fine-grained RLE
in parallel. Instead, read on one node and then farm the data out.
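The open / FLAYOUT / FLAYOUT / FREAD walkthrough above can be replayed
with a toy per-node model. This is a sketch under stated assumptions,
not the proposed implementation: NodeView and its methods are
hypothetical names, only count > 0 is modeled, filetype is taken to
equal etype, and the element sizes are assumed (the text implies
4-byte integers via "the first 8 bytes"; a 4-byte MPI_REAL is an
assumption). A positive count is read here as "starting where the
current layout ends", which is what the IIRRR picture suggests.

```python
from dataclasses import dataclass, field

# Assumed element sizes in bytes (see the lead-in for what is assumed).
SIZES = {"MPI_BYTE": 1, "MPI_INTEGER": 4, "MPI_REAL": 4}


@dataclass
class NodeView:
    """Toy per-node view: declared layouts plus the handle position."""
    layouts: list = field(default_factory=list)  # (start_byte, etype, count)
    layout_end: int = 0      # first byte past the most recent layout
    position: int = 0        # byte the file handle "points" at
    etype: str = "MPI_BYTE"  # unit used by the next read

    def flayout(self, etype, count):
        """Declare `count` units of `etype` following the current layout."""
        if count <= 0:
            raise NotImplementedError("only count > 0 is modeled here")
        start = self.layout_end
        self.layouts.append((start, etype, count))
        self.layout_end = start + count * SIZES[etype]
        self.etype = etype
        self.position = start   # handle points at the start of the new layout

    def fread(self, bufcount, offset=0):
        """Skip `offset` units, read `bufcount` units, return the byte range."""
        begin = self.position + offset * SIZES[self.etype]
        self.position = begin + bufcount * SIZES[self.etype]
        return begin, self.position

    def picture(self, tail=4):
        """One letter per declared unit, then `tail` undeclared bytes."""
        cells = "".join(e[4] * n for _, e, n in self.layouts)
        return "|" + cells + "B" * tail + " ... |"


view = NodeView()
view.flayout("MPI_INTEGER", 2)   # |II...
view.flayout("MPI_REAL", 3)      # |IIRRR...
byte_range = view.fread(2)       # read 2 reals with no offset
```

In this model the read covers bytes 8 through 15 and leaves the handle
at byte 16, which is the third R in the picture.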
MPI_FREAD(fh, buff, buftype, bufcount, offset, comm, status)
MPI_FWRITE(...)
MPI_IFREAD(..., request)
MPI_IFWRITE(..., request)
IN fh File handle
OUT buff buffer (IN on write)
IN buftype datatype of the buffer elements
IN bufcount number of buftypes to read/write
IN offset offset in filetypes
IN comm the communicator (possibly NULL)
OUT status success/warning/failure
OUT request request handle (for asynchronous versions)
* Support for coordinated parallel read/write stream access
is provided with a valid communicator (implementors, use
MPI_COMM_COMPARE).
** A group of nodes "split" from those performing the open
can execute a coordinated I/O operation.
* Support of independent parallel read/write access is provided
when comm == MPI_COMM_NULL.
** Note that the implementation must keep a "position" pointer
per node for both the user and the library. Consider a
sequence of IFREAD, FLAYOUT, IFREAD, FLAYOUT, etc., followed
by a wait on all outstanding requests. The user's view of
the file handle position is that it moves with each read.
However, the file handle in the backend of the implementation
will not "catch up" in position until the wait is satisfied.
The implementation need not store all of the interim
positions. Each set of new displacements can be computed
from the previous; i.e., FLAYOUT has incremental semantics.
* A standard SEEK positioning model is not supported by this
API; instead, positioning is achieved through MPI_FLAYOUT and
the offset argument in MPI_FREAD, et al. :-(
* An atomic SEEK_AND_READ/WRITE for high performance could be
constructed as the union of the MPI_FLAYOUT and MPI_FREAD specs:
MPI_FSEEKREAD(fh, etype, filetype, count,
buff, buftype, bufcount, offset, comm, status)
MPI_FSEEKWRITE(...)
MPI_IFSEEKREAD(..., request)
MPI_IFSEEKWRITE(..., request)
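The two per-node "position" pointers described above for nonblocking
reads can be modeled in a few lines. This is a sketch only: ToyStream,
ifread, and wait_all are hypothetical names, a layout is reduced to a
unit size in bytes, and "I/O" is just arithmetic on offsets. It shows
the user-visible position advancing at each posted read while the
back-end position catches up at the wait, with each displacement
computed from the previous one rather than stored.

```python
class ToyStream:
    """Tracks the user-visible and back-end positions of one node."""

    def __init__(self, unit_size=4):   # assumed 4-byte units at the start
        self.unit_size = unit_size
        self.user_pos = 0      # advances as soon as a read is posted
        self.backend_pos = 0   # advances only when requests complete
        self.pending = []      # byte lengths of reads not yet serviced

    def flayout(self, unit_size):
        """Incremental layout change: affects reads posted from here on."""
        self.unit_size = unit_size

    def ifread(self, bufcount):
        """Post a nonblocking read; only its length is remembered."""
        nbytes = bufcount * self.unit_size
        self.user_pos += nbytes
        self.pending.append(nbytes)

    def wait_all(self):
        """Service outstanding reads in posting order.

        Each starting displacement is derived from the previous one,
        so no interim positions need to be stored.
        """
        done = []
        for nbytes in self.pending:
            begin = self.backend_pos
            self.backend_pos = begin + nbytes
            done.append((begin, self.backend_pos))
        self.pending.clear()
        return done


s = ToyStream()
s.ifread(2)       # 2 units of 4 bytes
s.flayout(8)      # switch to 8-byte units
s.ifread(3)       # 3 units of 8 bytes
before = (s.user_pos, s.backend_pos)   # user has moved, back end has not
ranges = s.wait_all()
after = (s.user_pos, s.backend_pos)    # back end has caught up
```

A real implementation would also have to order requests across
layouts with different filetypes; the model sidesteps that by
treating every layout as a flat run of fixed-size units.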
MPI_FCLOSE(fh, comm, info)
MPI_IFCLOSE(fh, comm, info, request)
IN fh File handle
IN comm MPI communicator
IN info MPI_INFO structure of {key, value} pairs
OUT request request handle (for asynchronous versions)
* These calls should have Unix-like semantics for close, and
MPI semantics for collective, (a)synchronous operations.
** A close can be performed independently (with MPI_COMM_NULL),
by a split communicator, or by the original communicator set.
Tolerance semantics: if the file handle is first closed
by a split communicator, then again by the original (opening)
set, then the user is granted forgiveness and no error is
generated. This is permissive towards master-slave
programming models.
* The info argument can be used to supply hints.
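The tolerance rule for repeated closes can be illustrated with a
small model. This is a sketch only: ToyFile and fclose are
hypothetical, communicators are stood in for by plain sets of ranks,
and None plays the role of MPI_COMM_NULL. It shows a close by a split
communicator followed by a close by the original set completing
without error.

```python
class ToyFile:
    """Toy handle that remembers which ranks still have the file open."""

    def __init__(self, opening_ranks):
        self.open_on = set(opening_ranks)


def fclose(fh, comm, rank=None):
    """Close on the ranks in `comm`, or on `rank` alone if comm is None.

    Ranks that already closed the file are skipped rather than
    reported as errors (the tolerance rule). Returns the set of
    ranks actually closed by this call.
    """
    targets = {rank} if comm is None else set(comm)
    closed_now = targets & fh.open_on
    fh.open_on -= closed_now
    return closed_now


fh = ToyFile(opening_ranks={0, 1, 2, 3})
first = fclose(fh, comm={1, 2, 3})       # workers close via a split communicator
second = fclose(fh, comm={0, 1, 2, 3})   # original set closes again: no error
third = fclose(fh, comm=None, rank=0)    # independent close, nothing left to do
```

In this model the second call only has rank 0 left to close, and the
third call is a harmless no-op.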
MPI_FCONNECT(comm, amode, info, fh)
IN comm MPI communicator
IN amode File access mode (integer)
IN info MPI_INFO structure of {key, value} pairs
OUT fh File handle
* Optional functionality for remote or special I/O services.
All resource specifications are given in the info argument.
* The returned file handle allows the user to interact with
the opened resource through an MPI I/O file handle.
** Whether anything useful is ever returned is a quality of
implementation issue.
* Suggested functionalities:
- "standard" archive services
- ftp services
- named blobs in ODBC compliant databases
MPI_FCONTROL(fh, comm, flag, choice, info)
* Stream and service control hints and directives can be
implemented by either one function with N flags, or
N functions with no flags.
Please send any requests for control functionalities not
listed in MPI2 chapter 10 to mpi-io@mcs.anl.gov with a
suitable subject line. :-)