Summary: File Interoperability

Bill Nitzberg (nitzberg@nas.nasa.gov)
Fri, 05 Jul 1996 15:59:49 -0700

The recent discussions on file interoperability have been muddied
by mixing three separate issues: desired capabilities, potential
bindings, and possible implementations. This is an attempt to
separate them.

Once again, this is a distillation of many people's work, including
Anurag Acharya, Sam Fineberg, Richard Frost, Leslie Hart, Terry
Jones, Steve Landherr, Bill Nitzberg, Elsie Pierce, Rajeev Thakur,
Parkson Wong, Dave Wright, and others.

Desired Capabilities
--------------------

At the most basic level, file interoperability is the ability to
read the information previously written to a file (not just the
bits, but the actual information the bits represent).

There are (at least) four semi-orthogonal file interoperability
capabilities. While considering any particular capability, assume
all of the other capabilities are held constant (e.g. for Applications,
we assume a single hardware platform, a single filesystem interface
(MPI), and a single MPI implementation). The capabilities are
defined by the ability to access information stored in a file
between:

(1) Applications

Information is accessible from one run of an application to
the next, between two different applications, or among
applications running on different numbers of nodes.

(2) Hardware platforms

Information written to a file on one hardware platform is
accessible by an application running on another hardware
platform (which could have a different native data format).

(3) Filesystem interfaces

Information written using one filesystem interface (e.g.
MPI) is accessible by an application using a different
filesystem interface (e.g. UNIX).

(4) MPI implementations

Information written using one MPI implementation (e.g. from
vendor XYZ) is accessible by an application using a different
MPI implementation (e.g. from vendor ABC). As a special
case, (4) includes different versions of an MPI implementation
from the same vendor.
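
To make capability (2) concrete, here is a minimal sketch of how a
difference in native data format loses the information even though
every bit is read back intact. It assumes the only difference between
the two platforms is byte order; the value and variable names are
made up for illustration.

```python
import struct

value = 1996  # arbitrary 32-bit integer chosen for illustration

# The same integer as a little-endian and a big-endian platform
# would lay it out in a file.
little = struct.pack("<i", value)
big = struct.pack(">i", value)

# A big-endian reader that interprets the little-endian bytes in its
# own native order gets the bits back, but not the information.
misread = struct.unpack(">i", little)[0]

# Interpreting the bytes in the writer's byte order recovers the value.
correct = struct.unpack("<i", little)[0]

print(little.hex(), big.hex(), misread, correct)
```

Only the last read recovers the original value, which is the
distinction between "the bits" and "the information the bits
represent" drawn above.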

Potential Bindings
------------------

"File interoperability" is characterized not just by the capabilities
provided, but also by the amount of effort required by the user to
achieve the desired capabilities. This level of effort, or the
amount of transparency provided by the interface, is simply a
binding issue.

Transparency is a continuum, from requiring the user to do all the
work to hiding everything inside the implementation (no user
directives at all).

The proposals put forth to date choose a middle ground, with the
user specifying the "level" of file interoperability desired at
OPEN time.

Possible Implementations
------------------------

Note that these "implementations" are not descriptions of algorithms
for achieving the above capabilities. They are requirements, to be
set forth in the MPI standard, that constrain all MPI implementations
in a way that is visible outside of the MPI context.

We probably want to require (1), which is the most basic form of
file interoperability, and is usually what is meant by "operability".

(A) Require off-line conversion utilities

An "off-line conversion utility" must be provided to allow
files to be accessed from different hardware platforms
(e.g. performing data format conversion and file layout
conversion), from different interfaces (e.g. to and from
UNIX), and from different MPI implementations (e.g. between
ABC and XYZ vendors).

+ Provides capabilities: (1), (2), (3), and (4).
- Easy for the user to implement (read MPI, write UNIX from one node)
- May require twice the disk space, to allow for conversion
- Probably still need to do (C) below
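
As a rough idea of the data format conversion part of (A), here is a
sketch of an off-line converter. It assumes the file holds nothing
but 8-byte IEEE doubles and that the two platforms differ only in
byte order; convert_doubles and its parameters are hypothetical
names, and file layout conversion is not attempted.

```python
import io
import struct


def convert_doubles(src, dst, src_order, dst_order, chunk_items=1024):
    """Copy 8-byte doubles from src to dst, changing the byte order.

    src_order and dst_order are struct byte-order prefixes
    ("<" for little-endian, ">" for big-endian).
    """
    while True:
        chunk = src.read(8 * chunk_items)
        if not chunk:
            break
        if len(chunk) % 8:
            raise ValueError("input is not a whole number of doubles")
        n = len(chunk) // 8
        values = struct.unpack(f"{src_order}{n}d", chunk)
        dst.write(struct.pack(f"{dst_order}{n}d", *values))


# Usage: in-memory streams stand in for the source and converted files.
source = io.BytesIO(struct.pack("<3d", 1.0, 2.5, -4.0))
converted = io.BytesIO()
convert_doubles(source, converted, "<", ">", chunk_items=2)
```

The converted copy lives alongside the original until the conversion
finishes, which is where the extra disk space noted above comes from.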

(B) Require POSIX-compatible file structure

Provide a directive to require that an MPI file be accessible
via the native UNIX interface, and that the canonical MPI
ordering is the canonical byte stream ordering under UNIX.

+ Provides capabilities: (3) directly, (1) & (4) indirectly.
- May slow down file access if this directive is given.

(C) Require XDR-like data format

Define an XDR-like data format and provide a directive which
requires that all data in an MPI file use the defined format.

+ Provides capabilities: (2)
- May slow down file access if this directive is given.
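
For a feel of what (C) implies, here is a small sketch of an XDR-like
encoding: every item is written in one fixed, platform-independent
representation (big-endian, 32-bit integers, IEEE 64-bit doubles)
regardless of the writer's native format. The record layout (a count
followed by the values) and the function names are assumptions made
for this example, not a proposed format.

```python
import struct


def encode_doubles(values):
    """Encode a sequence of floats as a big-endian count plus doubles."""
    header = struct.pack(">i", len(values))
    body = struct.pack(f">{len(values)}d", *values)
    return header + body


def decode_doubles(data):
    """Decode bytes produced by encode_doubles on any platform."""
    (count,) = struct.unpack_from(">i", data, 0)
    return list(struct.unpack_from(f">{count}d", data, 4))


encoded = encode_doubles([1.5, -2.0, 3.25])
decoded = decode_doubles(encoded)
```

A platform whose native format already matches pays nothing here;
any other platform pays a conversion on every read and write, which
is the likely source of the slowdown noted above.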

(D) Require self-describing files and/or metadata

Define an HDF-like format for describing data formats and
file layout (e.g. striping) and provide a directive which
requires that an MPI file be written/read using this defined
format.

+ Provides capabilities: (1), (2), (3), and (4).
- May slow down file access if this directive is given.
- Defining an HDF-like format might be time-consuming
- Defining portable file layout metadata might be impossible
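
To illustrate the data format half of (D), here is a sketch of a
self-describing file: a small header records how the data was
written, and the reader uses it instead of assuming its own native
format. The magic number, the JSON header, and the function names
are all invented for this example; it says nothing about file layout
metadata such as striping, which is the hard part.

```python
import io
import json
import struct

MAGIC = b"SDF0"  # hypothetical magic number for this sketch


def write_self_describing(stream, values, byteorder="<"):
    """Write doubles preceded by a header describing their format."""
    meta = json.dumps(
        {"type": "d", "byteorder": byteorder, "count": len(values)}
    ).encode("ascii")
    stream.write(MAGIC)
    stream.write(struct.pack(">I", len(meta)))  # header length, always big-endian
    stream.write(meta)
    stream.write(struct.pack(f"{byteorder}{len(values)}d", *values))


def read_self_describing(stream):
    """Read the values back using only what the header says."""
    if stream.read(4) != MAGIC:
        raise ValueError("not a self-describing file")
    (meta_len,) = struct.unpack(">I", stream.read(4))
    meta = json.loads(stream.read(meta_len).decode("ascii"))
    fmt = f"{meta['byteorder']}{meta['count']}{meta['type']}"
    payload = stream.read(struct.calcsize(fmt))
    return list(struct.unpack(fmt, payload))


# Usage: the writer keeps its native byte order; the reader adapts.
buffer = io.BytesIO()
write_self_describing(buffer, [1.0, 2.5, -4.0], byteorder=">")
buffer.seek(0)
restored = read_self_describing(buffer)
```

Unlike (C), the writer never converts; the cost moves to whichever
reader has a different native format.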

Hope this helps,

- bill