Re: canonical data representation

Albert Cheng (acheng@ncsa.uiuc.edu)
Sat, 01 Mar 1997 14:59:32 -0600

At 09:53 AM 3/1/97 +0000, Nick Maclaren wrote:
>Yes and no. While it is like that, the program need know only its own
>system. My first comment is that we don't need any more hooks - the
>current specification is quite adequate. In particular, we DON'T want
>system-dependent hooks, both for practicability and to avoid vendors
>opposing something because they have been left out.

I think the currently proposed hook, at least for MPI-created files, is
to specify the Access Mode of Canonical (aka External) representation.
This mode is supposed to be system-independent, i.e. all MPI implementations
should produce the same bit pattern in the file (well, floating-point and
logical data may vary a little, since more than one bit pattern may
represent the same value).
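
To make "same bit pattern on every implementation" concrete, here is a minimal sketch in Python. It assumes, purely for illustration, that the external form is big-endian with a 32-bit 2's complement integer and a 64-bit IEEE double; the function name to_external is hypothetical and not part of any MPI proposal.

```python
import struct

def to_external(int_value, float_value):
    """Pack one 32-bit integer and one 64-bit IEEE double.

    The ">" prefix forces big-endian order and standard sizes with no
    padding, so the result does not depend on the host's byte order.
    """
    return struct.pack(">id", int_value, float_value)

def from_external(data):
    """Inverse of to_external: returns (int_value, float_value)."""
    return struct.unpack(">id", data)
```

Whatever machine runs this, to_external(1, 1.0) yields the same 12 bytes, which is the property the Canonical mode is after.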

>Here is a description that would work, and would be tolerably efficient
>on almost all modern systems. It would, however, be quite slow to move
>between two systems with different models.
>
>A canonical file includes a table that translates MPI logical types into
>a restricted range of standard types (2's complement integers, IEEE FP,
>and ISO 10646, Unicode and/or ISO 4-byte characters). It would also
>define the byte order. It and the data are sent as a stream of 8-bit
>bytes, in the usual way.

Yes, that would be a general approach and would cover most cases we know
of today. But it has two problems if adopted for MPI files.

1) Where do you store that table when the MPI file is "exported" to
the serial world? It must go at the beginning of the file. That
violates the current MPI view definition: the displacement is supposed to
be an offset from the physical beginning of the file. What you would like
to have is a self-describing file, but the current MPIO definition defers
that to upper-layer applications/libraries such as netCDF, HDF, ...
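
The displacement problem in point 1 can be sketched as follows. The header contents and the helper names (HEADER, make_file, read_int32_at) are invented for illustration and do not correspond to any real file format or MPI call.

```python
import io
import struct

# Hypothetical type table stored at the start of the file.
HEADER = b"TYPES:int32=be;"

def make_file(values, with_header):
    """Build an in-memory file of big-endian 32-bit integers."""
    body = struct.pack(">%di" % len(values), *values)
    return io.BytesIO((HEADER if with_header else b"") + body)

def read_int32_at(buf, displacement, index):
    """Read element `index`, counting from `displacement` bytes
    past the physical beginning of the file."""
    buf.seek(displacement + 4 * index)
    return struct.unpack(">i", buf.read(4))[0]
```

With a plain file, displacement 0 finds the data. Once the table is prepended, the same displacement lands inside the table, and every reader has to know the table's length to correct for it.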

2) This approach also prevents forward compatibility. Say I purchase
a binary copy of a visualization tool that knows all the 32- and 64-bit
machine types. Along comes a new machine that uses a 12-byte
floating-point representation with some crazy new ideas (say, compressed bits).
My "old" visualization tool fails on this data file and I am stuck, since
I can't even attempt to hack the source. But if we have just one or
two universal External representations, each MPI implementation just
needs to figure out how to convert between its own machine types and the
external representation. Then all MPI-External-representation files are
interoperable between all MPI implementations, with the potential loss
of some precision.
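
The precision trade-off can be seen in a small sketch. It assumes the native type is a 64-bit double and the external type is a big-endian 32-bit IEEE float; both choices and the function names are assumptions made only for this example.

```python
import struct

def to_external_float32(x):
    """Convert a native double to an assumed 4-byte external form."""
    return struct.pack(">f", x)

def from_external_float32(data):
    """Convert the assumed 4-byte external form back to a native double."""
    return struct.unpack(">f", data)[0]

def roundtrip(x):
    """Write a value to the external form and read it back."""
    return from_external_float32(to_external_float32(x))
```

A value such as 0.5 survives the round trip exactly, while 0.1 comes back slightly changed because the narrower external type cannot hold all of its bits. Any reader that understands the one external type can still read the file, which is the interoperability being bought with that loss.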

>A minor, but important detail, would be to tie down exactly which IEEE
>values were defined. The simplest approach would be to say that it is
>not defined whether NaN values carry information (i.e. whether they can
>all be treated the same).
>
>This is well-defined, in that it can be interpreted on any system (including
>IBM 370, VAX, Cray YMP and native Intel), as well as all of the 'dead'
>general-purpose architectures that I know about. I am 99% certain that it
>will remain interpretable for a century or more (even if expensively).
>
>It is tricky, tedious and slightly slow to convert to any other format
>(including different endianness), but does not require immense skill for
>any system with any form of the basic data types. It DOES require more
>skill for the IBM 370, VAX, Cray YMP and native Intel, especially when
>things like infinities are considered.

I agree, and I hope all these tedious conversions can be implemented as part
of the MPIO library rather than asking MPI applications to do their own.
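
As one example of what such a library-side conversion might do, here is a sketch that treats all NaNs the same, following the suggestion quoted above. The choice of canonical pattern and the name encode_double are assumptions, not anything the proposal specifies.

```python
import math
import struct

# Assumed canonical quiet NaN for a big-endian 64-bit IEEE double.
CANONICAL_NAN = bytes.fromhex("7ff8000000000000")

def encode_double(x):
    """Encode a double in big-endian IEEE form, discarding any NaN payload."""
    if math.isnan(x):
        return CANONICAL_NAN
    return struct.pack(">d", x)
```

Infinities and signed zeros pass through unchanged here. A system without those values natively would need extra handling on the reading side, which is where the additional skill mentioned above comes in.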