canonical data representation

Marc Snir (snir@watson.ibm.com)
Wed, 26 Feb 1997 22:23:29 -0400

We can postpone dealing with canonical data representation, but we cannot
avoid this issue altogether, especially if we want to leave the door open
for interoperability. We have three choices:

1. Ignore the issue.
2. Put hooks in MPI that can be used to generate one or more specific data
encodings but postpone the specification for that encoding.
3. Specify a canonical encoding.

If we do not feel we can do 3, we should at least specify the MPI syntax
for 2, so that we can revisit the issue. I.e., if we cannot agree on a
canonical data encoding, we must, at the least, convince ourselves that
this can later be retrofitted into MPI without requiring new functions.

By the way, there is one language that requires a canonical data
representation for its basic datatypes -- this is Java. So, if we were
doing an MPI binding for Java, the issue of canonical data representation
would not arise. Since Java implementations interoperate with C, and C
interoperates with Fortran, this, in principle, defines a "canonical" data
representation. It is canonical in the sense that each processor has a well
defined mechanism for mapping C or Fortran types to a sequence of bytes.
It is not canonical in the sense that there is a fixed encoding for 32 bit
integers, but not a fixed encoding for C values of type int. There is a
single representation of a 32 bit integer or of a 64 bit integer, but
different representations for int, depending on whether it is 32 or 64
bits long.
But this might well be what applications need: I doubt very much that users
will want 64 bit real numbers to be transparently truncated to 32 bits.
We were looking for a definition of "canonical" where the user will only
have to know that a file contains 5 floats, followed by 4 ints, followed by
20 chars. I suggest that we require the user to know that the file
contains 5 64-bit floating point numbers, followed by 4 32-bit integers,
followed by 20 single-byte characters, to know the local sizes of its
datatypes, and to declare variables accordingly.
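
To make the suggestion concrete, here is a minimal sketch of such a file
record, written with Python's struct module. The byte order (big-endian,
as Java's DataOutputStream writes it) and the names RECORD, encode and
decode are assumptions made for illustration; nothing above fixes them.
The point is only that the layout is stated in explicit widths (64-bit
floats, 32-bit integers, single bytes) rather than in language types
such as float or int.

```python
import struct

# Assumed layout (hypothetical, for illustration only): big-endian,
# no padding; 5 x 64-bit IEEE floats, 4 x 32-bit signed integers,
# 20 single-byte characters.
RECORD = struct.Struct(">5d4i20s")


def encode(floats, ints, text):
    """Pack the three fields into the fixed-size record."""
    return RECORD.pack(*floats, *ints, text)


def decode(buf):
    """Unpack a record back into (floats, ints, text)."""
    fields = RECORD.unpack(buf)
    return list(fields[:5]), list(fields[5:9]), fields[9]


floats = [0.5, 1.5, 2.5, 3.5, 4.5]
ints = [1, -2, 3, -4]
text = b"canonical data repr."  # exactly 20 bytes

record = encode(floats, ints, text)
```

With the ">" prefix the struct module uses fixed standard sizes, so the
record is 5*8 + 4*4 + 20 = 76 bytes on every machine, whatever the local
size of a C int or long happens to be. A reader on any platform can then
call decode(record) and store the values in variables of suitable local
width.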