Re: canonical data representation

Albert Cheng (acheng@ncsa.uiuc.edu)
Fri, 28 Feb 1997 17:48:35 -0600

I also agree with at least getting the hooks in. But
I am not sure I follow the latter part. Currently,
MPI types do not say anything about the sizes of the
data. So, an application that writes 1 MPI_INT, 1 MPI_LONG
may end up with a file of
32-bit integer, 32-bit integer (32-bit workstation)
32-bit integer, 64-bit integer (64-bit SGI)
64-bit integer, 64-bit integer (64-bit Cray)

Another application, e.g., a visualization tool, knows that it
needs to read 1 MPI_INT, 1 MPI_LONG, and if it has to accept
MPI files created on various machine types, it is impossible
to program it using MPIO functions alone without interactive
input asking "what kind of file are you giving me this time?" The
data-reading code becomes rather complicated: it has to
figure out how many bytes to bring in, whether to sign-extend,
little or big endian, whether to swap bytes. The reader application
basically has to read the MPI data in as bytes, then start
moving bytes here and there. That would require an experienced
programmer who knows all the specifics of each machine out there.
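
To make the burden concrete, here is a rough sketch in plain C of
what such a reader ends up writing once it has somehow learned the
width and byte order used by the writer. The name decode_signed and
its parameters are made up for illustration; nothing like it exists
in MPIO.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not an MPIO call): assemble a signed integer
 * from `width` raw file bytes, given the byte order the writer used. */
int64_t decode_signed(const unsigned char *buf, size_t width, int big_endian)
{
    assert(buf != NULL && width >= 1 && width <= 8);

    uint64_t u = 0;
    for (size_t i = 0; i < width; i++) {
        /* Visit bytes from most significant to least significant. */
        size_t idx = big_endian ? i : width - 1 - i;
        u = (u << 8) | buf[idx];
    }

    /* Sign-extend when the stored value is narrower than 64 bits
     * and its top bit is set. */
    size_t msb = big_endian ? 0 : width - 1;
    if (width < 8 && (buf[msb] & 0x80)) {
        u |= ~(uint64_t)0 << (8 * width);
    }

    /* Assumes a two's-complement machine for the final conversion. */
    return (int64_t)u;
}
```

The reader would still have to call this with the right width and
byte order for each field, which is exactly the information the
file itself does not carry.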

If we can have a defined external data representation which says
1 MPI_INT is stored as a 32-bit big-endian integer, 1 MPI_LONG is
stored as a 64-bit big-endian integer, etc., then the MPIO implementation
can do all the above byte movement and conversion for the users.
The trade-off is potential loss of precision if the machine involved
is not a good match for the defined data representation (e.g., a
64-bit MPI_INT on the Cray will not always fit in 32 bits).
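
As a sketch only, the conversion an implementation might do on the
write side could look like the plain C below. The 32-bit and 64-bit
big-endian widths are the ones suggested above; the function names
and the choice to report an out-of-range value instead of silently
truncating it are my assumptions, not anything MPIO defines.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helpers, not MPIO calls. */

/* Store v as a 32-bit big-endian integer.
 * Returns 0 on success, -1 if v does not fit in 32 bits. */
int put_int32_be(long long v, unsigned char out[4])
{
    assert(out != NULL);
    if (v < INT32_MIN || v > INT32_MAX)
        return -1;              /* this is where precision would be lost */

    uint32_t u = (uint32_t)(int32_t)v;
    out[0] = (unsigned char)(u >> 24);
    out[1] = (unsigned char)(u >> 16);
    out[2] = (unsigned char)(u >> 8);
    out[3] = (unsigned char)u;
    return 0;
}

/* Store v as a 64-bit big-endian integer. */
void put_int64_be(int64_t v, unsigned char out[8])
{
    assert(out != NULL);
    uint64_t u = (uint64_t)v;
    for (int i = 0; i < 8; i++)
        out[i] = (unsigned char)(u >> (56 - 8 * i));
}
```

Because the shifts work on values and not on memory layout, the same
code produces the same file bytes on a little-endian or big-endian
host.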

If the precision loss is unacceptable to a certain application, it can
choose to use the NATIVE mode and never move the data files to a different
platform, or it can write the data out as MPI_BYTE and program its own
conversion. The defined external data representation just makes
life easier for the subset of MPIO applications that can tolerate the
potential precision loss.

At 09:30 AM 2/27/97 -0400, Jean-Pierre Prost wrote:
>
>I fully agree with Marc on this issue. We should at least
>achieve 2.
>And requiring the user to remember if the numbers present in the
>file are 64 bit or 32 bit besides the type itself is not unreasonable.
>Jean-Pierre
>
>From: "Marc Snir"<snir@watson.ibm.com>
>...
>But this might well be what applications need: I doubt very much that users
>will want 64 bit real numbers to be transparently truncated to 32 bits.
>We were looking for a definition of "canonical" where the user will only
>have to know that a file contains 5 floats, followed by 4 ints, followed by
>20 chars. I suggest that we require the user to know that the file
>contains 5 64 bit floating point numbers, followed by 4 32 bit integers,
>followed by 20 byte characters, know what is the local size of its
>datatypes, and declare variables accordingly.