When we are discussing CANONICAL data representation(s) for MPI I/O, perhaps the
following questions are useful to look at:
1) Is there a specific bit pattern I can expect in a file when I read/write?
2) Can I write a file using machine A and use the result with the XYZ graphics
package on machine B?
3) Can I interoperate with other MPI I/O packages?
4) Is there an existing standard that is adequate?
5) Do we want to leave open the possibility of larger-word machines?
6) Do we want to allow canonical in MPI_PACK/UNPACK?
7) Do we want to introduce it for MPI interoperability?
Question 1:
I think it is reasonable for the user to have some minimal level of control
over the input and output format of his/her files.
Question 2:
We can probably craft something to solve this problem. My original proposal
of October (I think) went quite a way in that direction. It requires the user
to be able to define the size of each language type independently. We could
do this with a number of pre-defined attributes on a special communicator,
but it gets messy pretty quickly.
Question 3:
Not so different from question 1.
Question 4:
There are XDR and ASN.1.
For details on the web:
XDR: ftp://ds.internic.net/rfc/rfc1832.txt
ASN.1: http://www.inria.fr/rodeo/personnel/hoschka/asn1.html
XDR vs ASN.1: http://ganges.cs.tcd.ie/4ba2/presentation/xdrandber.html
What are the shortcomings?
They don't map language types to their defined data types. We still have
to make a mapping from language types to bit representations.
XDR has some packing issues with types shorter than 32 bits. Padding is
added to the end of shorter types to round them up to 32-bit boundaries
(three 1-byte characters are packed into a 4-byte area).
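To make the padding point concrete, here is a minimal C sketch of how XDR lays out fixed-length opaque data as described in RFC 1832: the n data bytes are followed by zero to three zero bytes so the total is a multiple of four. The function names are hypothetical and are not part of any XDR library (the Sun RPC routine that does this job is xdr_opaque).

```c
#include <stddef.h>
#include <string.h>

/* Round a byte count up to the next multiple of 4, the XDR unit size. */
static size_t xdr_padded_len(size_t n)
{
    return (n + 3u) & ~(size_t)3u;
}

/* Hypothetical helper: encode n bytes as XDR fixed-length opaque data.
 * Copies the data, then zero-fills up to the 4-byte boundary.
 * 'out' must hold at least xdr_padded_len(n) bytes.
 * Returns the number of bytes written. */
static size_t xdr_put_fixed_opaque(unsigned char *out,
                                   const unsigned char *in, size_t n)
{
    size_t padded = xdr_padded_len(n);
    memcpy(out, in, n);
    memset(out + n, 0, padded - n);  /* residual zero bytes */
    return padded;
}
```

So three characters written as opaque data take 4 bytes on disk, and a file reader that expects tightly packed 3-byte records will be off by one byte per record.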
ASN.1 includes some meta-data that may make it more difficult for the
average program to interpret.
Do we want to use XDR or ASN.1? I don't think either is the most suitable
for an external representation in MPI I/O.
Question 5:
This probably means an extensible solution, which would get as messy as anything
that solves question 2.
Question 6:
This is the current proposal.
Question 7:
This is an issue for the separate group considering interoperability.
So, what are we left with?
We can define a single CANONICAL representation (Albert and I proposed two at
the last meeting: a 32-bit-based representation and a 64-bit-based
representation). If we do this, I would favor one that loses the least
information, something like the CANONICAL64 we proposed in January.
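A minimal C sketch of why a wider canonical form loses less information. It assumes, purely for illustration, that a 64-bit canonical integer is stored as 8 bytes of big-endian two's complement; the actual January CANONICAL64 layout is not reproduced here, and all names are hypothetical. Any 32-bit value round-trips through the 64-bit form, while a 32-bit canonical field cannot hold every 64-bit value.

```c
#include <stdint.h>

/* Hypothetical 64-bit canonical integer: 8 bytes, most significant first. */
static void canon64_put_int(unsigned char out[8], int64_t v)
{
    uint64_t u = (uint64_t)v;          /* well-defined two's complement bits */
    for (int i = 7; i >= 0; i--) {
        out[i] = (unsigned char)(u & 0xFFu);
        u >>= 8;
    }
}

/* Decode the hypothetical 8-byte big-endian form back to int64_t,
 * converting portably without relying on implementation-defined casts. */
static int64_t canon64_get_int(const unsigned char in[8])
{
    uint64_t u = 0;
    for (int i = 0; i < 8; i++)
        u = (u << 8) | in[i];
    if (u <= (uint64_t)INT64_MAX)
        return (int64_t)u;
    return -(int64_t)(UINT64_MAX - u) - 1;
}

/* Would this value survive a hypothetical 32-bit canonical field?
 * Returns 0 if writing it there would lose information. */
static int canon32_fits(int64_t v)
{
    return v >= INT32_MIN && v <= INT32_MAX;
}
```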
We can define several EXTERNAL representations (this is essentially what
Albert and I did for the last meeting).
We can define a method for mapping language data types to a small number of
primitive data types. This involves a pre-defined attribute per language on
a "special" communicator. Each decimal (hex, octal?) digit of the attribute
value would be the log base 2 of the number of bytes of one primitive type,
which the user can control, with limits on the valid choices. This will get
messy for implementors pretty quickly and will most likely not be performance
oriented :-)
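A minimal C sketch of that attribute encoding, using decimal digits. The slot list (char, short, int, long, float, double), putting the first slot in the ones digit, the max_bytes limit, and all names are hypothetical; how the value would actually be attached to the "special" communicator is left open.

```c
#include <stddef.h>

/* Hypothetical primitive slots, in digit order (slot 0 = ones digit). */
enum { SLOT_CHAR, SLOT_SHORT, SLOT_INT, SLOT_LONG,
       SLOT_FLOAT, SLOT_DOUBLE, NSLOTS };

/* Return log2(n) if n is a power of two, else -1. */
static int log2_exact(unsigned n)
{
    int k = 0;
    if (n == 0 || (n & (n - 1)) != 0)
        return -1;
    while (n > 1) {
        n >>= 1;
        k++;
    }
    return k;
}

/* Encode per-slot byte sizes into one decimal value, one digit per slot.
 * Returns -1 if a size is not a power of two, exceeds max_bytes, or
 * needs more than one decimal digit (i.e. more than 512 bytes). */
static long encode_sizes(const unsigned sizes[NSLOTS], unsigned max_bytes)
{
    long value = 0, place = 1;
    for (int i = 0; i < NSLOTS; i++) {
        int k = log2_exact(sizes[i]);
        if (k < 0 || k > 9 || sizes[i] > max_bytes)
            return -1;
        value += k * place;
        place *= 10;
    }
    return value;
}

/* Decode the byte size for one slot: 2 raised to that slot's digit. */
static unsigned decode_size(long value, int slot)
{
    for (int i = 0; i < slot; i++)
        value /= 10;
    return 1u << (value % 10);
}
```

For sizes {1, 2, 4, 8, 4, 8} the encoded value is 323210. The decode itself is cheap integer arithmetic; the implementation burden would come from conversion routines that must honor every allowed combination of sizes.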
Comments, suggestions?
Regards,
Leslie Hart (hart@fsl.noaa.gov)
NOAA/FSL