Canonical representation proposal

Albert Cheng (acheng@ncsa.uiuc.edu)
Fri, 4 Oct 1996 14:47:44 -0500

In the following two sections, I propose a canonical file
representation for all MPI elementary datatypes, together with their
sizes. The first section defines the data representation of the
elementary datatypes; the second defines the size in bytes of each
datatype. I am proposing two sets of sizes, CANONICAL32 and
CANONICAL64, intended to be natural for 32-bit and 64-bit machines
respectively. I like Leslie's rather comprehensive proposal of
spelling out the byte size of every type, but it is also useful to
define a couple of commonly used size sets for convenience.

Canonical Data Representation
-----------------------------

All floating point values are represented in the IEEE 754 binary
floating-point formats of the appropriate bit sizes.

COMPLEX is represented by two consecutive FLOATs, the first FLOAT
being the real part and the second FLOAT the imaginary part.

DOUBLE_COMPLEX is represented by two consecutive DOUBLEs, the first
DOUBLE being the real part and the second DOUBLE the imaginary
part.
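A minimal C sketch of writing one CANONICAL32 COMPLEX value in this
layout (real part first, then imaginary part, each in Big Endian byte
order as required for all numbers below). It assumes the host
`float` is a 4-byte IEEE 754 single-precision value, so its bit
pattern can be copied as is; the function names are hypothetical and
not part of MPI.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Assumption: the host float is a 4-byte IEEE 754 single. */
_Static_assert(sizeof(float) == 4, "float must be 4 bytes");

/* Hypothetical helper: store a 32-bit pattern most significant byte first. */
static void put_be32(unsigned char *out, uint32_t v)
{
    out[0] = (unsigned char)(v >> 24);
    out[1] = (unsigned char)(v >> 16);
    out[2] = (unsigned char)(v >> 8);
    out[3] = (unsigned char)v;
}

/* Hypothetical encoder: 8 bytes, the real FLOAT followed by the imaginary FLOAT. */
static void encode_complex32(unsigned char *out, float re, float im)
{
    uint32_t bits;
    assert(out != NULL);
    memcpy(&bits, &re, sizeof bits);  /* reinterpret the float's bits safely */
    put_be32(out, bits);
    memcpy(&bits, &im, sizeof bits);
    put_be32(out + 4, bits);
}
```

For example, 1.0 + 2.0i encodes as 3F 80 00 00 40 00 00 00. A
DOUBLE_COMPLEX encoder would follow the same pattern with two 8-byte
halves.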

CHAR is assumed to be signed char (not sure about this).

All signed numeric types (INT, LONG, SHORT, CHAR, FLOAT, DOUBLE,
LONG_DOUBLE, INTEGER, REAL, DOUBLE_PRECISION) have the sign bit at
the most significant bit. Since complex numbers are represented by a
pair of floating point numbers, this implies that:

COMPLEX has the sign bit of the real and imaginary parts at the
most significant bit of the corresponding FLOATs.

DOUBLE_COMPLEX has the sign bit of the real and imaginary parts
at the most significant bit of the corresponding DOUBLEs.
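Combined with the Big Endian rule below, this means the sign of any
stored signed value is simply the top bit of its first byte,
whatever the value's size. A small sketch (the helper name is
hypothetical):

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if the sign bit of a canonical signed
 * value is set. The most significant byte is stored first and the sign
 * sits at its most significant bit, so only value[0] is examined;
 * nbytes is used only to validate the call. */
static int canonical_sign_bit(const unsigned char *value, size_t nbytes)
{
    assert(value != NULL && nbytes > 0);
    (void)nbytes;  /* silence unused-parameter warnings under NDEBUG */
    return (value[0] & 0x80) != 0;
}
```

For a COMPLEX stored at `buf`, the real part's sign would be
`canonical_sign_bit(buf, 4)` and the imaginary part's sign
`canonical_sign_bit(buf + 4, 4)`.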

All numbers, integral or floating point, signed or unsigned, are
stored in Big Endian order (most significant byte first). For
example, the 4-byte integer value 0x01020304 is stored as the bytes
0x01, 0x02, 0x03, 0x04, in that order.
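One way to produce and read this byte order in C is with shifts on
the value itself, which gives the same file bytes regardless of the
host's own byte order. A minimal sketch for unsigned values of 1 to 8
bytes; `put_be` and `get_be` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: write the low `nbytes` bytes of v to out,
 * most significant byte first (Big Endian). */
static void put_be(unsigned char *out, uint64_t v, int nbytes)
{
    assert(nbytes >= 1 && nbytes <= 8);
    for (int i = nbytes - 1; i >= 0; i--) {
        out[i] = (unsigned char)(v & 0xFF);  /* least significant byte goes last */
        v >>= 8;
    }
}

/* Hypothetical helper: read an unsigned Big Endian value of nbytes bytes. */
static uint64_t get_be(const unsigned char *in, int nbytes)
{
    assert(nbytes >= 1 && nbytes <= 8);
    uint64_t v = 0;
    for (int i = 0; i < nbytes; i++)
        v = (v << 8) | in[i];  /* first byte ends up most significant */
    return v;
}
```

Writing 0x01020304 with `nbytes` of 4 yields 01 02 03 04; with
`nbytes` of 8 (a CANONICAL64 INT) it yields 00 00 00 00 01 02 03 04.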

Fortran CHARACTER is represented by an INTEGER length followed by
that many bytes of character data.

Alternative: Fortran CHARACTER is represented by the character bytes
alone, with the first character stored first. It is up to the
application to supply the correct length to access the CHARACTER
value.
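A sketch of the first (length-prefixed) CHARACTER option under
CANONICAL32, where INTEGER occupies 4 bytes, so a string of length n
takes 4 + n bytes. Treating the length as an unsigned 32-bit value
is an assumption (lengths are never negative), and the function name
is hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical encoder: a 4-byte Big Endian INTEGER length, then `len`
 * character bytes. `out` must have room for 4 + len bytes.
 * Returns the number of bytes written. */
static size_t encode_character32(unsigned char *out, const char *s, uint32_t len)
{
    assert(out != NULL && (s != NULL || len == 0));
    out[0] = (unsigned char)(len >> 24);
    out[1] = (unsigned char)(len >> 16);
    out[2] = (unsigned char)(len >> 8);
    out[3] = (unsigned char)len;
    if (len > 0)
        memcpy(out + 4, s, len);  /* first character stored first */
    return 4 + (size_t)len;
}
```

Encoding "abc" gives the 7 bytes 00 00 00 03 61 62 63.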

LOGICAL is represented by a byte with the value zero representing FALSE
and any non-zero value representing TRUE.

BYTE is represented by a byte with values ranging from hexadecimal
00 to FF.

PACKED is represented by the bytes of the packed data, with the
first byte stored first. It is up to the application to supply the
correct size to access the PACKED data.

Discussion: Does it make sense to define the type PACKED at all?
Can it be assumed to be the same as BYTE?

File Storage Sizes of MPI Elementary Datatypes (in bytes)
---------------------------------------------------------

C Types                  CANONICAL32     CANONICAL64

MPI_BYTE                 1               1
MPI_CHAR                 1               1
MPI_UNSIGNED_CHAR        1               1

MPI_SHORT                2               4
MPI_UNSIGNED_SHORT       2               4

MPI_INT                  4               8
MPI_UNSIGNED             4               8

MPI_LONG                 4               8
MPI_UNSIGNED_LONG        4               8

MPI_FLOAT                4               8
MPI_DOUBLE               8               8
MPI_LONG_DOUBLE          16(??)          16(??)

MPI_PACKED               <data size>     <data size>

Fortran Types            CANONICAL32     CANONICAL64

MPI_BYTE                 1               1
MPI_LOGICAL              1               1

MPI_INTEGER              4               8
MPI_REAL                 4               8
MPI_DOUBLE_PRECISION     8               16

MPI_COMPLEX              8               16
MPI_DOUBLE_COMPLEX       16              32

MPI_CHARACTER            4+<length>      8+<length>
MPI_PACKED               <data size>     <data size>
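The fixed sizes in these tables can be captured as a lookup table.
The sketch below matches type names as strings purely for
illustration (the enum and function names are hypothetical). It
omits MPI_LONG_DOUBLE, whose size is still open, and MPI_PACKED,
whose size depends on the data; for MPI_CHARACTER it returns only the
length-prefix size.

```c
#include <assert.h>
#include <string.h>

enum canonical { CANONICAL32 = 0, CANONICAL64 = 1 };

struct size_entry { const char *name; int size[2]; };

/* Sizes in bytes, copied from the proposal's tables. MPI_BYTE appears
 * once because it has the same sizes in the C and Fortran tables. */
static const struct size_entry canonical_sizes[] = {
    { "MPI_BYTE", {1, 1} },           { "MPI_CHAR", {1, 1} },
    { "MPI_UNSIGNED_CHAR", {1, 1} },  { "MPI_SHORT", {2, 4} },
    { "MPI_UNSIGNED_SHORT", {2, 4} }, { "MPI_INT", {4, 8} },
    { "MPI_UNSIGNED", {4, 8} },       { "MPI_LONG", {4, 8} },
    { "MPI_UNSIGNED_LONG", {4, 8} },  { "MPI_FLOAT", {4, 8} },
    { "MPI_DOUBLE", {8, 8} },         { "MPI_LOGICAL", {1, 1} },
    { "MPI_INTEGER", {4, 8} },        { "MPI_REAL", {4, 8} },
    { "MPI_DOUBLE_PRECISION", {8, 16} },
    { "MPI_COMPLEX", {8, 16} },       { "MPI_DOUBLE_COMPLEX", {16, 32} },
    { "MPI_CHARACTER", {4, 8} },      /* length prefix only */
};

/* Returns the size in bytes, or -1 if the type is not in the table. */
static int canonical_size(const char *name, enum canonical c)
{
    assert(name != NULL && (c == CANONICAL32 || c == CANONICAL64));
    for (size_t i = 0; i < sizeof canonical_sizes / sizeof canonical_sizes[0]; i++)
        if (strcmp(canonical_sizes[i].name, name) == 0)
            return canonical_sizes[i].size[c];
    return -1;
}
```

Under this sketch, a CHARACTER value of length 5 occupies
`canonical_size("MPI_CHARACTER", CANONICAL32) + 5`, that is, 9 bytes.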

Advice to implementors.
When converting a larger integer to a smaller one, only the least
significant bytes are moved. Care must be taken to preserve the sign
bit value. This introduces no conversion error as long as the data
lie within the range of the smaller integer type.
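A sketch of this narrowing for Big Endian signed integers, for
example a CANONICAL64 INT (8 bytes) read into a CANONICAL32 INT
(4 bytes). It assumes two's complement integers, which the proposal
does not state. Under that assumption, keeping the low bytes already
preserves the sign whenever the value fits, so the remaining care is
the range check: every discarded high byte must be a pure sign
extension of the kept bytes. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical narrowing from a `from`-byte to a `to`-byte Big Endian
 * signed integer (two's complement assumed). The least significant
 * bytes sit at the end of a Big Endian value, so they are copied from
 * src + (from - to). Returns 0 on success, -1 if the value is out of
 * range for the smaller type (dst is then left unchanged). */
static int narrow_be_signed(unsigned char *dst, size_t to,
                            const unsigned char *src, size_t from)
{
    assert(dst != NULL && src != NULL && to > 0 && to <= from);
    size_t drop = from - to;
    /* The dropped bytes must all be 0x00 for a non-negative result
     * or 0xFF for a negative one. */
    unsigned char fill = (src[drop] & 0x80) ? 0xFF : 0x00;
    for (size_t i = 0; i < drop; i++)
        if (src[i] != fill)
            return -1;
    memcpy(dst, src + drop, to);
    return 0;
}
```

For example, -2 stored as FF FF FF FF FF FF FF FE narrows to
FF FF FF FE, while 2^31 (00 00 00 00 80 00 00 00) is rejected because
it does not fit in 4 signed bytes.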

Advice to users.
It is the user's responsibility to select a canonical representation
large enough for the data ranges of the application.

============================================
Ref:

"IEEE Standard for Binary Floating-Point Arithmetic", ANSI/IEEE
Standard 754-1985, Institute of Electrical and Electronics Engineers,
August 1985.
