canonical external representation of Fortran 90 datatypes

W. Saphir (wcs@nersc.gov)
Tue, 18 Mar 1997 14:51:23 -0800 (PST)

Dear I/O and F90 groups:

At the last meeting, we spent a lot of time discussing canonical
external representations for C and Fortran datatypes. It was pointed
out that there was no way to handle Fortran 90 parameterized types,
because they are not named MPI types. This would make canonical I/O in
Fortran 90 impossible. Since the Fortran 90 KIND mechanism is
specifically designed to allow programs to get guarantees on numerical
precision, it seemed that we ought to be able to fix the problem.

[For F90 neophytes, it works like this:
real(selected_real_kind(p, m)) x
declares x to be a real number with at least p decimal digits
of precision and a decimal exponent range of at least m, i.e.,
magnitudes up to at least 10^m.]
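
A minimal sketch of the KIND mechanism described in the note above,
using only standard intrinsics. The program name, the constant k, and
the requested values (6 digits, exponent range 30) are arbitrary
choices for illustration. What gets printed depends on the compiler
and platform.

```fortran
program kind_demo
  implicit none
  ! Ask for at least 6 decimal digits and a decimal exponent range of
  ! at least 30. The values 6 and 30 are illustrative only.
  integer, parameter :: k = selected_real_kind(6, 30)
  real(kind=k) :: x

  x = 1.0_k
  ! precision() and range() report what the selected kind actually
  ! provides, which may exceed what was requested.
  print *, 'kind value     :', kind(x)
  print *, 'decimal digits :', precision(x)
  print *, 'exponent range :', range(x)
end program kind_demo
```

The point is that the program states its requirements and the
compiler picks a representation that meets them, so the kind value
and the storage size can differ from one system to the next.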

There was a quick proposal which we discussed and voted on Friday
morning. In the discussion it became clear that the proposal would not
fix the problem. We voted it in anyway in the hope that we could come
up with a minor fix for next time.

This note contains a modified proposal that provides a solution, and
we can discuss whether it provides the right solution. It happens
that it also addresses problems with canonical representations in C
and Fortran 77.

One way to fix the problem would be to more tightly tie together the
declaration of a variable with the external rep of the associated MPI
type, e.g.

MPI_TYPE_CREATE_F90_INT(IN d, OUT newtype)
MPI_TYPE_CREATE_F90_REAL(IN p, IN e, OUT newtype)

to correspond to
integer(selected_int_kind(d)) i
real(selected_real_kind(p, e)) x

In this case, the canonical rep would be directly tied to the
declaration, and the immediate problem would be solved. I decided not
to go with this approach primarily because people don't often declare
variables in this way, so it is of limited usefulness.

Proposal.

1. The canonical external rep of a datatype returned by
MPI_TYPE_CREATE_F90 is the same as that of the basic type used. The
canonical external representation of MPI_REAL is 8 bytes. [Others are
as defined in Leslie's proposal. I picked 8 bytes because we no longer
have REAL and DOUBLE_PRECISION to provide two choices, because many
scientific codes need 8 bytes, and because it is better to err on the
large side.]

2. Add a new function

MPI_TYPE_CREATE_EXTERNAL(IN basictype, IN size, OUT newtype)

basictype is one of the basic integer or floating point types
(MPI_INTEGER, MPI_REAL, or a type returned by MPI_TYPE_CREATE_F90 from
one of these basetypes). This can easily be extended to C to include
MPI_INT, MPI_FLOAT, MPI_DOUBLE, etc.

size is either 4 or 8, corresponding to one of the predefined
representations.

newtype is a new datatype which is exactly the same
as the old, except that its external representation is the size
specified by the user.

Note that logical and character types are not allowed (i.e., their
external representations are fixed). (Should we relax this?)

Discussion: "exactly the same" should be clarified. A "natural"
definition would allow mixing basictype and newtype
in send/recv operations. They would be different only
for read/write operations with canonical file format.

Examples:

This declares "x" with at least 6 digits of precision. On systems
that support 4-byte reals, this will be four bytes (call these systems
"type A"). On systems that don't support 4-byte reals ("type B"), this
will probably be 8 bytes.

real(selected_real_kind(6, 0)) x

This declares "y" to have at least 12 digits of precision. On all
systems I know of, this will result in an 8-byte representation
(though a system supporting 16-byte reals and not 8-byte reals could
make it 16 bytes).

real(selected_real_kind(12,0)) y

This declares "z" to be a default real number. On some systems it
will be 4 bytes, and on some, it will be 8 bytes. Fortran does not say
what it should be, or require a minimum precision.

real z
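
To see which case a given system falls into, something like the
following sketch can be used. It repeats the three declarations above
and counts bytes with transfer(). The program name and the variable
"byte" are made up for the example, and the byte count assumes that a
character(len=1) occupies exactly one byte, which holds on the
systems I know of but is not guaranteed by the language.

```fortran
program size_demo
  implicit none
  real(selected_real_kind(6, 0))  :: x
  real(selected_real_kind(12, 0)) :: y
  real                            :: z
  ! Mold used only to count bytes (assumes 1 byte per character).
  character(len=1) :: byte(1)

  byte = ' '
  x = 0.0
  y = 0.0
  z = 0.0

  ! transfer() to a character(len=1) array returns one element per
  ! byte of the source, so size() gives the in-memory size.
  print *, 'x: kind', kind(x), ' bytes', size(transfer(x, byte))
  print *, 'y: kind', kind(y), ' bytes', size(transfer(y, byte))
  print *, 'z: kind', kind(z), ' bytes', size(transfer(z, byte))
end program size_demo
```

On a "type A" system one would expect x to come out at 4 bytes and y
at 8, while z follows whatever the compiler (and its flags) chooses
as default real.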

This creates datatypes suitable for x, y, and z, whatever size they
happen to be.

MPI_TYPE_CREATE_F90(MPI_REAL, kind(x), XTYPE, ierr)
MPI_TYPE_CREATE_F90(MPI_REAL, kind(y), YTYPE, ierr)
MPI_TYPE_CREATE_F90(MPI_REAL, kind(z), ZTYPE, ierr)

(Note that for z, MPI_REAL would also work, though because compiler
flags can change the default size, I would recommend using the
function instead.)

The canonical external representation of these types is 8 bytes
according to the rules above because the base type is MPI_REAL and the
canonical external rep of MPI_REAL is 8 bytes.

Now suppose the user only needs 4 bytes and wants to save disk
space. No problem - just create a new datatype with a different
external representation:

MPI_TYPE_CREATE_EXTERNAL(XTYPE, 4, NEWXTYPE, ierr)

and use "NEWXTYPE" to write the data, in canonical format.
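
Choosing a 4-byte external representation for data held in a wider
kind means the value is narrowed on the way out. The sketch below
only illustrates that effect with a plain kind conversion. It does
not call MPI, it says nothing about byte order or the exact file
format, and the names wide, narrow, v, and v_ext are invented for the
example.

```fortran
program narrow_demo
  implicit none
  ! Illustrative kinds: typically 8 bytes and 4 bytes respectively,
  ! but the compiler is free to choose something larger.
  integer, parameter :: wide   = selected_real_kind(12, 0)
  integer, parameter :: narrow = selected_real_kind(6, 0)
  real(kind=wide)   :: v
  real(kind=narrow) :: v_ext

  v = 1.0_wide / 3.0_wide
  ! Stand-in for converting to a smaller external representation.
  v_ext = real(v, kind=narrow)

  print *, 'in memory       :', v
  print *, 'after narrowing :', v_ext
  print *, 'difference      :', v - real(v_ext, kind=wide)
end program narrow_demo
```

If the two kinds turn out to be the same on a given system, the
difference is zero. Otherwise the digits beyond what the narrower
kind can hold are lost, which is the price of the saved disk space.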


Discussion:

This function is similar to dup(). (Can it be implemented with
reference counts? If not, is this a problem?)
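
One way the reference-count question could be answered is sketched
below. This is purely hypothetical and is not how any MPI
implementation is known to store datatypes: the module, the derived
types type_body and dtype, and the routine create_external are all
invented here. The idea is that the new handle shares the description
of the old type and carries only its own external size.

```fortran
module dtype_sketch
  implicit none

  ! Shared description of a datatype (hypothetical, not an MPI object).
  type type_body
     integer :: refcount
     integer :: native_bytes
  end type type_body

  ! A handle points at a shared body and carries its own external size.
  type dtype
     type(type_body), pointer :: body
     integer :: external_bytes
  end type dtype

contains

  ! Hypothetical analogue of MPI_TYPE_CREATE_EXTERNAL: share the body,
  ! bump the reference count, override only the external size.
  subroutine create_external(old, nbytes, new)
    type(dtype), intent(in)  :: old
    integer,     intent(in)  :: nbytes
    type(dtype), intent(out) :: new
    new%body => old%body
    new%body%refcount = new%body%refcount + 1
    new%external_bytes = nbytes
  end subroutine create_external

end module dtype_sketch

program refcount_demo
  use dtype_sketch
  implicit none
  type(dtype) :: xtype, newxtype

  allocate(xtype%body)
  xtype%body%refcount = 1
  xtype%body%native_bytes = 4
  xtype%external_bytes = 8      ! default external size, as in item 1

  call create_external(xtype, 4, newxtype)
  print *, 'shared body     :', associated(newxtype%body, xtype%body)
  print *, 'reference count :', xtype%body%refcount
  print *, 'external sizes  :', xtype%external_bytes, newxtype%external_bytes
end program refcount_demo
```

In this sketch the two handles share one body with a reference count
of 2 and differ only in the external size (8 versus 4), which matches
the "natural" reading in which basictype and newtype are
interchangeable for send/recv.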

I like this model because it provides reasonable defaults but gives
the user simple control over the external rep. I prefer it to the
callback model because it is much simpler. It does not have all of the
flexibility of that model (because only the predefined 4- and 8-byte
formats are allowed) but I think it is completely sufficient to write
out a file on one machine and read it in on a different machine.

In my view, it also solves the problems with canonical external reps
in C and Fortran 77, and I'd like to extend it there.