datatype accessor functions

Koichi Konishi (konishi@research.nj.nec.com)
Wed, 10 Apr 1996 20:07:32 EDT

Hello,

Here is an attempt to clean up the current situation around
datatype accessor functions. I am writing this without knowing who
is doing what at the moment, so please let me know if other plans
are under way. All comments are, of course, welcome.

There are two related issues being discussed:
a) passing datatypes in a message.
b) accessing internal information of datatypes.

Both functionalities will be valuable to many MPI-2 users.
I would emphasize that b) is useful for at least three tasks:
- profiling/tracing/debugging systems
- representing data layout in memory as well as elsewhere,
such as in a file.
- passing datatypes in a message, i.e. item a) above.

Now, UH/NEC have dropped our proposal (MPI_TYPE_FIRST/NEXT/BASIC/EQUAL)
in favor of David Taylor's MPI_TYPE_ENVELOPE/CONTENTS. The reason for
choosing his functions is explained at the end of this mail.
These functions serve b) well, and also a) to some extent.

I hope David will write a proposal in a form more suitable for
inclusion in the next MPI-2 draft; if he can't, I will try (though
I will need a lot of help, even with the English).

So, there are now only three outstanding proposals, in two groups:
A1) MPI_GET_CHAR_DATA_TYPE/DATATYPE_FROM_CHAR
A2) a new datatype MPI_DATATYPE
B) MPI_TYPE_ENVELOPE/CONTENTS

I think it is fine to bring all of these to the next meeting. Both A1
and A2 are designed mainly for a), and work much better than B for
that goal. For b), A1 does not work so well, and A2 probably does
nothing. So having something special for a) in addition to B makes
sense. Note also that A1 has not yet been presented in full.

Finally, below is a comparison of David's and the UH/NEC proposals,
which convinced us that David's are better. (This is NOT meant for the
next MPI-2 draft.) The comparison also shows why functionality for
accessing the internal information of datatypes, item b), is necessary
for a portable MPI-IO implementation.

------------------------------------------------------------------------
From: Darren Sanders <sanders@hpc.uh.edu>
Date: Tue, 9 Apr 1996 06:13:27 -0500 (CDT)

[lines deleted]

Justification
^^^^^^^^^^^^^
Derived datatypes are a very powerful feature of MPI-1.
However, MPI-1 did not provide any way to access the internals of
these user-defined datatypes. As a result, efforts which are external
to MPI but which use MPI, such as the MPI-IO effort, have no means of
making use of MPI's derived datatypes.

MPI-IO makes extensive use of MPI's derived datatypes by using
these datatypes to define the contents of a file, the file access
pattern and the user buffer. MPI-IO requires three type definitions
in order to access a file. The first is an elementary type, called
etype, which defines the basic data unit found in the file. This can
be a basic datatype, such as
MPI_CHAR, or a user defined datatype. The second type needed is the
filetype, which is a datatype made up of etypes. The filetype
specifies not only where the data is in the file, but also the
order in which the data will be accessed. The final type needed
is a buftype. Buftype is also made up of etypes and specifies
how the data buffer passed in by the user application shall be
accessed. For a more detailed discussion of etype, filetype and
buftype, please see the MPI-IO document.

What we need for MPI-IO
^^^^^^^^^^^^^^^^^^^^^^^
In order to properly implement MPI-IO, we must know precisely
where the data can be found and in which order the data must be
accessed within a file and the user buffer. The one sure way of
knowing this is to rely on type maps of the etype, filetype and
buftype. A type map is defined as a list of types and their
corresponding displacements, {(type_0, disp_0), ..., (type_n, disp_n)}.
The MPI-1 document uses type maps to illustrate the types that
would be created using the MPI_Type_{contiguous,vector,...} functions.
Once we have a type map, we can determine the size of each individual
type using the MPI_Type_size function, so there is no need to store
this information in the type map.

The MPI-IO document clearly specifies that "holes" in the file
must remain untouched by a process when performing writes to the
file. A "hole" is defined as an area within the file for which
there is no defined data, according to the etype/filetype
specification. For example:

let etype = MPI_INT
let filetype be defined as:
MPI_Type_vector (4, 1, 4, etype, &filetype)

The type map of filetype is:
{(MPI_INT,0),(MPI_INT,16),(MPI_INT,32),(MPI_INT,48)}
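For concreteness, the following C sketch expands the type map of a type built with MPI_Type_vector, assuming (as in the example) a 4-byte int. It does not call MPI at all: `TypeMapEntry` and `vector_typemap` are hypothetical names, and since a vector has a single oldtype, each entry records only a displacement and the oldtype is described only by its extent.

```c
#include <stddef.h>

/* One entry of a flattened type map: the element's byte displacement.
 * The element type is implicit (a vector has a single oldtype), so only
 * the displacement is stored. Hypothetical representation, not MPI's. */
typedef struct {
    long disp;
} TypeMapEntry;

/* Expand the equivalent of MPI_Type_vector(count, blocklen, stride, oldtype)
 * into a type map. As in MPI-1, stride is measured in multiples of the
 * oldtype extent. Writes count*blocklen entries into out (which must be
 * large enough) and returns the number of entries written. */
size_t vector_typemap(int count, int blocklen, int stride,
                      long old_extent, TypeMapEntry *out)
{
    size_t n = 0;
    for (int i = 0; i < count; i++)          /* each block */
        for (int j = 0; j < blocklen; j++)   /* each element in the block */
            out[n++].disp = ((long)i * stride + j) * old_extent;
    return n;
}
```

Calling `vector_typemap(4, 1, 4, 4, map)` fills `map` with the displacements 0, 16, 32 and 48, matching the type map shown above.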

In this example, the etype has no holes, i.e. its extent
equals its size; however, the filetype does have holes.
The areas of filetype which are undefined are from byte
offset 4 to 15, 20 to 31, etc. These regions of the filetype
are considered holes and must be made inaccessible by the
calling process.
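Given such a type map, the holes can be found by looking at the gaps between consecutive elements. The sketch below is a hypothetical helper (`Hole`, `find_holes`), not part of MPI or MPI-IO; it assumes the displacements are sorted in increasing order and that every element has the same size, which holds for the vector example but not for datatypes in general.

```c
#include <stddef.h>

/* A hole: an inclusive byte range [first, last] not covered by the type map. */
typedef struct {
    long first;
    long last;
} Hole;

/* Given n element displacements sorted in increasing order, all with the
 * same size elem_size (e.g. 4 for a 4-byte int), record every gap between
 * the end of one element and the start of the next.
 * Returns the number of holes written to out (at most n - 1). */
size_t find_holes(const long *disps, size_t n, long elem_size, Hole *out)
{
    size_t h = 0;
    for (size_t i = 1; i < n; i++) {
        long end_prev = disps[i - 1] + elem_size; /* first byte after element i-1 */
        if (disps[i] > end_prev) {                /* a gap precedes element i */
            out[h].first = end_prev;
            out[h].last = disps[i] - 1;
            h++;
        }
    }
    return h;
}
```

With the displacements 0, 16, 32 and 48 and an element size of 4, it reports the holes [4,15], [20,31] and [36,47].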

From the preceding example, it is easy to see that the internals
of the datatypes etype and filetype must be known in order to
access the file correctly.

The main question is how internal access to MPI derived datatypes
should be defined. Initially, we decided to create an interface
that would simply provide what we needed for performing file access,
which is a type map of a derived datatype. See below for a more
detailed description of our proposed interface. We have since
implemented this interface in MPI and are successfully using it
in our MPI-IO implementation. The experience gained from implementing
and using this interface has shown us where its strengths and
weaknesses lie.

The greatest strength of our interface is that it is simple to use
and allows the simple creation of type maps.

UH/NEC Proposal
^^^^^^^^^^^^^^^

MPI_Type_first begins an iteration through a derived datatype and returns
the first datatype encountered, together with its displacement.

int MPI_Type_first (MPI_Datatype datatype,
                    int search_mode,
                    MPIDE_Type_itor *handle,
                    MPI_Datatype *firsttype,
                    int *disp)

IN datatype datatype to traverse
IN search_mode specifies traversal mode
OUT handle iteration handle to be used in subsequent next calls
OUT firsttype first datatype encountered during traversal
OUT disp displacement of first datatype encountered

MPI_Type_next returns the next datatype and displacement encountered in
the derived datatype.

int MPI_Type_next (MPIDE_Type_itor *handle,
                   MPI_Datatype *nexttype,
                   int *disp)

IN handle iteration handle returned in MPI_Type_first
OUT nexttype next datatype encountered during traversal
OUT disp displacement of next datatype encountered
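To illustrate the iteration style of this interface, here is a self-contained C sketch of a first/next iterator over a vector type. It is not the UH/NEC implementation: `VecIter`, `vec_first` and `vec_next` are hypothetical stand-ins that return displacements only, and returning 0 at the end of the traversal is an assumption, since the description above does not say how the end is signalled.

```c
/* Hypothetical iterator state for walking a vector type element by element. */
typedef struct {
    int count, blocklen, stride; /* vector parameters; stride in oldtype extents */
    long old_extent;             /* extent of oldtype in bytes */
    int i, j;                    /* current block, position within the block */
} VecIter;

/* Store the current displacement in *disp and advance; return 1,
 * or return 0 if the traversal is finished. */
static int vec_emit(VecIter *it, long *disp)
{
    if (it->i >= it->count || it->blocklen <= 0)
        return 0;
    *disp = ((long)it->i * it->stride + it->j) * it->old_extent;
    if (++it->j == it->blocklen) { /* end of block: move to the next one */
        it->j = 0;
        it->i++;
    }
    return 1;
}

/* Analogue of MPI_Type_first: initialise the iterator, return the first element. */
int vec_first(int count, int blocklen, int stride, long old_extent,
              VecIter *it, long *disp)
{
    it->count = count;
    it->blocklen = blocklen;
    it->stride = stride;
    it->old_extent = old_extent;
    it->i = 0;
    it->j = 0;
    return vec_emit(it, disp);
}

/* Analogue of MPI_Type_next: return the next element, if any. */
int vec_next(VecIter *it, long *disp)
{
    return vec_emit(it, disp);
}
```

A caller would loop as `for (ok = vec_first(...); ok; ok = vec_next(&it, &disp))`. Note that the number of elements is only known once the loop ends.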

Advantages
^^^^^^^^^^
1) The functions are simple to use.

2) The functions provide precisely what we need for performing
file access, no more, no less.

3) The information returned from our functions can be used to access
a file with little or no additional processing.

Disadvantages
^^^^^^^^^^^^^
1) We know of no use for our extensions other than the
purpose for which they were designed, namely creating
type maps of derived datatypes.

2) As David Taylor pointed out in his merge proposal, when using
our extensions, there is no way to know in advance how many
iterations will be required. This problem is most noticeable
when the type map of the derived datatype is to be stored in
a dynamically allocated array, which cannot be sized in advance.

3) The UH/NEC proposal is inefficient for certain derived datatypes.
For example, a type created using MPI_Type_contiguous with 1000
elements would require 1000 iterations and an array with length
1000 in order to store the type map for that datatype.

4) The information returned using our functions cannot be used
efficiently to build an identical datatype.

Example of our functions
^^^^^^^^^^^^^^^^^^^^^^^^
The following code fragment is taken from our MPI-IO implementation.
Here we use the UH/NEC functions to create a type map of the
elementary datatype during file open.

/* fill etype typemap array with types and disps */
MPIDE_Type_first (etype, MPIDE_DEPTH_FIRST, &th, e_map.dtype, e_map.disp);
MPI_Type_size (e_map.dtype[0], e_map.size+0);
e_map.lowdisp = e_map.disp[0];
for (i = 1; i < e_map.len; i++)
{
    MPIDE_Type_next (&th, e_map.dtype+i, e_map.disp+i);
    MPI_Type_size (e_map.dtype[i], e_map.size+i);
    /* get lowest displacement in type */
    if (e_map.lowdisp > e_map.disp[i])
        e_map.lowdisp = e_map.disp[i];
}

Observations
^^^^^^^^^^^^
Our functions help to create a map of a datatype at its basic level.
However, for some datatypes (i.e. those created with certain
datatype creation functions) we don't need a type map which enumerates
every element of the type, because the needed information can easily
be calculated "on the fly".

The first (and best) example of such a function is the MPI_Type_contiguous
function. If this function is used to create a datatype that is
an array of 1000 integers, for example, then our functions would
be used to create a type map with 1000 entries. This is especially
wasteful because simply by analyzing the parameters passed to this
function we could easily calculate the displacement of any integer
in this derived datatype. Therefore, datatypes created with this
function should simply be represented by the combiner (as David Taylor
calls it) and the parameters used.
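As a minimal sketch of this idea, the C fragment below represents a contiguous type by its parameters alone and computes any element's displacement directly. `ContigDesc` and `contig_disp` are hypothetical names, and the oldtype is assumed to be described fully by its extent.

```c
#include <assert.h>

/* Combiner-plus-parameters description of MPI_Type_contiguous(count, oldtype).
 * Hypothetical representation: only the oldtype extent is kept. */
typedef struct {
    int count;       /* number of oldtype elements */
    long old_extent; /* extent of oldtype in bytes */
} ContigDesc;

/* Displacement of element k (0-based), computed on the fly:
 * no type map has to be built or stored. */
long contig_disp(const ContigDesc *d, int k)
{
    assert(k >= 0 && k < d->count);
    return (long)k * d->old_extent;
}
```

For the 1000-integer example with a 4-byte int, element 999 lies at byte 3996, computed without building a 1000-entry type map.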

As it turns out, the previous observation also holds for the functions
MPI_Type_{vector,hvector,indexed,hindexed}: datatypes created with them
can also be represented easily by a combiner and the corresponding
parameters. This is because the datatypes these functions create are
repetitive and are built from a single datatype (oldtype); that is,
only one datatype is passed to each of these
functions.
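The same approach extends to vector and hvector, which differ only in whether the stride is counted in oldtype extents or in bytes. The C sketch below uses hypothetical names (`VectorDesc`, `vector_disp`) and assumes the oldtype is a basic type described by its extent; indexed and hindexed would additionally need the per-block length and displacement arrays, which are omitted here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical combiner-plus-parameters description of a vector-like type. */
typedef struct {
    int count;            /* number of blocks */
    int blocklen;         /* elements per block */
    long stride;          /* distance between the starts of consecutive blocks */
    bool stride_in_bytes; /* true for hvector (bytes), false for vector (extents) */
    long old_extent;      /* extent of oldtype in bytes */
} VectorDesc;

/* Displacement of element k (0-based, in flattened order), computed on the fly. */
long vector_disp(const VectorDesc *d, long k)
{
    assert(d->blocklen > 0);
    assert(k >= 0 && k < (long)d->count * d->blocklen);
    long block = k / d->blocklen;  /* which block element k falls in */
    long offset = k % d->blocklen; /* position of element k within that block */
    long block_start = d->stride_in_bytes
        ? block * d->stride                  /* hvector: stride in bytes */
        : block * d->stride * d->old_extent; /* vector: stride in extents */
    return block_start + offset * d->old_extent;
}
```

For the filetype in the earlier example (count 4, blocklength 1, stride 4, 4-byte int), element 2 lies at byte 32; the equivalent hvector with a 16-byte stride gives the same displacements.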

The most powerful of the datatype creation functions, MPI_Type_struct,
does not offer this possibility of optimization. In fact, our type
mapping functions do a relatively good job on datatypes created with
this function because, unlike the other functions, a datatype created
with it need not be repetitive and takes multiple previously defined
datatypes as its input.
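For contrast, a sketch of flattening a struct type shows why every element has to be enumerated: each block may have its own type and an arbitrary displacement, so in general there is no single formula to exploit. `StructDesc`, `StructEntry` and `struct_typemap` are hypothetical names, and the component types are assumed to be basic types described by their sizes.

```c
#include <stddef.h>

/* Hypothetical description of MPI_Type_struct(count, blocklens, disps, types),
 * assuming every component type is basic and therefore described by its size
 * (for a basic type, size equals extent). */
typedef struct {
    int count;              /* number of blocks */
    const int *blocklens;   /* elements in each block */
    const long *disps;      /* byte displacement of each block */
    const long *elem_sizes; /* size in bytes of each block's basic type */
} StructDesc;

typedef struct {
    int type_index; /* index of the block, and hence of its type in the types list */
    long disp;      /* byte displacement of this element */
} StructEntry;

/* Flatten into a type map by enumerating every element of every block.
 * out must hold the sum of all block lengths. Returns the number of entries. */
size_t struct_typemap(const StructDesc *s, StructEntry *out)
{
    size_t n = 0;
    for (int b = 0; b < s->count; b++)
        for (int j = 0; j < s->blocklens[b]; j++) {
            out[n].type_index = b;
            out[n].disp = s->disps[b] + (long)j * s->elem_sizes[b];
            n++;
        }
    return n;
}
```

For example, one 1-byte element at displacement 0 followed by three 4-byte elements starting at displacement 8 flattens to entries at 0, 8, 12 and 16.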

David Taylor's proposal
^^^^^^^^^^^^^^^^^^^^^^^

David Taylor released a second proposal for datatype accessor functions.
His first proposal was designed to simply return the name of the
function and the corresponding parameters which had been used to create
the datatype.

He stated that one problem with that proposal was that it would be
difficult to use from Fortran. Since Fortran could not allocate memory
for arrays, the arrays passed to his functions would have to be
declared in advance, yet one could not know how large they must be.

His new proposal solves that problem by letting the user specify
how much data will be returned in each call and the starting point
within the array. This essentially allows a user with access only
to fixed length arrays to retrieve all the data in multiple steps.
In my opinion, however, this new proposal does not provide
any real advantages (or differences, for that matter) over his first
proposal for non-Fortran users. This is not to say that it isn't
important to fully support Fortran, but rather to point out that
the functionality offered by his proposal has not changed dramatically.

MPI_Type_envelope returns the count that was passed to the function
that created the datatype, and the combiner, which identifies that
function.

int MPI_Type_envelope (MPI_Datatype datatype,  /* IN */
                       int *count,             /* OUT */
                       int *combiner)          /* OUT */

MPI_Type_contents returns the parameters which were passed to
the datatype creation function when creating datatype. start
specifies at what point in the arrays the function should begin
returning data, and count specifies the maximum amount of data
that should be returned, i.e. the maximum length of the arrays.

int MPI_Type_contents (MPI_Datatype datatype,      /* IN */
                       int start,                  /* IN */
                       int count,                  /* IN */
                       int *array_blocklens,       /* OUT */
                       MPI_Aint *stride,           /* OUT: XXX */
                       MPI_Aint *array_disps,      /* OUT: XXX */
                       MPI_Datatype *array_types); /* OUT */
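The start/count scheme can be illustrated with a mock in C. `StoredContents`, `mock_type_contents` and `sum_blocklens_in_chunks` are hypothetical; the mock returns the number of entries copied, which is an assumption since the signature above does not say what MPI_Type_contents returns, and the stride and array_types arguments are left out. The second function shows how a caller restricted to fixed-length arrays, as in the Fortran case, could retrieve all the data in several steps.

```c
#include <stddef.h>

/* Hypothetical stored parameters of a datatype built by an indexed-style
 * constructor: n blocks, each with a block length and a displacement. */
typedef struct {
    int n;
    const int *blocklens;
    const long *disps;
} StoredContents;

/* Mock of the start/count retrieval idea: copy at most count entries,
 * beginning at index start, into caller-provided fixed-size arrays.
 * Returns the number of entries copied (0 once start reaches the end).
 * The return convention is an assumption, not part of the proposal. */
int mock_type_contents(const StoredContents *c, int start, int count,
                       int *blocklens_out, long *disps_out)
{
    int copied = 0;
    for (int i = start; i < c->n && copied < count; i++, copied++) {
        blocklens_out[copied] = c->blocklens[i];
        disps_out[copied] = c->disps[i];
    }
    return copied;
}

/* Retrieve all entries in chunks of at most CHUNK, as a caller limited to
 * fixed-length arrays would; returns the sum of all block lengths. */
long sum_blocklens_in_chunks(const StoredContents *c)
{
    enum { CHUNK = 2 };
    int bl[CHUNK];
    long dp[CHUNK];
    long total = 0;
    int start = 0, got;
    while ((got = mock_type_contents(c, start, CHUNK, bl, dp)) > 0) {
        for (int i = 0; i < got; i++)
            total += bl[i];
        start += got; /* the next call continues where this one stopped */
    }
    return total;
}
```

With five stored blocks and a two-element buffer, the loop makes three productive calls (2, 2 and 1 entries) and a final call that returns 0.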

Advantages
^^^^^^^^^^
1) These functions return datatype information in a form which requires
much less memory and fewer iterations.

2) The information returned by these functions could be used to build
an identical datatype from scratch.

Disadvantages
^^^^^^^^^^^^^
1) The information returned by these functions may not be useful in
its original form and will require additional processing.

2) The additional code required for calculating displacements on
the fly could be very complex.

3) For complex datatypes the processing cost of calculating data
displacements could be very high.

------------------------------------------------------------------------

Koichi Konishi konishi@research.nj.nec.com
NEC Research Institute Office: 609-951-2628
FAX: 609-951-2488