3rd party datatype instantiation (was: mpi-io, mpi-dynamic generalizations)

Richard Frost (frost@SDSC.EDU)
Fri, 3 Nov 1995 10:58:49 +0001 (PST)

Hi folks,

I'm very interested in what could be called "3rd party datatype transfers".
For example, a parallel program writes its data to some storage facility.
A year later, a new application at another site gets the data (e.g., file)
and tries to read the data. This new application has never seen the
source code of the old. It needs a mechanism for instantiating the
datatypes contained in the old data set.

O.K., so there are a lot of systems out there which attempt this statically;
i.e., they require that the new application be compiled (or linked) with
the C++ class libraries of the old program, an object-broker, etc.

From our point of view this is not acceptable. We are attempting to
generalize database technology -- inside out: we propose self-describing
data as opposed to an external data description language (e.g., SQL).
In addition, we want the tools of collective communication available to us
via MPI. Consequently, we view MPI Datatypes as a means to accomplish
this goal.
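
As a rough illustration of "self-describing data", the sketch below stores
a type description in front of the packed records, so that a reader which
has never seen the writer's source can still rebuild the layout. It is not
MPI code: the JSON header, the field codes (taken from Python's struct
module), and the function names write_self_describing and
read_self_describing are all assumptions made for this example.

```python
import json
import struct


def write_self_describing(fields, rows):
    """Return bytes: 4-byte header length, JSON type description, packed rows.

    `fields` is a list of (name, struct_code) pairs, e.g. ("value", "d").
    The header format is hypothetical; it only illustrates the idea.
    """
    header = json.dumps({"byte_order": "<", "fields": fields}).encode("ascii")
    fmt = "<" + "".join(code for _, code in fields)
    body = b"".join(struct.pack(fmt, *row) for row in rows)
    return struct.pack("<I", len(header)) + header + body


def read_self_describing(blob):
    """Rebuild the record layout from the embedded description, then decode."""
    (header_len,) = struct.unpack_from("<I", blob, 0)
    desc = json.loads(blob[4:4 + header_len].decode("ascii"))
    names = [name for name, _ in desc["fields"]]
    fmt = desc["byte_order"] + "".join(code for _, code in desc["fields"])
    payload = blob[4 + header_len:]
    return [dict(zip(names, values)) for values in struct.iter_unpack(fmt, payload)]


# The "old" application writes; the "new" one reads using only the blob.
blob = write_self_describing([("id", "i"), ("value", "d")], [(1, 2.5), (2, -1.0)])
rows = read_self_describing(blob)
print(rows)
```

The reader needs no class library or broker from the writer, only agreement
on how the description itself is encoded.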

Further, we believe that the mechanisms proposed in the MPI2
report are very close to the functionality required to achieve the
above goal.

In what follows I identify specific problems with what is presented
in the MPI-2 report of 10/17/95 and request some clarifications.

> 3.4: Decoding datatypes into character representation. There will
> also be functions to go the other way.

MPI_GET_CHAR_DATA_TYPE(dataType, charDataType) ...

The following restriction is mentioned:

> The string can, of course, be sent to another process but may not
> be completely portable when offsets or absolute addresses are
> used (as in STRUCT, HVECTOR, HINDEXED, MPI_UB, MPI_LB).

I do not understand the inclusion of STRUCT in this restriction. By
definition, STRUCT requires the specification of types. Perhaps the
intended restriction refers to the downsizing of specific types; e.g.,
from 64b to 32b floats? In that case, the above statement applies
to all types.
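
One possible reading of the restriction, offered only as a guess, is that
a STRUCT carries byte displacements as well as types, and displacements
depend on the alignment rules of the machine that built the type. The
sketch below uses Python's struct module (not MPI) to compare the
displacements of a char, an int, and a double under native alignment and
under a packed layout. The helper field_offsets is hypothetical, and the
native result depends on the platform.

```python
import struct


def field_offsets(prefix, codes):
    """Byte displacement of each field, for single-character struct codes.

    prefix "@" uses this machine's native C alignment;
    prefix "=" uses standard sizes with no padding.
    """
    offsets = []
    for k, code in enumerate(codes):
        # Size of the record up to and including field k (struct adds no
        # trailing padding), minus the size of field k itself.
        end = struct.calcsize(prefix + codes[:k + 1])
        offsets.append(end - struct.calcsize(prefix + code))
    return offsets


native = field_offsets("@", "cid")   # platform dependent
packed = field_offsets("=", "cid")   # same on every platform
print("native displacements:", native)
print("packed displacements:", packed)
```

If a character representation records the native displacements, a receiver
with different alignment rules cannot reuse them as they stand, even though
the list of types is unchanged.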

> 3.7 and 3.8: allows user to search/decode the "tree" of a derived
> datatype.

A few of these functions appear generally useful for space allocation,
and the remainder specifically useful to debuggers and profilers.

Consistency in "# of top-level entries" vs. "# of bottom-level entries"
is highly desirable for portability. Can this be achieved?
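
To make the distinction concrete, here is a small sketch that models a
derived datatype as a nested tree and counts its entries both ways. The
tuple representation and the two function names are assumptions for this
example; they are not the report's definitions, which are not quoted here.

```python
def top_level_entries(t):
    """Number of entries visible at the outermost constructor."""
    if isinstance(t, str):           # a basic type such as "MPI_INT"
        return 1
    if t[0] == "struct":             # ("struct", [child, child, ...])
        return len(t[1])
    if t[0] == "contiguous":         # ("contiguous", count, child)
        return 1
    raise ValueError("unknown constructor: %r" % (t[0],))


def bottom_level_entries(t):
    """Number of basic elements once the whole tree is flattened."""
    if isinstance(t, str):
        return 1
    if t[0] == "struct":
        return sum(bottom_level_entries(child) for child in t[1])
    if t[0] == "contiguous":
        return t[1] * bottom_level_entries(t[2])
    raise ValueError("unknown constructor: %r" % (t[0],))


# Two descriptions of the same layout: one int followed by three doubles.
nested = ("struct", ["MPI_INT", ("contiguous", 3, "MPI_DOUBLE")])
flat = ("struct", ["MPI_INT", "MPI_DOUBLE", "MPI_DOUBLE", "MPI_DOUBLE"])

print(top_level_entries(nested), bottom_level_entries(nested))
print(top_level_entries(flat), bottom_level_entries(flat))
```

In this model the bottom-level count is the same for both descriptions,
while the top-level count depends on how the type happened to be built.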

> 4.9: allows user to publish a datatype so that a put/get can occur
> faster.

This "placeholder" is restricted to running processes. A suitable
implementation of 3.4 might allow a 3rd party to "store" a data type
(as metadata) with data, then re-parse it and broadcast to the
new client for instantiation. The remark about "we could make
these datatypes global" completely ignores the scenario of
heterogeneous client-server transactions via inter-communicators.

-----

The MPI-IO group is proposing (building) a "hints" data structure
associated with I/O transactions. A similar structure would be
useful in filling out the data description of an MPI Datatype.
For example,

    MPI_DATATYPE_ATTRIBUTE( datatype, attribute )

    IN (and OUT)  datatype    MPI Data Type
    OUT           attribute   character string

Such a routine would append the string to a (linked) list
of attribute strings associated with the datatype. The
purpose is to embed content specifications such as image
format headers, mime types, GIS keys, etc. These attributes
would not be discarded in MPI_GET_CHAR_TYPE transactions.
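
A minimal sketch of the idea, outside MPI: a type description carries an
ordered list of attribute strings, and the attributes travel with it
through a character encoding and back. The class DescribedType, the
functions datatype_attribute, to_char and from_char, and the use of JSON
as the character form are all hypothetical stand-ins, not proposed
bindings.

```python
import json


class DescribedType:
    """A stand-in for a datatype handle: a layout plus attribute strings."""

    def __init__(self, layout):
        self.layout = layout      # e.g. a list of (name, type name) pairs
        self.attributes = []      # kept in the order they were appended


def datatype_attribute(dtype, attribute):
    """Append one attribute string to the datatype's attribute list."""
    dtype.attributes.append(attribute)


def to_char(dtype):
    """Encode layout and attributes together as one character string."""
    return json.dumps({"layout": dtype.layout, "attributes": dtype.attributes})


def from_char(text):
    """Rebuild a DescribedType, attributes included, from its string form."""
    desc = json.loads(text)
    dtype = DescribedType([tuple(field) for field in desc["layout"]])
    for attribute in desc["attributes"]:
        datatype_attribute(dtype, attribute)
    return dtype


pixel = DescribedType([("r", "MPI_BYTE"), ("g", "MPI_BYTE"), ("b", "MPI_BYTE")])
datatype_attribute(pixel, "mime-type: image/x-portable-pixmap")
datatype_attribute(pixel, "colorspace: RGB")

restored = from_char(to_char(pixel))
print(restored.attributes)
```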

Is this interesting to anyone else? :-)

Richard Frost
SDSC