Re: 3rd party datatype instantiation

Richard Frost (frost@SDSC.EDU)
Fri, 10 Nov 1995 09:49:03 +0001 (PST)

Here I present arguments for why an MPI_CREATE_CHAR_DATATYPE function
would be of great utility. Why leave it to multiple users to reinvent
it when MPI can define it once? In particular, why force others to
study the structure of the flat character representation?

> Jim Cownie writes:
> The thing which *is* hard is to handle receiving and understanding
> data whose data types you didn't know at compile time. But this is
> clearly hard anyway, aside from MPI. (How do you know what to do with
> the data if you don't even know what format it is in ?).

Receiving it is easy. Understanding it is hard, UNLESS you are
interested in 3rd party transfers -- especially those involving
parallel broadcast and reduction. The principal parties will
understand the data, and make requests based on that understanding.

> Such applications seem to me to be few, and (I certainly hope) will be
> written by competent people, to whom the MPI issues will be but one
> very small part of the problem.

I disagree ... such applications are many and growing at a tremendous
rate. They are:
   Parallel DBMS
   Parallel Archival Storage
   Heterogeneous Parallel Computation
The 1st two technologies are essential to managing the exponential growth
of *application* generated data.

Scenario:
1. User application on N nodes of a parallel platform scans
   parallel DBMS (running on M nodes) for data of interest,
   stored in data formats the application understands.
2. The DBMS looks for matches in its metadata index, and a list of
   entries matching the query is returned to the application.
3. The application selects one from the list, but wants only
   certain fields.
4. The application requests a cross section of the stored data
   from the DBMS.
5. The DBMS has stored the CHAR_DATATYPE as metadata. It
   constructs the datatype and signals the application
   to initiate an asynchronous receive for the data.
6. The DBMS retrieves the data cross section in parallel (MPI-IO)
   from the parallel archival storage system (running
   on L nodes).
7. The DBMS sends or otherwise broadcasts (MPI) the data in
   parallel to the application.
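
To make steps 3 through 5 concrete, here is a rough sketch in C of how a
third party might turn stored character metadata into a type description
and then cut a cross section out of it. Everything in it is an assumption
for illustration: the "blocklen:type@disp;..." encoding, the names
parse_char_datatype, select_fields, type_desc and basic_type, and the
16-field limit are all hypothetical, and the encoding is NOT the flat
character representation discussed in this thread. The three arrays it
fills are meant to correspond to the blocklength, displacement and type
arguments of MPI_Type_struct; the MPI call itself is left out so the
sketch compiles without an MPI library.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_FIELDS 16   /* hypothetical limit, chosen for the sketch */

/* Hypothetical type codes; real code would map these to MPI_CHAR,
   MPI_INT, MPI_DOUBLE and so on. */
typedef enum { T_CHAR, T_INT, T_DOUBLE, T_UNKNOWN } basic_type;

/* One entry per field, laid out like the MPI_Type_struct arguments. */
typedef struct {
    int        count;                  /* number of fields */
    int        blocklens[MAX_FIELDS];  /* elements per field */
    long       displs[MAX_FIELDS];     /* byte displacement of each field */
    basic_type types[MAX_FIELDS];      /* basic type of each field */
} type_desc;

basic_type type_from_name(const char *name)
{
    if (strcmp(name, "char") == 0)   return T_CHAR;
    if (strcmp(name, "int") == 0)    return T_INT;
    if (strcmp(name, "double") == 0) return T_DOUBLE;
    return T_UNKNOWN;
}

/* Parse the assumed encoding, e.g. "3:double@0;1:int@24;16:char@28".
   Returns 0 on success and -1 on malformed or empty input. */
int parse_char_datatype(const char *text, type_desc *desc)
{
    assert(text != NULL && desc != NULL);
    desc->count = 0;
    while (*text != '\0') {
        int blocklen = 0, consumed = 0;
        long disp = 0;
        char name[16];
        basic_type t;

        if (desc->count == MAX_FIELDS)
            return -1;
        /* %n records how many characters this field occupied. */
        if (sscanf(text, "%d:%15[a-z]@%ld%n",
                   &blocklen, name, &disp, &consumed) != 3)
            return -1;
        t = type_from_name(name);
        if (blocklen <= 0 || disp < 0 || t == T_UNKNOWN)
            return -1;

        desc->blocklens[desc->count] = blocklen;
        desc->displs[desc->count]    = disp;
        desc->types[desc->count]     = t;
        desc->count++;

        text += consumed;
        if (*text == ';')
            text++;                    /* step over the field separator */
        else if (*text != '\0')
            return -1;                 /* trailing junk after a field */
    }
    return desc->count > 0 ? 0 : -1;
}

/* Build the description of a cross section: only the fields listed in
   idx, keeping their original displacements. Returns 0 or -1. */
int select_fields(const type_desc *full, const int *idx, int n,
                  type_desc *out)
{
    int i;

    if (n < 0 || n > MAX_FIELDS)
        return -1;
    out->count = 0;
    for (i = 0; i < n; i++) {
        if (idx[i] < 0 || idx[i] >= full->count)
            return -1;
        out->blocklens[i] = full->blocklens[idx[i]];
        out->displs[i]    = full->displs[idx[i]];
        out->types[i]     = full->types[idx[i]];
        out->count++;
    }
    return 0;
}
```

The point of the sketch is that the DBMS never needs to understand what
the fields mean. It only needs the layout, which is enough to pick out
fields 0 and 2 of a record and describe exactly those bytes to the
transport. A packed buffer gives it nothing to select from.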

Giving the DBMS and Archival Storage system access to structural
internals of the data will enable optimized storage and retrieval --
especially when a single composite type is larger (longer) than
several disks or tapes.

This is why packed objects are unsuitable: a packed buffer hides its
internal structure from the DBMS and the storage system. The DBMS term
for packed objects is "large" or "opaque" objects.

Another Scenario: Visualization of a running parallel computation.
1. The visualization program contacts the application, requesting
   a list of (MPI) datatypes and associated metadata.
2. The vis' program discovers elements of certain datatypes which it
   has operators to display.
3. Transactions similar to the above scenario begin ...

Richard Frost
SDSC