> Jim Cownie writes:
> The thing which *is* hard is to handle receiving and understanding
> data whose data types you didn't know at compile time. But this is
> clearly hard anyway, aside from MPI. (How do you know what to do with
> the data if you don't even know what format it is in ?).
Receiving it is easy. Understanding it is hard, UNLESS you are
interested in third-party transfers -- especially those involving
parallel broadcast and reduction. The principal parties will
understand the data, and make requests based on that understanding.
> Such applications seem to me to be few, and (I certainly hope) will be
> written by competent people, to whom the MPI issues will be but one
> very small part of the problem.
I disagree ... such applications are many and growing at a tremendous
rate. They are:
Parallel DBMS
Parallel Archival Storage
Heterogeneous Parallel Computation
The first two technologies are essential to managing the exponential
growth of *application*-generated data.
Scenario:
1. User application on N nodes of a parallel platform scans
parallel DBMS (running on M nodes) for data of interest,
stored in data formats the application understands.
2. The DBMS looks for matches in its metadata index and a list of
entries matching the query is returned to the application.
3. The application selects one from the list, but wants only
certain fields.
4. The application requests a cross section of the stored data
from the DBMS.
5. The DBMS has stored the CHAR_DATATYPE as metadata. It
constructs the datatype and signals the application
to initiate an asynchronous receive for the data.
6. The DBMS retrieves the data cross section in parallel (MPI-IO)
from the parallel archival storage system (running
on L nodes).
7. The DBMS sends or otherwise broadcasts (MPI) the data in
parallel to the application.
Giving the DBMS and Archival Storage system access to structural
internals of the data will enable optimized storage and retrieval --
especially when a single composite type is larger (longer) than
several disks or tapes.
This is why packed objects are unsuitable. The DBMS term for packed
objects is "large" or "opaque" objects.
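Steps 3 to 5 of the scenario, and the contrast with opaque objects, can
be sketched roughly as follows. This is only an illustration in plain
Python using the standard `struct` module, not MPI: the metadata record,
the field names, and the helpers `build_layout` and `select_fields` are
all hypothetical. The point is that a store holding the layout as
metadata can rebuild the type and hand back just the requested fields,
which it could not do with a packed blob.

```python
import struct

# Hypothetical metadata the DBMS might keep for one stored record type.
# Each entry is (field name, struct format code); the layout is made up.
RECORD_METADATA = [
    ("step", "i"),         # 32-bit integer
    ("time", "d"),         # 64-bit float
    ("temperature", "d"),  # 64-bit float
    ("pressure", "d"),     # 64-bit float
]


def build_layout(metadata):
    """Rebuild the record layout (and field names) from stored metadata."""
    fmt = "<" + "".join(code for _, code in metadata)  # "<": no padding
    return struct.Struct(fmt), [name for name, _ in metadata]


def select_fields(raw, metadata, wanted):
    """Decode each record in `raw` and keep only the `wanted` fields."""
    layout, names = build_layout(metadata)
    index = [names.index(name) for name in wanted]
    return [tuple(rec[i] for i in index) for rec in layout.iter_unpack(raw)]


# Two stored records, as the archive might return them.
layout, _ = build_layout(RECORD_METADATA)
stored = layout.pack(1, 0.5, 300.0, 101.3) + layout.pack(2, 1.0, 301.5, 101.1)

# The application asks for a cross section: only two of the four fields.
cross_section = select_fields(stored, RECORD_METADATA, ["step", "temperature"])
print(cross_section)
```

With an opaque object the store would see only `stored` as a run of
bytes and would have to ship all of it.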
Another Scenario: Visualization of a running parallel computation.
1. The visualization program contacts the application, requesting
a list of (MPI) datatypes and associated metadata.
2. The visualization program discovers elements of certain datatypes
which it has operators to display.
3. Transactions similar to the above scenario begin ...
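The discovery in steps 1 and 2 might look something like the sketch
below. It is plain Python with no MPI calls, and every name in it (the
published catalogue, the datatype names, the display operators) is a
hypothetical stand-in chosen for illustration.

```python
# Hypothetical catalogue the running application publishes on request:
# datatype name -> associated metadata.
published_types = {
    "particle": {"fields": ["x", "y", "z"], "count": 1000},
    "grid2d": {"fields": ["value"], "shape": (64, 64)},
    "checkpoint_blob": {"fields": [], "opaque": True},
}


def show_points(meta):
    """Stand-in display operator for particle data."""
    return f"scatter plot of {meta['count']} points"


def show_grid(meta):
    """Stand-in display operator for 2-D grid data."""
    rows, cols = meta["shape"]
    return f"{rows}x{cols} image"


# Operators the visualization program happens to have.
DISPLAY_OPERATORS = {"particle": show_points, "grid2d": show_grid}


def displayable(catalogue, operators):
    """Return the published datatypes that have a display operator."""
    return sorted(name for name in catalogue if name in operators)


matches = displayable(published_types, DISPLAY_OPERATORS)
descriptions = [DISPLAY_OPERATORS[name](published_types[name]) for name in matches]
print(matches)
print(descriptions)
```

The opaque entry is skipped: without structural metadata the
visualization program has nothing to match an operator against.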
Richard Frost
SDSC