How many parallel DBMS systems do you believe will be written ?
ten, a hundred, a thousand, ten thousand ?
Looking at the sequential DBMS market suggests tens rather than
hundreds or thousands.
How many parallel archival storage systems ?
Same question, similar answer.
This is *not* to say that these are not important things to do, or
that they won't be used everywhere. All I'm saying is that I don't
think there will be that many different implementations of such
systems, and therefore not many people will see the problem we're
discussing.
> Heterogeneous Parallel Computation
I just don't believe that this is an issue at all for normal
heterogeneous computing. MPI already lets you send structures
between heterogeneous machines easily. As I tried to say before, most
applications don't handle data whose layout they don't understand.
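
To make the heterogeneous case concrete, here is a small sketch of the
representation conversion involved. It makes no MPI calls; it only models,
with Python's standard struct module, what an MPI implementation does for
you once the structure has been described to it with a datatype. The
record fields, the two "hosts" and the choice of wire format are
assumptions made up for the illustration.

```python
import struct

# Type signature of the record: one 32-bit int, one 64-bit float, 8 chars.
# (Hypothetical record; the fields are chosen only for illustration.)
SIGNATURE = "id8s"

# Each "machine" has its own native byte order; the wire format is fixed.
LITTLE_ENDIAN_HOST = "<" + SIGNATURE
BIG_ENDIAN_HOST = ">" + SIGNATURE
WIRE = "!" + SIGNATURE  # network byte order, standard sizes, no padding


def send(native_bytes, host_format):
    """Convert a record from the sender's native representation to the wire format."""
    values = struct.unpack(host_format, native_bytes)
    return struct.pack(WIRE, *values)


def receive(wire_bytes, host_format):
    """Convert a record from the wire format to the receiver's native representation."""
    values = struct.unpack(WIRE, wire_bytes)
    return struct.pack(host_format, *values)


record = (42, 2.5, b"particle")
on_sender = struct.pack(LITTLE_ENDIAN_HOST, *record)
on_receiver = receive(send(on_sender, LITTLE_ENDIAN_HOST), BIG_ENDIAN_HOST)
```

The bytes held on the two hosts differ, but both sides see the same
values, because both described the same type signature. Neither side had
to be told anything about the other's representation.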
I still think I'm missing something in this scenario of yours :-
> Scenario:
> 1. User application on N nodes of a parallel platform scans
> parallel DBMS (running on M nodes) for data of interest,
> stored in data formats the application understands.
Fine.
> 2. The DBMS looks for matches in its metadata index and a list of
> entries matching the query is returned to the application.
Fine.
> 3. The application selects one from the list, but wants only
> certain fields.
Fine.
> 4. The application requests a cross section of the stored data
> from the DBMS.
Fine.
> 5. The DBMS has stored the CHAR_DATATYPE as metadata. It
> constructs the datatype and signals the application
> to initiate an asynchronous receive for the data.
What is this CHAR_DATATYPE, and who is using it ?
(The DBMS server, or the user ?)
It seems to me that neither of them needs it.
The application certainly knows the format of the data it wants, so it
doesn't need to be told it. The application has to tell the DBMS
server about the format of the data (otherwise how can the server
extract the right pieces ?)
Therefore both ends of the transfer know the type signature of the
data. Each needs to add a layout. The information in the layout can't
be stored anywhere in advance. The DBMS needs to know where in its
store the data it read from the disk sits, and how that relates to the
slice the user requested. (Given that the user asked only for a slice
of the data, this information can only be calculated once we have the
user request.)
The user app needs to know where in store it's going to put the
result (but that's obvious).
So, where's the problem ? The DBMS server has to do some grubby
things, but it already had to. It needs to know how the user data is
laid out in its store, but it had to know that aside from any MPI
issues.
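
Here is a rough sketch of what I mean by "same type signature, different
layouts". Again it makes no MPI calls; gather() and scatter() are
hypothetical helpers standing in for what MPI does when each side hands
it a datatype (the server's strided layout is the sort of thing
MPI_Type_vector describes, the client's is just a contiguous count of
doubles). The 4x4 table and the requested column are invented for the
example.

```python
import struct

# Both sides agree on the type signature of the slice: four doubles.
SIGNATURE = "4d"
ITEM = struct.calcsize("d")


def gather(store, offsets):
    """Pack the doubles found at the given byte offsets into one contiguous message."""
    return b"".join(bytes(store[o:o + ITEM]) for o in offsets)


def scatter(message, store, offsets):
    """Unpack a contiguous message of doubles into `store` at the given byte offsets."""
    for i, o in enumerate(offsets):
        store[o:o + ITEM] = message[i * ITEM:(i + 1) * ITEM]


# Server side: a 4x4 row-major table of doubles, as read from disk.
rows, cols = 4, 4
server_store = bytearray(struct.pack("16d", *range(16)))

# The user asks for column 1. Only now can the server work out its layout:
# one double every `cols` items, starting at item 1.
requested_column = 1
server_layout = [(r * cols + requested_column) * ITEM for r in range(rows)]

# Client side: the result goes into a contiguous buffer the client chose.
client_store = bytearray(rows * ITEM)
client_layout = [i * ITEM for i in range(rows)]

message = gather(server_store, server_layout)
scatter(message, client_store, client_layout)
column = struct.unpack(SIGNATURE, client_store)
```

The server's layout depends on the request and on how it happens to hold
the table, and the client's depends only on where it wants the answer.
Neither layout is sent to the other end or kept as metadata; only the
signature has to match.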
> 6. The DBMS retrieves the data cross section in parallel (MPI-IO)
> from the parallel archival storage system (running
> on L nodes).
> 7. The DBMS sends or otherwise broadcasts (MPI) the data in
> parallel to the application.
Your second example :-
> Another Scenario: Visualization of a running parallel computation.
> 1. The visualization program contacts the application, requesting
> a list of (MPI) datatypes and associated metadata.
> 2. The vis' program discovers elements of certain datatypes which it
> has operators to display.
> 3. Transactions similar to the above scenario begin ...
Here again I don't see a problem with what MPI already provides.
If the vis program is going to display the data it had better know how
it is laid out in its store. If it knows, it can tell MPI. If it
doesn't, it can't do anything.
Having MPI receive it and lay it out "in an MPI natural way" doesn't
help much at all.
Please educate me. I still feel I'm missing something somewhere.
-- Jim
James Cownie
BBN UK Ltd
Phone : +44 117 9071438
E-Mail: jcownie@bbn.com