I'm working on a project for ARPA and IBM involving the integration
of applications, web-browsers, databases (DB2 & Illustra), and
archival storage (Unitree & HPSS).
MPI will play a key role, and I hope to leverage MPI-IO as well.
In particular, we are driven by the use of MPI derived data types.
The following two components are needed to complete a prototype system
using MPI. I'm interested in learning about similar efforts before
defining an API.
1. An on-the-wire protocol for transmitting derived data types,
and a mechanism for run-time parsing of these types on the
receiving end.
For example, consider a parallel DBMS responding to a
request from a parallel computation. We would like the DBMS
to store the derived data type as metadata for reuse at a
later run time instead of linking the DBMS to an intractable
number of hardcoded data types. (Yes, this is an MPI/SQL
interface.)
In particular, DB2 would need an MPI-IO-like module (MDIO)
for interfacing with computational applications and
browsers; and another module for interfacing with archival
storage. SQL transactions would only occur between the
DBMS modules and the DBMS.
2. A generalization of files; i.e., we need to overload
MPI-IO's concept of files so that we can perform I/O
transfers between agents (databases, web-browsers, archival
storage systems) along with actual file systems.
For example, a file might sometimes be designated by a Unix
file descriptor, and at other times by an external communicator
containing an arbitrary number of processes. We need to
facilitate M-to-N I/O transactions between arbitrary
information sources and consumers in a way that gives
applications a simplified view of open, read, write, close.
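
To make component 1 concrete, here is a minimal sketch in Python of
what an on-the-wire encoding and run-time parse of a derived data type
might look like. Everything in it is an assumption made for
illustration: the type codes, the wire layout (a count followed by
code/displacement pairs in network byte order), and the function names
are hypothetical and not taken from MPI, MPI-IO, or DB2. The type is
reduced to a flat type map of (basic type, byte displacement) pairs,
and the data buffer is assumed to be little-endian.

```python
import struct

# Hypothetical codes for basic types (not from any MPI implementation),
# mapped to struct format characters: int32, float64, single char.
BASIC = {1: "i", 2: "d", 3: "c"}

def encode_typemap(typemap):
    """Serialize [(code, displacement), ...] in network byte order."""
    out = struct.pack("!I", len(typemap))          # entry count
    for code, disp in typemap:
        out += struct.pack("!BQ", code, disp)      # 1-byte code, 8-byte offset
    return out

def decode_typemap(wire):
    """Parse the wire form back into a type map on the receiving end."""
    (count,) = struct.unpack_from("!I", wire, 0)
    entries, offset = [], 4
    for _ in range(count):
        code, disp = struct.unpack_from("!BQ", wire, offset)
        if code not in BASIC:
            raise ValueError("unknown basic type code: %d" % code)
        entries.append((code, disp))
        offset += 9                                # 1 + 8 bytes per entry
    return entries

def extract(typemap, buf):
    """Pull typed values out of a raw buffer using a decoded type map.

    Assumes the data itself is little-endian.
    """
    return [struct.unpack_from("<" + BASIC[code], buf, disp)[0]
            for code, disp in typemap]
```

A receiver such as the DBMS module could store the bytes returned by
encode_typemap() as metadata and later call decode_typemap() and
extract() on incoming buffers, without being linked against a
hardcoded description of the type. For example, the type map
[(1, 0), (2, 8)] describes an int at offset 0 and a double at offset 8.
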
See the "Example system architecture" on
http://www.sdsc.edu/EnablingTech/MassDataAnal/MassDataAnal.html for a
diagram.
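
Component 2 (generalized files) can be sketched in the same spirit.
The Python below is only an illustration under stated assumptions: the
class names Endpoint, FdEndpoint, and AgentEndpoint and the transfer()
helper are hypothetical, and the "agent" is an in-memory buffer
standing in for an external communicator. The point is that the
application sees the same read, write, and close calls regardless of
what sits behind the designator.

```python
import os
import tempfile
from abc import ABC, abstractmethod

class Endpoint(ABC):
    """Hypothetical common interface for anything that can act as a 'file'."""
    @abstractmethod
    def read(self, n): ...
    @abstractmethod
    def write(self, data): ...
    @abstractmethod
    def close(self): ...

class FdEndpoint(Endpoint):
    """A 'file' designated by a Unix file descriptor."""
    def __init__(self, path, flags, mode=0o600):
        self.fd = os.open(path, flags, mode)
    def read(self, n):
        return os.read(self.fd, n)
    def write(self, data):
        return os.write(self.fd, data)
    def close(self):
        os.close(self.fd)

class AgentEndpoint(Endpoint):
    """Stand-in for an agent (DBMS, browser, archive).

    A real version would wrap a communicator; this one only buffers bytes.
    """
    def __init__(self):
        self.buf = bytearray()
    def read(self, n):
        data = bytes(self.buf[:n])
        del self.buf[:n]
        return data
    def write(self, data):
        self.buf += data
        return len(data)
    def close(self):
        pass

def transfer(src, dst, chunk=4096):
    """Copy from any source endpoint to any sink endpoint; returns byte count."""
    total = 0
    while True:
        data = src.read(chunk)
        if not data:
            break
        total += dst.write(data)
    return total

def demo():
    """Move the contents of a real file into an agent endpoint."""
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "in.dat")
        with open(path, "wb") as f:
            f.write(b"hello from a file")
        src = FdEndpoint(path, os.O_RDONLY)
        dst = AgentEndpoint()
        transfer(src, dst, chunk=5)
        src.close()
        return bytes(dst.buf)
```

The sketch is strictly one-to-one; the M-to-N case would need the
endpoints to describe how data is partitioned across processes, which
is left open here.
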
Thanks,
Richard Frost
SDSC