Leslie Hart wrote:
>
> This proposal is prompted by Tony Skjellum's proposal titled: "Infinite
> asymptotic bandwidth for thread communication"
>
Tony's original proposal mentioned an RK-like package that we (well, I)
am working on at NAS called MPK. It's now called CDS1 (Cooperative
Data Sharing, Level 1), and there is info at
http://www.nas.nasa.gov/NAS/Tools/Projects/CDS
(Please don't be thrown by the cutesy graphics.)

CDS1 contains a direct superset of the functionality you describe in
the NOAA system, but is much more general. Briefly, each process has
a "comm heap", out of which regions are allocated (just like
MPI_RMA_MALLOC and your MAKE_SHARED_BUFFER), and a set of "comm cells",
which are used for actually passing these regions between processes.
The comm heap is logically local to each process, the comm cells are
logically global. Each comm cell is capable of holding many regions (in
reality, pointers to regions) in a queue structure. Communication is
performed by putting a region from the comm heap into a comm cell, and
allowing another process to retrieve it from there. There are
actually five basic comm cell operations:
    write:  erase contents of comm cell, replace with new region
    enq:    add new region to end of regions already in cell
    read:   get a copy of first region from comm cell
    deq:    remove first region from comm cell
    zap:    erase contents of comm cell
(In CDS1, message passing primitives "send" and "recv" are built on
top of these as well, effectively using the comm cells as tags.)
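
To make the five operations concrete, here is a toy single-process model
in Python. It is not CDS1 code: the class name, the function names, the
one-cell-per-tag dictionary, and the choice to return None from an empty
cell are all assumptions made for illustration. Real comm cells are
shared between processes, and "read" here hands back the same object
rather than a physical copy, since cells hold pointers to regions.

```python
from collections import defaultdict, deque


class CommCell:
    """Toy model of one comm cell: a FIFO queue of region references."""

    def __init__(self):
        self._regions = deque()

    def write(self, region):
        """Erase the cell's contents and replace them with one new region."""
        self._regions.clear()
        self._regions.append(region)

    def enq(self, region):
        """Add a region after the regions already in the cell."""
        self._regions.append(region)

    def read(self):
        """Return the first region without removing it (None if empty: assumption)."""
        return self._regions[0] if self._regions else None

    def deq(self):
        """Remove and return the first region (None if empty: assumption)."""
        return self._regions.popleft() if self._regions else None

    def zap(self):
        """Erase the cell's contents."""
        self._regions.clear()


# Hypothetical message-passing layer: one cell per tag, created on first use.
cells = defaultdict(CommCell)


def send(tag, region):
    """Send by enqueueing the region in the cell that the tag names."""
    cells[tag].enq(region)


def recv(tag):
    """Receive by dequeueing the oldest region from that cell."""
    return cells[tag].deq()


cell = CommCell()
cell.enq(b"first")
cell.enq(b"second")
cell.read()          # b"first", and the cell still holds both regions
cell.write(b"only")  # the cell now holds just b"only"
```

The only difference between write and enq is whether the existing queue
is discarded first, and likewise read and deq differ only in whether the
first region stays in the cell.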
In CDS1, multiple processes on the same processor automatically share
read access to a *single* region. This is enabled by implementing
what might be called "virtual shared regions". In most cases, a process
is obligated to call a special CDS1 routine before modifying a region,
and CDS1 will make a copy of the region if it is
currently being read by another process. (This is, in fact, the only
case in which CDS1 makes physical copies on the same processor.) In
other words, passing a region (i.e. a pointer) through a cell to another
process (or the same process) on the same processor does not create a
physical copy of the region. A reference count is incremented internally
and both simply point to the same region until one of the processes
wants to write to it.
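
The sharing scheme is essentially copy-on-write with a reference count.
A minimal Python sketch of the idea follows. The names Region, share,
and prepare_write are hypothetical (in particular, prepare_write merely
stands in for the special CDS1 routine, whose real name and signature
are not given here), and the sketch ignores locking between processes.

```python
class Region:
    """Toy region: a byte buffer plus a count of holders that refer to it."""

    def __init__(self, data):
        self.data = bytearray(data)  # bytearray() copies the bytes it is given
        self.refcount = 1


def share(region):
    """Pass a region to another holder on the same processor: no copy is made."""
    region.refcount += 1
    return region


def prepare_write(region):
    """Hypothetical stand-in for the routine called before modifying a region.

    Returns a region the caller may modify: the same one if the caller is
    its only holder, otherwise a private copy.
    """
    if region.refcount == 1:
        return region
    region.refcount -= 1        # the caller gives up its hold on the shared region
    return Region(region.data)  # new buffer with refcount 1


a = Region(b"abc")
b = share(a)           # a and b are the same object; refcount is 2
b = prepare_write(b)   # another holder exists, so b becomes a private copy
b.data[0] = ord("X")   # a.data is unchanged
```

A holder that is alone on a region gets the same region back from
prepare_write, so the copy happens only when somebody else could still
be reading the old contents.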
CDS1 also has lots of other stuff -- dynamic process creation, handlers,
data translation -- but it lacks a lot that MPI does have, e.g.
communicators and most collective operations. That functionality is
being built on top of CDS1 (as was originally envisioned) as "CDS2".
This keeps CDS1 small and simple -- about 29 routines total.
(CDS1 also doesn't have much use for blocking or synchronous
communication, so those are omitted.)

Now, having described this, I'm not sure how it would fit into MPI. I
would like vendors to support this functionality, but it works in CDS1
partially because it was designed in from the ground up. I get the
sense that adding this functionality to MPI might really mean adding
almost all of CDS1 to MPI, and MPI is already pretty big. To fit into
MPI's style, it will at least need some added typing/automatic data
translation arguments. (Translation is a separate step in CDS1.) CDS1
also may not meet some of the requirements that I've been hearing.

-Dave
-- 
===============================================================================
David C. DiNucci     | MRJ, Inc., Rsrch Scntst   | USMail: NASA Ames Rsrch Ctr
dinucci@nas.nasa.gov | NAS (Num. Aerospace Sim.) |   M/S T27A-2
(415)604-4430        | Parallel Tools Group      |   Moffett Field, CA 94035