RE: we should wait for 1sided implementations

Shane Hebert (shane@ERC.MsState.Edu)
Wed, 22 May 1996 10:39:30 -0500

But let's hear from the implementors! Who is willing to commit to a prototype
in the next 6 months? Anyone?
==============================

I have been thinking about how to implement this on Windows NT.
I cannot commit to a prototype, but I believe I can see how it can
be done. I agree that it would be an expensive procedure requiring
a collective operation. On a machine using only a shared memory
device it is not too bad, but on a machine using both shared memory
and TCP it gets trickier.

I am sure there are others out there who have thought at least as
much as I have about what an implementation requires, but here are my
ideas for the shared memory version and for the problem (with a
solution) that TCP raises.

[stream-of-consciousness alert:]

Shared memory for Windows NT:
Basically, when RMA_Malloc is called, map a region of shared
memory and pass the OS_handle for it around to all the other processes.
Each process receives this message and maps the region using that
handle. The potentially time-consuming part comes when a put or a get
on that shared memory is called: the OS_handle has to be looked up and
associated with the memory address before the put/get can proceed as
a simple memcpy().

Perhaps an MPID_SMHANDLE can be defined, with RMA_Malloc returning one
to the calling process and all communication then proceeding through
that data structure. That should speed things up somewhat, since no
lookup is needed, but it definitely calls for RMA_Malloc to be a
collective operation that returns an MPID_SMHANDLE to each calling
process. One process actually does the allocation and the rest simply
map it, of course. From then on, it is easy to read and write that
section of memory (even in a mutually exclusive way, if you include a
mutex in the MPID_SMHANDLE). Having RMA_Malloc be a collective
operation is expensive, of course.
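
To make the handle idea concrete, here is a rough C sketch of what an
MPID_SMHANDLE and the memcpy()-style put/get might look like. The field
names and the sm_put/sm_get helpers are my own illustrative assumptions,
not part of any proposal. On NT the base pointer would presumably come
from something like CreateFileMapping/MapViewOfFile, and a real handle
would also carry the mutex; both are left out so the sketch stays
portable.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical handle layout; all field names are assumptions. */
typedef struct {
    void  *base;        /* local address where the shared region is mapped */
    size_t length;      /* size of the region in bytes */
    int    owner_rank;  /* rank of the process that did the allocation */
} MPID_SMHANDLE;

/* Copy len bytes from src into the region at offset.
   Returns 0 on success, -1 if the range falls outside the region. */
static int sm_put(MPID_SMHANDLE *h, size_t offset, const void *src, size_t len)
{
    assert(h != NULL && h->base != NULL);
    if (offset > h->length || len > h->length - offset)
        return -1;
    memcpy((char *)h->base + offset, src, len);
    return 0;
}

/* Copy len bytes out of the region at offset into dst.
   Returns 0 on success, -1 if the range falls outside the region. */
static int sm_get(const MPID_SMHANDLE *h, size_t offset, void *dst, size_t len)
{
    assert(h != NULL && h->base != NULL);
    if (offset > h->length || len > h->length - offset)
        return -1;
    memcpy(dst, (const char *)h->base + offset, len);
    return 0;
}
```

Because the handle already holds the mapped address, neither call needs
a lookup; the only per-call cost beyond the copy is the bounds check.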

Using a collective RMA_Malloc that every process must participate in
would simplify the TCP case greatly. Without it, there must be a
way to generate unique identifiers (numbers, strings, whatever) across
the platforms. Actually, thinking about it now, I think the collective
type of RMA_Malloc is the *only* way to do these operations. The
MPID_SMHANDLE can also record which machine the RMA_memory actually
lives on. For machines without shared memory, a new packet type should
be defined that contains the location and length of the data to
read/write. If the data must be split across multiple packets, each
packet should contain at least the base address and an offset to
read/write from/to. The packet-receiving code must become smarter, of
course, to perform these put/get functions behind the scenes.
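
As a rough illustration of the packet idea, the C sketch below shows one
possible header and the two sides of a fragmented put. Everything here
(rma_packet, its fields, RMA_MAX_PAYLOAD, rma_make_put, rma_apply_put) is
a made-up name for the sake of the example, and the payload size is kept
tiny so the fragmentation is visible. It ignores byte order and the
actual TCP send/receive.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

enum { RMA_OP_PUT = 1, RMA_OP_GET = 2 };
#define RMA_MAX_PAYLOAD 8   /* deliberately tiny, for illustration only */

/* Hypothetical packet layout; all names are assumptions. */
typedef struct {
    int    op;          /* RMA_OP_PUT or RMA_OP_GET */
    int    handle_id;   /* region id agreed on in the collective RMA_Malloc */
    size_t base;        /* start of the whole transfer within the region */
    size_t offset;      /* position of this fragment relative to base */
    size_t length;      /* payload bytes carried by this packet */
    unsigned char payload[RMA_MAX_PAYLOAD];
} rma_packet;

/* Sender side: fill in fragment number idx of a put of len bytes at base.
   Returns the payload length, or 0 once idx is past the end of the data. */
static size_t rma_make_put(rma_packet *p, int handle_id, size_t base,
                           const void *src, size_t len, size_t idx)
{
    size_t off = idx * RMA_MAX_PAYLOAD;
    size_t n;
    assert(p != NULL && src != NULL);
    if (off >= len)
        return 0;
    n = (len - off < RMA_MAX_PAYLOAD) ? len - off : RMA_MAX_PAYLOAD;
    p->op = RMA_OP_PUT;
    p->handle_id = handle_id;
    p->base = base;
    p->offset = off;
    p->length = n;
    memcpy(p->payload, (const unsigned char *)src + off, n);
    return n;
}

/* Receiver side: apply one put packet to the local region.
   Returns 0 on success, -1 on a malformed or out-of-range packet. */
static int rma_apply_put(unsigned char *region, size_t region_len,
                         const rma_packet *p)
{
    size_t start;
    assert(region != NULL && p != NULL);
    if (p->op != RMA_OP_PUT || p->length > RMA_MAX_PAYLOAD)
        return -1;
    start = p->base + p->offset;
    if (start > region_len || p->length > region_len - start)
        return -1;
    memcpy(region + start, p->payload, p->length);
    return 0;
}
```

Since every fragment carries both the base and its own offset, the
receiver can apply the packets in any order without keeping per-transfer
state.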

It may not be pretty, easy, or fast but it can be done. On machines that
actually have some sort of shared memory architecture, it can be fairly
easy and have some speed. On machines without shared memory,
it becomes slower. There was talk of having an "agent" or something
to handle these put/get requests on machines without shared memory.
This would be a prime target for (close your eyes Eric and Raja :-)
threads.

Sorry if this is disjointed and confusing; I'm in a hurry and didn't even
proofread beyond the spell-checker... (besides, I did give a warning...)

Shane