> The issue of MALLOC vs INIT I believe is clearly understood, and
> has been debated at length... ...I would expect SGI
> Fortran to support declarations that allow to allocate static variables in
> shared memory, and thus, be able to take advantage of the current design.
The issue has definitely been debated at length but I very much question
whether it is clearly understood by everyone. The above is a perfect example.
Simply putting static variables in shared memory does *not* provide the same
programming model as gets/puts. Any process which does a put into a static
shared variable will change that variable for all of the other processes
running on that host, not just for the target process. So for static variables
we are once again back to requiring some sort of slow agent to move the bits.
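
To make the distinction concrete, here is a toy model in Python. It is not MPI code, and the names (`Host`, `put_static`, `put_window`) are made up purely for illustration. It only shows why one shared static cell per host behaves differently from a put that targets a single process.

```python
# Toy model only: Host, put_static and put_window are hypothetical names,
# not part of any MPI proposal or implementation.

class Host:
    """One host running several processes."""

    def __init__(self, nprocs):
        # A static variable placed in shared memory: ONE cell for the whole host.
        self.shared_static = 0
        # What get/put semantics call for: one private cell per process.
        self.per_process = [0] * nprocs

    def put_static(self, target_rank, value):
        # The target rank cannot matter here; there is only one cell to write.
        self.shared_static = value

    def put_window(self, target_rank, value):
        # Only the target process's copy changes.
        self.per_process[target_rank] = value

    def read_static(self, rank):
        return self.shared_static

    def read_window(self, rank):
        return self.per_process[rank]


host = Host(nprocs=3)
host.put_static(target_rank=1, value=42)
host.put_window(target_rank=1, value=42)

static_view = [host.read_static(r) for r in range(3)]
window_view = [host.read_window(r) for r in range(3)]
print(static_view)  # [42, 42, 42]: every process on the host sees the put
print(window_view)  # [0, 42, 0]: only the target process sees it
```

Getting the second behavior out of a shared static variable means somebody has to copy the data into the right process, which is exactly the agent we were trying to avoid.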
> For obvious reasons you don't like the final design and I like it: what is
> stupid for one machine is intelligent for another.
Yes and no. There are practical considerations which determine how to implement
things on different machines, but the flexibility of the current proposal is
getting in the way of machines which would otherwise be capable of doing
better. We are forcing all implementations to carry a highly non-trivial amount
of extra baggage in order to support a generality that does not reflect common
practice. And this is completely counter to the "high performance" selling
point which is a large part of what makes MPI such an attractive alternative
to, say, PVM.
> On your machine it forces you to have an implementation effort that
> you would like to avoid, and gives more rope that you would like for your
> users to hang themselves. But if users voluntarily restrict themselves to the
> programming style you want to force upon them, then no performance is lost.
> The alternative, on my machine, would restrict useability without adding any
> performance advantage.
The reason that I would like to allow implementations to require aligned
buffers is for host-to-host performance, which should certainly apply to the SP
as well. Consider the following common setup: You've got a bunch of machines
connected with some type of cable that plugs into a controller on each machine.
Each controller has a DMA engine for moving bits between the host and the
network.
Question: In the above configuration, what is the fastest way to move data from
one host to another?
Answer: Have the DMA engine on the sending host read the data directly from the
user's send buffer, and have the DMA engine on the receiving host write the
data directly into the user's receive buffer. So how can we make this happen?
Well, unless your DMA engine is only 8 bits wide, you are going to have to
assume some sort of data alignment. If you have 32-bit wide DMA, then you'll
need 32-bit alignment, and so on. The specific requirements will vary from
system to system, but fundamentally the problem of alignment is always gonna be
there in these high-performance configurations. So it ain't just my machine
that cares about this, it's potentially a whole bunch of 'em.
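
A rough sketch of the check an implementation would be making, in Python for brevity. The function names are hypothetical, and the fallback to a staged copy is my assumption about what a fully general interface would have to do with a misaligned buffer; a restricted interface could simply reject it.

```python
# Hypothetical helpers for illustration only. Addresses are plain integers;
# the default 4-byte width is the 32-bit DMA case mentioned above.

def is_dma_aligned(addr, dma_width_bytes):
    """True if the buffer's start address is a multiple of the DMA width."""
    return addr % dma_width_bytes == 0


def choose_path(addr, dma_width_bytes=4):
    """Use direct DMA when the buffer is aligned, otherwise a staged copy.

    The staged copy (through an aligned intermediate buffer) is an assumed
    fallback, not something any particular system is known to do.
    """
    if is_dma_aligned(addr, dma_width_bytes):
        return "direct-dma"
    return "staged-copy"


print(choose_path(0x1000))                     # direct-dma
print(choose_path(0x1002))                     # staged-copy
print(choose_path(0x1002, dma_width_bytes=1))  # direct-dma (8-bit engine)
```

With an 8-bit engine every address passes, which is the one case where alignment never comes up. For anything wider, the slow path exists only because misaligned buffers are permitted.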
> I believe that no agents are needed, and that the performance
> of correctly aligned transfers is not affected, as they **do not** require
> locking or agents. But please check carefully the text. If I am wrong, then
> we should go back to a design where we allow access alignment constraints,
> globally, or per window.
Now we move from technical to religious, I suppose. We already have a fully
general send/recv interface and are working on an equally general handler
interface. So the functionality is already there for users who want it. If we
make the get/put interface equally general, then it will greatly complicate
implementations which could otherwise be quite simple. All that we're doing now
is making it easy for users to get poor performance.
> we do expect that put/get will be supported more efficiently over time than
> it could be using a general handler interface. Other vendors or implementors
> seem to share this expectation.
The issue is not whether a native put/get can be made faster than a generic
handler implementation, it's whether one can be made faster than *send/recv*.
This can be done on many systems if we make some simplifying assumptions, which
I think is a very worthwhile tradeoff.
Final thought: If we err on the side of too many restrictions, we can always
relax them later. If we err on the side of too much flexibility, we are hosed.
--
Eric Salo         Silicon Graphics Inc.             "Do you know what the
(415)933-2998     2011 N. Shoreline Blvd, 7L-802     last Xon said, just
salo@sgi.com      Mountain View, CA 94043-1389       before he died?"