Re: we should wait for 1sided implementations

Marc Snir (snir@watson.ibm.com)
Wed, 22 May 1996 18:15:19 -0400

:-) :-) :-) *** (:- (:- (:-

> > Also, it will give users/vendors time to voice any
> > **specific** concern they have about implementation.
>
> I have been doing this for months. Here are two, off the top of my head, which
> everyone has already heard. I've got lots more if you want 'em...
>
> 1) We currently have no way to directly access arbitrary address ranges between
> different processes running on the same host. What we *can* do is dynamically
> allocate shared buffers, similar to what you get with SysV shared segments.
> Yeah, I know that the T3D and Puma systems can do more than that - so now let's
> talk about the other 99% of the UNIX world.
>
> Adding an optional MPI_RMA_MALLOC function solves nothing. Because it is still
> *possible* to use static get/put windows, we must support them, which means
> that we still have to implement those stupid agents anyway for functionality
> that is highly dubious.
>
My suggestion that implementers look carefully at the design was less about
issues that have been thoroughly discussed and where tradeoffs are clearly
understood, and more about possible gotchas in the style of the second issue
you raise. The issue of MALLOC vs INIT is, I believe, clearly understood, and
has been debated at length. For obvious reasons you don't like the final
design and I like it: what is stupid for one machine is intelligent for
another. On your machine it forces an implementation effort on you that
you would like to avoid, and gives your users more rope than you would like
to hang themselves with. But if users voluntarily restrict themselves to the
programming style you want to force upon them, then no performance is lost.
The alternative, on my machine, would restrict usability without adding any
performance advantage. I think it is a fair compromise. I would expect SGI
Fortran to support declarations that allow static variables to be allocated
in shared memory, and thus to be able to take advantage of the current design.

> 2) It is possible today for a process on one Power Challenge to directly store
> a 64-bit value (or many millions of them) into the address space of a process
> running on a second Power Challenge if the two machines are connected by HIPPI.
> No agents, no locks, no interrupts; the bits just go where they belong. But we
> can't do anything smaller than 64 bits and the values must be 64-bit aligned.
>
> Once again, the generality of the current proposal prevents us from having a
> simple implementation. Since it is possible for a user to PUT a single byte, we
> must once again implement an agent to handle the obscure cases.
>
> This problem exists within individual machines as well. For example, the MIPS
> instruction set allows for atomic updates of 32-bit or 64-bit values without
> requiring locks. For values smaller than that, we're screwed.
As mentioned in the note you reply to, I modified the proposal exactly to
handle this problem. The issue of accesses to variables smaller than those
supported by remote stores is dealt with at length in the current proposal
sent earlier today. I believe that no agents are needed, and that the
performance of correctly aligned transfers is not affected, as they **do not**
require locking or agents. But please check the text carefully. If I am wrong,
then we should go back to a design where we allow access alignment constraints,
globally, or per window. In general, the choice within MPI has been to give
the users arbitrary amounts of rope, provided that the prevalent case can be
optimized well. I think this rule applies here.
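
To make the claim about aligned transfers concrete, here is a minimal sketch
in C. It is not taken from the proposal text. The names sketch_put,
remote_store64 and slow_path_put are hypothetical, and an ordinary local store
and memcpy stand in for the real remote mechanisms. It shows only the dispatch
idea: 64-bit aligned words take the fast path and never touch the slow one,
while a misaligned head or a short tail falls back to some slower mechanism,
which this sketch leaves unspecified.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical counters, kept only so the two paths can be observed. */
static size_t fast_words = 0;   /* 64-bit words moved by the fast path */
static size_t slow_bytes = 0;   /* bytes moved by the slow path        */

/* Stand-in for a hardware remote store of one aligned 64-bit word. */
static void remote_store64(uint64_t *dst, uint64_t value)
{
    *dst = value;
    fast_words++;
}

/* Stand-in for whatever slower mechanism handles the leftover bytes. */
static void slow_path_put(unsigned char *dst, const unsigned char *src,
                          size_t n)
{
    memcpy(dst, src, n);
    slow_bytes += n;
}

/* Copy nbytes from src to dst, using 64-bit stores wherever the target
 * address is 8-byte aligned and at least 8 bytes remain. */
void sketch_put(void *dst, const void *src, size_t nbytes)
{
    unsigned char *d = dst;
    const unsigned char *s = src;
    size_t head;

    assert(dst != NULL && src != NULL);

    /* Bytes needed to reach the next 8-byte boundary in the target. */
    head = (size_t)(-(uintptr_t)d & 7u);
    if (head > nbytes)
        head = nbytes;
    if (head > 0) {
        slow_path_put(d, s, head);
        d += head; s += head; nbytes -= head;
    }

    /* Aligned body: one 64-bit store per word, no slow-path work. */
    while (nbytes >= 8) {
        uint64_t w;
        memcpy(&w, s, sizeof w);        /* source may be unaligned */
        remote_store64((uint64_t *)(void *)d, w);
        d += 8; s += 8; nbytes -= 8;
    }

    /* Tail shorter than one word. */
    if (nbytes > 0)
        slow_path_put(d, s, nbytes);
}
```

With an 8-byte aligned target and a length that is a multiple of 8, only
remote_store64 is ever called. A one-byte put goes entirely through
slow_path_put.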
>
> > In any case, the situation is not as bleak as you make it. An implementation
> > of put/get that is quite close to the current proposal is available on the SP,
> > as an extension of the IBM MPI library.
>
> Is the performance superior to that of send/recv? Let's assume that it isn't.
> What, then, is the advantage of having it? We have already established that
> equivalent functionality will be available thru the remote handler interface,
> so the only possible reason for having a dedicated put/get interface at all
> must be performance. Otherwise, what's the point? Are we devoting all of this
> combined effort just to produce a chapter of syntactic sugar?
>
Since the current put/get prototype uses a general remote handler interface,
it is, at this point, indeed no more than a user library built on top of the
more primitive handler interface. This, by the way, is not new in MPI, where
much functionality is provided atop point-to-point message passing. However,
we do expect that, over time, put/get will be supported more efficiently than
it could be using a general handler interface. Other vendors and implementors
seem to share this expectation.
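
As an illustration of what "a user library built on top of the handler
interface" can look like, here is a small single-process sketch in C. It is
not the IBM prototype and not any real handler API. The names handler_invoke,
put_handler, lib_put, struct put_header and the MAX_PUT limit are all invented
for the example, and the "remote" window is just a local buffer with the
handler called synchronously.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLERS   8
#define PUT_HANDLER_ID 0
#define MAX_PUT        256   /* arbitrary payload limit for this sketch */

/* A handler runs at the target and receives an opaque message. */
typedef void (*handler_fn)(void *window, const void *msg, size_t len);

/* Hypothetical wire format for a put: header followed by the data. */
struct put_header {
    size_t offset;   /* byte offset into the target window */
    size_t nbytes;   /* number of data bytes that follow   */
};

/* Target-side handler: unpack the header and store the data. */
static void put_handler(void *window, const void *msg, size_t len)
{
    struct put_header h;

    assert(len >= sizeof h);
    memcpy(&h, msg, sizeof h);
    assert(len == sizeof h + h.nbytes);
    memcpy((unsigned char *)window + h.offset,
           (const unsigned char *)msg + sizeof h, h.nbytes);
}

static handler_fn handler_table[MAX_HANDLERS] = {
    [PUT_HANDLER_ID] = put_handler
};

/* Stand-in for the general handler interface. A real one would ship
 * msg to another process; here the handler is simply called. */
void handler_invoke(int id, void *window, const void *msg, size_t len)
{
    assert(id >= 0 && id < MAX_HANDLERS && handler_table[id] != NULL);
    handler_table[id](window, msg, len);
}

/* The "user library" put: pack a message, hand it to the handler layer. */
void lib_put(void *window, size_t offset, const void *src, size_t nbytes)
{
    unsigned char msg[sizeof(struct put_header) + MAX_PUT];
    struct put_header h;

    assert(nbytes <= MAX_PUT);
    h.offset = offset;
    h.nbytes = nbytes;
    memcpy(msg, &h, sizeof h);
    memcpy(msg + sizeof h, src, nbytes);
    handler_invoke(PUT_HANDLER_ID, window, msg, sizeof h + nbytes);
}
```

For example, lib_put(window, 4, "abc", 3) writes the three bytes at offset 4
of window. The packing, dispatch and unpacking steps are the overhead that a
dedicated put/get implementation could avoid.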

Marc Snir
IBM T.J. Watson Research Center
P.O. Box 218, Yorktown Heights, NY 10598
email: snir@watson.ibm.com
phone: 914-945-3204
fax: 914-945-4425
URL: http://www.research.ibm.com/people/s/snir