I have been doing this for months. Here are two, off the top of my head, which
everyone has already heard. I've got lots more if you want 'em...
1) We currently have no way to directly access arbitrary address ranges between
different processes running on the same host. What we *can* do is dynamically
allocate shared buffers, similar to what you get with SysV shared segments.
Yeah, I know that the T3D and Puma systems can do more than that - so now let's
talk about the other 99% of the UNIX world.
Adding an optional MPI_RMA_MALLOC function solves nothing. Because it is still
*possible* to use static get/put windows, we must support them, which means
that we still have to implement those stupid agents anyway for functionality
that is highly dubious.
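For anyone who wants the shared-buffer model from point 1 spelled out, here is
a minimal sketch in C using the standard SysV calls (shmget, shmat, shmdt,
shmctl). The helper names shared_buf_alloc and shared_buf_free are invented
for illustration and are not part of any MPI proposal. The point to notice is
that the memory has to come from the allocator; nothing here can expose a
static array that already lives in a process's address space.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>
#include <sys/types.h>

/* Hypothetical helper: allocate a shared buffer of `size` bytes.
 * Returns the mapped address, or NULL on failure. *id_out receives the
 * segment id that a cooperating process would pass to shmat(). */
void *shared_buf_alloc(size_t size, int *id_out)
{
    assert(id_out != NULL);

    int id = shmget(IPC_PRIVATE, size, IPC_CREAT | 0600);
    if (id == -1)
        return NULL;

    void *p = shmat(id, NULL, 0);
    if (p == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);   /* do not leak the segment */
        return NULL;
    }

    *id_out = id;
    return p;
}

/* Hypothetical helper: detach the buffer and mark the segment for removal.
 * Returns 0 on success, -1 if either step failed. */
int shared_buf_free(void *p, int id)
{
    int rc = shmdt(p);
    if (shmctl(id, IPC_RMID, NULL) == -1)
        rc = -1;
    return rc;
}
```

A second process (or a second shmat() in the same process) that attaches the
same id sees the same bytes, which is all the sharing we get.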
2) It is possible today for a process on one Power Challenge to directly store
a 64-bit value (or many millions of them) into the address space of a process
running on a second Power Challenge if the two machines are connected by HIPPI.
No agents, no locks, no interrupts; the bits just go where they belong. But we
can't do anything smaller than 64 bits and the values must be 64-bit aligned.
Once again, the generality of the current proposal prevents us from having a
simple implementation. Since it is possible for a user to PUT a single byte, we
must once again implement an agent to handle the obscure cases.
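To make the cost concrete, here is a rough sketch in C of the dispatch that
every PUT would have to go through under the restriction described above
(8-byte granularity, 8-byte alignment). The names choose_put_path, PUT_DIRECT
and PUT_AGENT are hypothetical, and the real hardware interface is not shown;
this only illustrates the check, not an actual implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* The direct path in this sketch assumes 8-byte words. */
static_assert(sizeof(uint64_t) == 8, "direct path assumes 8-byte words");

/* Hypothetical labels for the two ways a PUT could be carried out. */
enum put_path { PUT_DIRECT, PUT_AGENT };

/* Hypothetical dispatch: the direct store path is usable only when the
 * target address is 64-bit aligned and the length is a whole number of
 * 64-bit words. Anything else falls back to the agent. */
enum put_path choose_put_path(uintptr_t target_addr, size_t nbytes)
{
    const uintptr_t mask = sizeof(uint64_t) - 1;

    if ((target_addr & mask) == 0 && (nbytes & mask) == 0)
        return PUT_DIRECT;
    return PUT_AGENT;
}
```

A single-byte PUT, or an 8-byte PUT to an address ending in 4, takes the agent
branch, so the agent has to exist even if almost nobody ever uses it.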
This problem exists within individual machines as well. For example, the MIPS
instruction set allows for atomic updates of 32-bit or 64-bit values without
requiring locks. For values smaller than that, we're screwed.
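For illustration, here is roughly what updating one byte looks like when the
only atomic unit is a 32-bit word. This is a sketch in portable C11
(<stdatomic.h>), not MIPS code, and the name store_byte_lane is made up. It
avoids a lock, but it turns a plain store into a read-modify-write retry loop,
and it is only safe if every writer to that word goes through the same path.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: replace one byte of a 32-bit word atomically.
 * `lane` selects the byte by significance (0 = least significant), which
 * keeps the sketch independent of the machine's byte order. */
void store_byte_lane(_Atomic uint32_t *word, unsigned lane, uint8_t value)
{
    assert(lane < 4);

    const unsigned shift = lane * 8;
    const uint32_t mask = (uint32_t)0xFF << shift;

    uint32_t old = atomic_load(word);
    uint32_t desired;
    do {
        /* Keep the other three bytes, splice in the new one. */
        desired = (old & ~mask) | ((uint32_t)value << shift);
        /* On failure, `old` is refreshed with the current contents
         * and the loop recomputes `desired` from it. */
    } while (!atomic_compare_exchange_weak(word, &old, desired));
}
```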
> In any case, the situation is not as bleak as you make it. An implementation
> of put/get that is quite close to the current proposal is available on the
> as an extension of the IBM MPI library.
Is the performance superior to that of send/recv? Let's assume that it isn't.
What, then, is the advantage of having it? We have already established that
equivalent functionality will be available thru the remote handler interface,
so the only possible reason for having a dedicated put/get interface at all
must be performance. Otherwise, what's the point? Are we devoting all of this
combined effort just to produce a chapter of syntactic sugar?
> (the latest version, which hopefully resolves some leftover problems for
> machines that do not have byte stores will be sent out this morning).
The only solutions that I see suggested involve locks and/or agents. Yes, this
will provide correctness. But I am not claiming that correctness is impossible,
I am only saying that the performance costs of guaranteeing it are very high.
> I should point out that I was rebuffed by the forum at large in some attempts
> to reduce the functionality of the design, in order to allow for more
> efficient implementations, and attempts to move toward more shared memory
Yes, this has happened to many of us. Exactly my point. As I said, *for
whatever reason*, this chapter has been extremely difficult to change. I'm not
assigning blame, merely stating an observation about the chapter as a whole.
> The time has come for each vendor to closely scrutinize the design and argue,
> specifically, about problems he/she would face on his/her platform.
Done. Next, please!
-- 
Eric Salo          Silicon Graphics Inc.             "Do you know what the
(415)933-2998      2011 N. Shoreline Blvd, 7L-802     last Xon said, just
email@example.com  Mountain View, CA 94043-1389       before he died?"