> There's absolutely no reason why the hardware should not be capable of
> handling DMA's to arbitrarily byte aligned buffers.
Right. In a perfect world this would not be an issue. On my machine it *is* an
issue, and we were asked to describe specific problems that our implementations
would have with the current interface. This is one.
> If MPI can encourage the design of such hardware so much the better
Right again, and once the more flexible hardware becomes commonly available we
can standardize it! I say again: we can always relax any restrictions at a
later time if we choose to do so.
> I thought we voted out the fully general handlers last meeting other
> than in environments with full thread support, so it seems hard to
> rely on them.
Touché. This is a very annoying point. But at the very least, it would seem to
add support to Bill's original suggestion that we defer final voting, since
there are obviously still some issues that are not yet fully understood.
> The same argument could have been applied to the non-contiguous data
> types, however as we expected the implementations have been able to
> exploit the additional information that they provide and achieve much
> higher performance than had they been omitted.
Some examples/numbers would be very interesting; my impression was that most
implementations did a poor/nonexistent job of optimizing the non-contiguous
data types.
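For reference, here is the kind of layout information I mean, in plain MPI-1.
The datatype hands the implementation the complete stride pattern up front,
which it could (in principle) map onto a single strided hardware transfer
instead of 100 little ones:

    #include <mpi.h>

    /* Send one column of a 100x100 row-major matrix as a
     * single strided transfer. */
    void send_column(double a[100][100], int dest, int tag)
    {
        MPI_Datatype column;

        MPI_Type_vector(100,          /* 100 blocks...          */
                        1,            /* ...of 1 double each... */
                        100,          /* ...100 elements apart  */
                        MPI_DOUBLE, &column);
        MPI_Type_commit(&column);
        MPI_Send(&a[0][0], 1, column, dest, tag, MPI_COMM_WORLD);
        MPI_Type_free(&column);
    }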
> 1) the restrictions are hard for users to comply with. (How can I
> specify the alignment of a variable in standard Fortran ?)
Users of the Cray SHMEM library have to deal with this today. Having used that
library extensively myself, I can tell you it's not a Big Deal. (This ties into my other
argument, that MPI_RMA_MALLOC should be mandatory, which will require some sort
of FORTRAN pointers anyway.)
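For concreteness, here is a minimal sketch of the allocator I have in mind.
The C signature and the 64-byte alignment are my own invention; nothing like
this has been voted on:

    #include <stdlib.h>

    /* Hypothetical MPI_RMA_MALLOC: let the implementation place
     * and align RMA memory however its hardware needs.  A real
     * version might also pin pages or carve from a symmetric
     * heap; Fortran callers would get the address back through
     * a Cray-style POINTER. */
    void *MPI_Rma_malloc(size_t nbytes)
    {
        void *p = NULL;

        if (posix_memalign(&p, 64, nbytes) != 0)  /* assumed alignment */
            return NULL;
        return p;
    }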
> 2) many of the extended features only cost if you use them, so
> it is indeed more work for you as an implementer, but it isn't a
> performance issue for your users. (You are entirely free to publish
> guidelines explaining how to achieve the fast path in your
> implementation, though, of course, you may not want to do this since
> by implication it also points out the slow path...)
It's a *lot* more work. I think Greg really hit the nail on the head here; a
lot of MPI implementations are just now starting to come into their own. It
took us well over a year to finally get a decent MPI-1 implementation in place,
and that's just the base library. Now we've got this *tremendous* amount of new
functionality in MPI-2 that we're about to approve, and it makes me very
nervous.
I have no objection at all to pointing out the slow paths in my library,
because that also highlights the fast paths. On the other hand, if we can
build a few very basic performance guidelines into the standard itself, we'll
be educating users while at the same time making life a lot easier for the
implementors.
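As an example of what such a guideline might look like, here it is expressed
as a predicate a user (or a test suite) could check; the constants are
invented:

    #include <stdint.h>
    #include <stddef.h>

    /* Hypothetical published guideline: "transfers take the fast
     * path when both addresses and the length are multiples of 8
     * bytes."  The 8 is a made-up number for illustration. */
    int hits_fast_path(const void *dst, const void *src, size_t nbytes)
    {
        return (uintptr_t)dst % 8 == 0 &&
               (uintptr_t)src % 8 == 0 &&
               nbytes % 8 == 0;
    }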
> 3) remote store access isn't there only for performance, it's also
> there because it's a useful programming model which is
> fundamentally different from message passing in its semantics.
> It isn't in general trivial to change a remote store access code
> into a message passing code.
I agree, but I don't quite see your point. I'm certainly not arguing that
remote store access isn't useful. I'm quite a fan of it, actually, which is why
I want to see a good standard.
But performance is implicitly promised in any get/put model. This is certainly
true on the T3D, where lots of applications are first written using PVM and
then ported to SHMEM. And guess what, a big part of what makes SHMEM so fast is
that it makes some very useful simplifying assumptions about data layout! All
I'm saying is that we should follow this very useful precedent.
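For anyone who hasn't used it, the SHMEM style looks roughly like this (I'm
using OpenSHMEM-flavored spellings here; Cray's original calls differ
slightly). The simplifying assumption doing all the work is the symmetric
layout: dest sits at the same address on every PE, so a put needs no handshake
with the target:

    #include <shmem.h>

    long dest[8];              /* symmetric: same address on every PE */

    int main(void)
    {
        long src[8];
        int i, me, npes;

        shmem_init();
        me = shmem_my_pe();
        npes = shmem_n_pes();

        for (i = 0; i < 8; i++)
            src[i] = me;

        /* one-sided put into the right-hand neighbor's dest;
         * the target never posts a receive */
        shmem_long_put(dest, src, 8, (me + 1) % npes);

        shmem_barrier_all();   /* completion + ordering */
        shmem_finalize();
        return 0;
    }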
--
Eric Salo                       Silicon Graphics Inc.             "Do you know what the
(415)933-2998                   2011 N. Shoreline Blvd, 7L-802     last Xon said, just
firstname.lastname@example.org  Mountain View, CA 94043-1389       before he died?"