(no subject)
Marc Snir (SNIR@watson.ibm.com)
Wed, 10 Jul 1996 11:15:30 -0500
----------Forwarding Original Note --------
To: mpi-1sided @ mcs.anl.gov @ GW2
cc:
From: salo @ mrjones.engr.sgi.com ("Eric Salo") @ GW2
Date: 07-09-96 12:12:09 PM
Subject: Re: a (radical ?) alternative to current chapter 4
Security:
> ********
> If the concurrent PUTs do not update the same location in the window then one
> can use a shared lock for the puts, rather than an exclusive lock. So, all
> puts can go on concurrently.
> ********
This is a bit confusing. The current text of the alternate proposal uses "read"
and "write" to describe the two different types of lock, since this is what
they will typically be used for. It may or may not be more correct/clear to
label them as "shared" and "exclusive" locks instead, but that's just a naming
issue.
************
The current text (distributed by Steve) labels the locks shared and exclusive,
to avoid the confusion, and states that it is not necessary to protect a put
with an exclusive lock.
**************
My understanding is that allowing multiple processes to PUT into the same
window at the same time might not be legal in the alternate proposal because
that might mess up cache coherency on some systems. At least, it feels fishy to
me because it does seem to violate the reason for needing the locks in the
first place. Can someone (Marc? Raja?) come up with a compelling reason why
this should or should not cause problems?
****************
There is no difference in implementation strategies between the original
proposal and the alternate proposal. The difference is that, in the original
proposal, the user calls window_in and window_out, and in the alternate
proposal MPI makes these calls implicitly when certain synchronizations occur.
Concurrent puts are no problem in the original proposal and are no problem in
the alternate proposal, as long as they do not access the same location. The
"lock" is not needed.
What is needed is a hint to the system that now is a good time to synchronize
caches. Weak coherence memory systems typically associate cache coherence
actions with synchronization calls. The rationale is that in between a producer
writing something to memory and a consumer reading this something, there must
be a synchronization operation that involves the producer after its write and
involves the consumer before its read. Thus, rather than restoring coherence
after each memory access, it is sufficient to restore coherence when these
synchronization operations occur. For such a strategy to work (in MPI) one
needs to make visible (to MPI) what the "synchronization points" in the program
are, and require the user to always synchronize conflicting accesses with one
such synchronization point. One could go to one extreme (suggested by Nucci)
and say that only barriers are legitimate synchronization points. One could go
to the other extreme and say that any MPI call (send, receive,...) is a
legitimate synchronization point. In this latter case, rules become much
simpler, but performance goes down the drain. I suggest a middle way, allowing
barriers and locks, which fit a shared memory programming style. Each barrier
call and each lock/unlock call is a synchronizing point. We could provide more
flexibility by allowing multiple locks per window, and saying that each
lock/unlock of any of the window locks is a synchronizing point for the
window. But the choice of which calls can be used to synchronize accesses on
a window, and therefore which calls are associated with "window_in" and
"window_out" activities, is orthogonal to the design of the coherence
protocol on that window.
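
To make the idea of coherence being restored only at synchronization points
concrete, here is a minimal toy sketch in Python. It is not MPI code and not
taken from either proposal. The class ToyWindow, the method sync_point, and the
exact behavior given to window_in and window_out (drop cached copies, write
back dirty words) are assumptions made for illustration only.

```python
class ToyWindow:
    """Toy model of one window in weakly coherent memory (illustrative only)."""

    def __init__(self, size):
        self.memory = [0] * size  # backing store that put/get operate on
        self.cache = {}           # index -> value as seen by local load/store
        self.dirty = set()        # cached indices modified by local stores

    def load(self, i):
        # Local read: served from the cache, filled from memory on a miss.
        if i not in self.cache:
            self.cache[i] = self.memory[i]
        return self.cache[i]

    def store(self, i, value):
        # Local write: stays in the cache until window_out.
        self.cache[i] = value
        self.dirty.add(i)

    def put(self, i, value):
        # Remote write: goes straight to memory, bypassing the cache.
        self.memory[i] = value

    def get(self, i):
        # Remote read: reads memory, bypassing the cache.
        return self.memory[i]

    def window_out(self):
        # Assumed meaning: write dirty words back so remote gets can see them.
        for i in self.dirty:
            self.memory[i] = self.cache[i]
        self.dirty.clear()

    def window_in(self):
        # Assumed meaning: drop cached copies so later loads see remote puts.
        self.cache.clear()

    def sync_point(self):
        # Hypothetical stand-in for a barrier or lock/unlock call on the window.
        self.window_out()
        self.window_in()


def demo():
    w = ToyWindow(4)
    w.load(0)            # word 0 is now cached with value 0
    w.put(0, 7)          # remote put updates memory only
    stale = w.load(0)    # still the cached value
    w.sync_point()
    fresh = w.load(0)    # refilled from memory after the sync point
    w.store(1, 5)        # local store, not yet visible remotely
    hidden = w.get(1)
    w.sync_point()
    visible = w.get(1)
    return stale, fresh, hidden, visible
```

In this sketch the consumer's load returns the old value until a
synchronization point intervenes, and the producer's store is invisible to a
get until then. Which MPI calls play the role of sync_point is a separate
choice from how the cache is managed.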
The only added complexity in our case, as compared to traditional weakly
coherent shared memory systems, is that the load/store local accesses use a
different mechanism for managing atomicity, tracking dirty cache lines, etc.,
than the mechanism used for put/get. This forces us to say that any load/store
must be separated from any put/get by a synchronization point, even if they do
not access the same location. I.e., we cannot assume that we can do
coherence actions at a granularity smaller than a window.
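
One possible way a store and a put to different locations can interfere is
sketched below, as a toy Python model. It assumes a cache that writes back
whole lines, which is only one example of the two mechanisms disagreeing. The
names LineCachedWindow, LINE, and sync_point are hypothetical and do not come
from either proposal.

```python
LINE = 4  # words per cache line (hypothetical value)


class LineCachedWindow:
    """Toy window whose local cache works on whole lines (illustrative only)."""

    def __init__(self, size):
        self.memory = [0] * size
        self.lines = {}     # line number -> cached copy of the whole line
        self.dirty = set()  # line numbers with local modifications

    def _line(self, i):
        # Bring the line containing word i into the cache if it is missing.
        n = i // LINE
        if n not in self.lines:
            start = n * LINE
            self.lines[n] = self.memory[start:start + LINE]
        return n

    def store(self, i, value):
        # Local store: modifies the cached line and marks it dirty.
        n = self._line(i)
        self.lines[n][i % LINE] = value
        self.dirty.add(n)

    def put(self, i, value):
        # Remote put: writes memory directly, unknown to the local cache.
        self.memory[i] = value

    def sync_point(self):
        # Assumed window_out: write back whole dirty lines.
        for n in self.dirty:
            start = n * LINE
            self.memory[start:start + LINE] = self.lines[n]
        self.dirty.clear()
        # Assumed window_in: drop all cached lines.
        self.lines.clear()


def unsynchronized():
    w = LineCachedWindow(8)
    w.store(1, 5)    # caches line 0 (words 0..3) and dirties it
    w.put(2, 9)      # different word, same line, no sync point in between
    w.sync_point()   # the stale cached line overwrites the put
    return w.memory[:LINE]


def synchronized():
    w = LineCachedWindow(8)
    w.store(1, 5)
    w.sync_point()   # separates the store from the put
    w.put(2, 9)
    w.sync_point()
    return w.memory[:LINE]
```

In the unsynchronized case the put to word 2 is lost even though the store
touched only word 1. With a synchronization point between them both updates
survive.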
*************************
On the other hand, it's not clear that the concept of windows is even helpful
for the general coherency problem, because from a hardware perspective one does
not typically perform coherence operations on distinct memory regions; one
flushes/invalidates an entire cache, or perhaps an individual line. So we might
be in trouble anyway, but again I don't have a concrete example (yet).
*************************
One needs a window because MPI does not know which locations were updated by
local stores. Without windows, each synchronization (each "window_out") will
require flushing the entire processor cache, which can be quite expensive.
With windows, we can limit the damage to one window. The PowerPC allows one to
flush a region, rather than a cache line or the entire cache. I don't know what
the story is with MIPS, but I would be surprised to hear that only PPC
efficiently supports selective flushes.
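
The cost argument can be sketched with a toy count of evicted lines, in Python.
The cache is modeled as a dictionary keyed by line address. The function names
and the sizes used (1000 cached lines, a 32-line window) are made up for
illustration and say nothing about real PowerPC or MIPS behavior.

```python
def flush_all(cache):
    """Flush every cached line; returns how many lines were evicted."""
    evicted = len(cache)
    cache.clear()
    return evicted


def flush_region(cache, start, end):
    """Flush only lines whose address falls in [start, end)."""
    victims = [addr for addr in cache if start <= addr < end]
    for addr in victims:
        del cache[addr]
    return len(victims)


def demo():
    # Hypothetical sizes: 1000 cached lines, window covers addresses 100..131.
    window = (100, 132)
    whole = flush_all({addr: 0 for addr in range(1000)})
    cache = {addr: 0 for addr in range(1000)}
    region = flush_region(cache, *window)
    return whole, region, len(cache)
```

In this sketch a whole-cache flush evicts all 1000 lines, while a flush limited
to the window evicts 32 and leaves the other 968 cached.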
***********************************************************************
Eric Salo Silicon Graphics Inc. "Do you know what the
(415)933-2998 2011 N. Shoreline Blvd, 8U-808 last Xon said, just
salo@sgi.com Mountain View, CA 94043-1389 before he died?"
---End of forwarded mail from Marc Snir/Watson/IBM Research <snir@watson.ibm.com>