> In the moment they all seem to be worse because they need
> more unnecessary locks, barriers or cache flushes or invalidates.
> I believe, that the user primarily wants
> *) Useful for applications
> *) Fast
> *) possible to implement on each platform
Here is an example that Eric Salo and I discussed a few days ago. (That
does not necessarily mean that Eric will agree with what I say here, only
that I stole some of his examples :-)
In a BARRIER-based model, like the main draft:
    Process 1       Process 2       Process 3
    read
    BARRIER         BARRIER         BARRIER
                    PUT to 1        PUT to 1
                    compute         compute
                    PUT to 1        PUT to 1
    BARRIER         BARRIER         BARRIER
    WINDOW_IN
    read
    BARRIER         BARRIER         BARRIER
                    PUT to 1        PUT to 1
                    compute         compute
                    PUT to 1        PUT to 1
    BARRIER         BARRIER         BARRIER
    WINDOW_IN
    etc.
Note that the first BARRIER is to ensure that nobody starts writing to
process 1 while it is still reading. What happens to processes 2 and 3
at the beginning? They stop and wait for the barrier. When 1 is done
reading and the barrier clears, they do their thing, and then they
need to sync again while 1 reads again.
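
To make the stall concrete, here is a rough Python sketch of the schedule above, with threads standing in for processes and a dict standing in for process 1's window. None of this is the draft's API: `run_barrier_model`, the `window` dict and the key names are made up for illustration, and `threading.Barrier` only approximates what BARRIER would do.

```python
import threading

def run_barrier_model(iterations=2, writers=(2, 3)):
    """Simulate the BARRIER schedule: one reader, two writers (illustrative only)."""
    barrier = threading.Barrier(1 + len(writers))
    window = {}      # hypothetical stand-in for process 1's exposed memory
    snapshots = []   # what process 1 saw at each read

    def process1():
        for _ in range(iterations):
            snapshots.append(dict(window))  # read
            barrier.wait()   # first BARRIER: the read is done, PUTs may start
            barrier.wait()   # second BARRIER: all PUTs have been issued
            # WINDOW_IN would go here in the draft's model
        snapshots.append(dict(window))      # one last read to see the final PUTs

    def writer(rank):
        for it in range(iterations):
            barrier.wait()                  # idle until process 1 finishes reading
            window[(rank, "before")] = it   # PUT to 1
            # compute would go here
            window[(rank, "after")] = it    # PUT to 1
            barrier.wait()                  # second BARRIER

    threads = [threading.Thread(target=process1)]
    threads += [threading.Thread(target=writer, args=(r,)) for r in writers]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return snapshots
```

The thing to notice is the first `barrier.wait()` in `writer`: the writers can do nothing at all, not even their compute, until process 1 has finished its read.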
Same example in the PUT/ACCEPT model.
    Process 1       Process 2       Process 3
    read
    ACCEPTC
                    PUTC to 1       PUTC to 1
                    compute         compute
                    PUTC to 1       PUTC to 1
                    ACCEPTC         ACCEPTC
    read
    ACCEPTC
                    PUTC to 1       PUTC to 1
                    compute         compute
                    PUTC to 1       PUTC to 1
                    ACCEPTC         ACCEPTC
Note that 2 and 3 immediately hit the PUTC. If the read on 1 has finished, and
1 is executing its ACCEPTC, then 2 and 3 can immediately move bits, as above.
But, unlike above, if 1 isn't done with the read yet, 2 and 3 can just store
away the PUTCs and go on to do the compute. When they get to the next PUTC,
they can check again whether 1 has reached its ACCEPTC, and the same goes
again-- if it has not finished, they store away the PUTC, otherwise they move
bits (along with those in the stored PUTC if necessary). Finally, 2 and 3 hit
the ACCEPTC, and now they *will* wait until 1 finishes the read and
executes the ACCEPTC.
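
The "store away the PUTC" behavior can be sketched as follows. This is only my illustration of the idea, not an implementation of the proposal: the `Target` and `Sender` classes, the `pending` list and the `deferred` counter are hypothetical names, and a `threading.Event` stands in for "process 1 has reached its ACCEPTC". It also leaves out the part where the target's ACCEPTC completes.

```python
import threading

class Target:
    """Hypothetical stand-in for process 1 and its memory."""
    def __init__(self):
        self.memory = {}
        self.accepting = threading.Event()  # set once the target is in ACCEPTC
        self.lock = threading.Lock()

class Sender:
    """Hypothetical stand-in for process 2 or 3."""
    def __init__(self, target):
        self.target = target
        self.pending = []   # PUTCs stored away because the target was not ready
        self.deferred = 0   # how many PUTCs had to be stored away

    def _flush(self):
        # Move bits for every stored PUTC, oldest first.
        with self.target.lock:
            for key, value in self.pending:
                self.target.memory[key] = value
        self.pending.clear()

    def putc(self, key, value):
        self.pending.append((key, value))
        if self.target.accepting.is_set():
            self._flush()       # target is in ACCEPTC: move bits now
        else:
            self.deferred += 1  # store the PUTC away and go on computing

    def acceptc(self):
        self.target.accepting.wait()  # only here does the sender really block
        self._flush()
```

A sender that calls `putc` before the target is ready returns immediately with the data queued; the next `putc` (or the final `acceptc`) delivers the queued data along with its own.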
So, yes, in this example, I claim that the PUT/ACCEPT model is "more efficient"
than the main draft.
Now a similar example with double buffering, to avoid some of the blocking in
the BARRIER model. Note that this is not identical in function, because the
elements in process 1 which are not addressed by PUTs do not hold their old
value between iterations.
In BARRIER model:
    Process 1       Process 2       Process 3
    read(buffA)     PUT(1,buffC)    PUT(1,buffD)
    read(buffB)     compute         compute
                    PUT(1,buffC')   PUT(1,buffD')
    BARRIER         BARRIER         BARRIER
    WINDOW_IN
    read(buffC)     PUT(1,buffA)    PUT(1,buffB)
    read(buffD)     compute         compute
                    PUT(1,buffA')   PUT(1,buffB')
    BARRIER         BARRIER         BARRIER
    WINDOW_IN
Similar example with double buffering in PUT/ACCEPT model:
    Process 1       Process 2       Process 3
    IACCEPTC
    read(buffA)     PUTC(1,buffC)   PUTC(1,buffD)
    read(buffB)     compute         compute
    WAIT            PUTC(1,buffC')  PUTC(1,buffD')
                    ACCEPTC         ACCEPTC
    IACCEPTC
    read(buffC)     PUTC(1,buffA)   PUTC(1,buffB)
    read(buffD)     compute         compute
    WAIT            PUTC(1,buffA')  PUTC(1,buffB')
                    ACCEPTC         ACCEPTC
The PUT/ACCEPT version is at least as efficient as the BARRIER
model, and the approach makes sense -- i.e. the first ACCEPTC refers
to C and D for all processes, the second refers to A and B. In process 1,
you want the PUTCs to C and D to occur while you have other work to do (i.e.
to A and B), so you use IACCEPTC. Same in the next iteration.
In fact, I would even suggest that the programmer use different *tags* for
the ACCEPTC for A and B from those with C and D, to ensure that they were
kept straight.
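
Here is one way the tag idea might look, purely as a sketch. The `TaggedTarget` class, its method signatures and the tag strings "CD" and "AB" are my own inventions for illustration; in particular, queueing mismatched PUTCs at the target is a simplification I chose to keep the example short.

```python
class TaggedTarget:
    """Hypothetical stand-in for process 1 with tag-matched accepts."""
    def __init__(self):
        self.buffers = {}
        self.open_tag = None  # tag of the IACCEPTC currently in progress
        self.queued = []      # (tag, buffer, value) PUTCs waiting for their tag

    def iacceptc(self, tag):
        # Open an accept for one tag and deliver anything queued under it.
        self.open_tag = tag
        remaining = []
        for queued_tag, buffer, value in self.queued:
            if queued_tag == tag:
                self.buffers[buffer] = value
            else:
                remaining.append((queued_tag, buffer, value))
        self.queued = remaining

    def wait(self):
        # Close the current accept; later PUTCs need a new IACCEPTC.
        self.open_tag = None

    def putc(self, tag, buffer, value):
        if tag == self.open_tag:
            self.buffers[buffer] = value  # tags match: move bits
        else:
            self.queued.append((tag, buffer, value))  # wrong phase: hold it
```

With distinct tags, a PUTC aimed at buffA that arrives while process 1 is still reading buffA is held back until the accept for A and B is opened, so the two phases cannot get mixed up.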
As for point-to-point: There are similar differences, which are brought
about by the need to synchronize *before* the operation, as well as after,
to ensure that the operation does not take place *too soon*. For example,
while the chapter posts examples like this:
    Process 1           Process 2
    PUT(A in proc2)
                        wait for counter
                        WINDOW_IN
                        access A
It doesn't fully take into account that there is code before this in process
2 which may also be accessing A and doesn't want the PUT to succeed until
it is done, so there needs to be other synchronization. That synchronization
is not provided in the main proposal, so you've got to use messages or
BARRIERs or double buffering or something. It is integrated into the
PUT/ACCEPT proposal, since the PUT will not succeed until and unless the
target is executing an ACCEPT (or IACCEPT/WAIT).
    Process 1           Process 2
                        access A
    PUT(A in proc2)
                        ACCEPT(1)
                        access A
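
A small Python sketch of that last exchange, again with threads standing in for processes. The function name `run_put_accept`, the two `threading.Event` objects and the "old"/"new" values are assumptions for illustration only; the proposal's actual ACCEPT semantics may differ in detail.

```python
import threading

def run_put_accept():
    """Simulate PUT/ACCEPT between two processes (illustrative only)."""
    A = {"value": "old"}           # hypothetical stand-in for A in process 2
    accepting = threading.Event()  # set when process 2 enters ACCEPT(1)
    put_done = threading.Event()   # set when process 1's PUT has landed
    seen = []                      # what process 2 observed at each access

    def process1():
        accepting.wait()           # the PUT cannot succeed before the ACCEPT
        A["value"] = "new"         # PUT(A in proc2)
        put_done.set()

    def process2():
        seen.append(A["value"])    # access A: the PUT has not touched it yet
        accepting.set()            # ACCEPT(1): now process 1 may write
        put_done.wait()            # ACCEPT returns once the PUT is complete
        seen.append(A["value"])    # access A: sees the new data

    t1 = threading.Thread(target=process1)
    t2 = threading.Thread(target=process2)
    t1.start()  # start the PUT side first to show it cannot run ahead
    t2.start()
    t1.join()
    t2.join()
    return seen
```

Even though process 1 starts first, the first access in process 2 still sees the old value, with no extra message, BARRIER or double buffering needed.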
-Dave
===============================================================================
David C. DiNucci | MRJ, Inc., Rsrch Scntst |USMail: NASA Ames Rsrch Ctr
dinucci@nas.nasa.gov| NAS (Num. Aerospace Sim.)| M/S T27A-2
(415)604-4430 | Parallel Tools Group | Moffett Field, CA 94035