Re: Progress rule vs. MPI_DELIVER

Rolf Rabenseifner (Rabenseifner@RUS.Uni-Stuttgart.DE)
Fri, 19 Apr 1996 17:18:56 +0100 (DST)

Lloyd,

Thank you for the detailed answer.

A short question and one idea:

1) Is there a logical error in your rules?

> --------------------- MPI Rules ----------------------
> An MPI program is erroneous, and the results in memory are undefined if either
> of the following rules are violated:
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^  I think you mean:
one of the following cases occurs:
>
> - Between two synchronization points of a process, this process reads a
> shared location, and this same location is updated by any other process at
> any time between these two points.
> - ...
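To make the rule (in the reading above) concrete, here is a minimal sketch that checks a recorded trace for the first forbidden case. It assumes an idealized global clock and a trace of reads and updates per process; the names Event and find_violations are hypothetical and not part of any MPI proposal.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    time: float   # global time of the access (idealized wall clock, an assumption)
    proc: int     # process that performs the access
    kind: str     # "read" or "update"
    loc: str      # name of the shared location


def find_violations(events, sync_times):
    """Return (proc, t0, t1, update_event) for every case in which `proc`
    reads a location between its synchronization points t0 and t1 while
    another process updates that location within the same time span.

    sync_times maps each process to the sorted times of its sync points.
    """
    violations = []
    for proc, syncs in sync_times.items():
        # consecutive sync points of this process delimit one interval
        for t0, t1 in zip(syncs, syncs[1:]):
            read_locs = {e.loc for e in events
                         if e.proc == proc and e.kind == "read"
                         and t0 <= e.time <= t1}
            for e in events:
                if (e.proc != proc and e.kind == "update"
                        and e.loc in read_locs and t0 <= e.time <= t1):
                    violations.append((proc, t0, t1, e))
    return violations
```

For example, a read of x by process 0 at time 2 and an update of x by process 1 at time 3, with both processes synchronizing at times 0 and 5, gives one violation; if the update happens at time 6, after a further synchronization point at 5, there is none.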

2) An idea for MPI, based on the notion of synchronization points:

> where they synchronize. Two tasks are synchronized at the start and at the end
> of an MPI_BARRIER; at the end of a call to MPI_RMC_INIT, and at the start of a
> call to MPI_COMM_FREE (on a window communicator).

The origin node and the target node are not symmetric, because the
only cache we have to consider is the one used by the application
on the target system: PUT and GET issued by the origin node access
the memory of the target node directly.

Is that right or wrong? This is the crucial point!

If it is right, then we should look for a way to use
asymmetric synchronization points, i.e., to do the
synchronization with ordinary messages sent in a well-defined
direction. Then we could possibly use MPI_PUT counters for one
of the synchronization points.
To achieve maximum efficiency, I think we must look for
methods that need no additional messages.

We allow the following accesses to one location:

Case A)

  origin node(s)                      target node

                                      "write write-back-cache to memory"
                                 /--- send sync-"message"
  receive sync-"message" <------/

  several MPI_GET                     local loads
  from several nodes

  send sync-"message" ---\
                          \---------> receive sync-"message"
                                      "nothing to do with caches"

                                      local loads and stores

Case B)

  origin node                         target node

                                      "write write-back-cache to memory"
                                 /--- send sync-"message"
  receive sync-"message" <------/

  one MPI_PUT                         local access not permitted

  send sync-"message" ---\
                          \---------> receive sync-"message"
                                      "flushing write-through-cache"

                                      local loads and stores
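The two cases can be played through with a toy model of the target node. This is only a sketch under simplifying assumptions: the cache is kept per location rather than per cache line, PUT and GET (the hypothetical methods remote_put and remote_get) bypass the cache and touch memory directly, as assumed above, and ready() and deliver() stand for the two cache operations in the diagrams.

```python
class TargetNode:
    """Toy model of the target node: origin PUT/GET bypass the cache."""

    def __init__(self, memory):
        self.memory = dict(memory)  # main memory, accessed directly by PUT/GET
        self.cache = {}             # loc -> (value, dirty)

    # Local accesses by the application on the target node.
    def load(self, loc):
        if loc not in self.cache:
            self.cache[loc] = (self.memory[loc], False)
        return self.cache[loc][0]

    def store(self, loc, value):
        self.cache[loc] = (value, True)  # write-back: memory is updated later

    # The two cache operations used in Cases A and B.
    def ready(self):
        """'write write-back-cache to memory': write dirty entries back."""
        for loc, (value, dirty) in self.cache.items():
            if dirty:
                self.memory[loc] = value
        self.cache = {loc: (value, False)
                      for loc, (value, _) in self.cache.items()}

    def deliver(self):
        """'flushing write-through-cache': discard cached copies so the
        next local load re-reads memory (assumes no dirty entries)."""
        self.cache.clear()

    # Accesses issued by an origin node (hypothetical names).
    def remote_get(self, loc):
        return self.memory[loc]

    def remote_put(self, loc, value):
        self.memory[loc] = value
```

In this model, a GET issued before ready() returns the stale memory value (Case A), and a local load after a PUT still returns the old cached value until deliver() is called (Case B). Case B also needs ready() first: a dirty cache entry that is written back after the PUT would overwrite the PUT data.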

In both cases, several accesses of the same type (all PUT or
all GET) to different locations inside the same MPI RMA window
are allowed between the two "synchronization points".
(Restricting them to the same type confines the partial cache
line problem to the boundaries of the window.)

Lloyd, is there really any case where we need a BARRIER?

If the idea of "asymmetric synchronization points" is free of
errors (apart from the problem of partial cache lines at window
boundaries), then the following scheme can be used in FE
simulation programs. This scheme needs no messages at all;
synchronization is done only with MPI_PUT counters.

The memory buffers are a_B1, a_C1, b_A1, b_C1, c_A1, c_B1
and a_B2, a_C2, b_A2, b_C2, c_A2, c_B2.

a_B1 resides on node a, is written by node b, and is the first
of two buffers that are used alternately.

node A                    node B                    node C

READY(a_B1)               READY(b_A1)               READY(c_A1)
READY(a_C1)               READY(b_C1)               READY(c_B1)
while(...)                while(...)                while(...)
  READY(a_B2)               READY(b_A2)               READY(c_A2)
  READY(a_C2)               READY(b_C2)               READY(c_B2)
  PUT(node b, -->b_A1)      PUT(node a, -->a_B1)      PUT(node a, -->a_C1)
  PUT(node c, -->c_A1)      PUT(node c, -->c_B1)      PUT(node b, -->b_C1)
  WAIT-COUNTER(b&c)         WAIT-COUNTER(a&c)         WAIT-COUNTER(a&b)
  DELIVER(a_B1)             DELIVER(b_A1)             DELIVER(c_A1)
  DELIVER(a_C1)             DELIVER(b_C1)             DELIVER(c_B1)
  read locally a_B1         read locally b_A1         read locally c_A1
  read locally a_C1         read locally b_C1         read locally c_B1
  READY(a_B1)               READY(b_A1)               READY(c_A1)
  READY(a_C1)               READY(b_C1)               READY(c_B1)
  PUT(node b, -->b_A2)      PUT(node a, -->a_B2)      PUT(node a, -->a_C2)
  PUT(node c, -->c_A2)      PUT(node c, -->c_B2)      PUT(node b, -->b_C2)
  WAIT-COUNTER(b&c)         WAIT-COUNTER(a&c)         WAIT-COUNTER(a&b)
  DELIVER(a_B2)             DELIVER(b_A2)             DELIVER(c_A2)
  DELIVER(a_C2)             DELIVER(b_C2)             DELIVER(c_B2)
  read locally a_B2         read locally b_A2         read locally c_A2
  read locally a_C2         read locally b_C2         read locally c_B2
endwhile                  endwhile                  endwhile

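with READY(...) == "write write-back-cache to memory",
DELIVER(...) == "flushing write-through-cache", and
WAIT-COUNTER(b&c) == "wait until the MPI_PUT counter shows that
the PUTs from nodes b and c have arrived".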

Buffers such as a_B1 and a_C1 can be located in the same
MPI RMA window.
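The double-buffered scheme can be played through with threads standing in for the three nodes. This is a minimal sketch under assumptions that are not part of the proposal: every PUT completes immediately, WAIT-COUNTER is modelled as one PUT counter per origin node, READY and DELIVER only change a buffer's state (no cache is simulated), and Node and simulate are hypothetical names. The scheme as listed does not show what orders the very first PUTs after the targets' initial READY; the sketch assumes the collective window setup does this and models it with a barrier. A violation is recorded whenever a PUT hits a buffer its owner has not released, or a local read sees anything other than the value sent in that iteration.

```python
import threading

NODES = ("a", "b", "c")


class Node:
    """Hypothetical stand-in for one node and its RMA window."""

    def __init__(self, name):
        self.cond = threading.Condition()
        self.others = [n for n in NODES if n != name]
        # One PUT counter per origin, as in WAIT-COUNTER(b&c).
        self.counts = {w: 0 for w in self.others}
        # buffers[(writer, phase)] = [state, value]; a_B1 is ("b", 1) on node a.
        self.buffers = {(w, p): ["busy", None]
                        for w in self.others for p in (1, 2)}
        self.received = []


def simulate(iterations=4):
    nodes = {n: Node(n) for n in NODES}
    violations = []
    start = threading.Barrier(len(NODES))  # assumed: collective window setup

    def put(src, dst, phase, value):
        target = nodes[dst]
        with target.cond:
            buf = target.buffers[(src, phase)]
            if buf[0] != "ready":  # owner has not released this buffer
                violations.append(("put", src, dst, phase))
            buf[0], buf[1] = "written", value
            target.counts[src] += 1
            target.cond.notify_all()

    def worker(me):
        node = nodes[me]

        def ready(phase):  # READY(...): release own buffers of this phase
            with node.cond:
                for w in node.others:
                    node.buffers[(w, phase)][0] = "ready"

        def exchange(it, phase, expected):
            for dst in node.others:  # PUT into dst's buffer written by `me`
                put(me, dst, phase, (me, it, phase))
            with node.cond:
                # WAIT-COUNTER: wait for `expected` PUTs from every neighbour
                node.cond.wait_for(
                    lambda: all(node.counts[w] >= expected for w in node.others))
                # DELIVER is a no-op in this model; then read locally
                for w in node.others:
                    state, value = node.buffers[(w, phase)]
                    if state != "written" or value != (w, it, phase):
                        violations.append(("read", me, w, phase, value))
                    node.received.append(value)
                    node.buffers[(w, phase)][0] = "busy"

        ready(1)
        start.wait()
        for it in range(iterations):
            ready(2)
            exchange(it, 1, 2 * it + 1)
            ready(1)
            exchange(it, 2, 2 * it + 2)

    threads = [threading.Thread(target=worker, args=(n,)) for n in NODES]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return nodes, violations
```

In this model two buffers per neighbour are enough: a node's next PUT into a neighbour's phase-1 buffer waits until it has counted that neighbour's phase-2 PUT, and the neighbour issues that PUT only after READY on the phase-1 buffer.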

Rolf


Rolf Rabenseifner                     (Computer Center)
Rechenzentrum Universitaet Stuttgart  (University of Stuttgart)
Allmandring 30                        Phone: ++49 711 6855530
D-70550 Stuttgart 80                  FAX:   ++49 711 6787626
Germany                               rabenseifner@rus.uni-stuttgart.de