This proposal is the same as the previous PUT/ACCEPT proposal I posted, except
for the following changes, which I have described in subsequent mail:
(1) RMA_PROBE and RMA_IPROBE have been deleted, and replaced by a suggestion
that the STATUS argument returned by PROBE and IPROBE be extended to
include an extra field, MPI_OP, which specifies whether the probed
object is a message, a PUT, or a GET.
(2) PUTHOLD and GETHOLD have been added. These are identical to PUT and
GET except that they do not help satisfy the count of an ACCEPT or
OFFER operation, or the implicit count of one of an AVAIL operation.
They give processes flexibility in the number of PUTHOLD or GETHOLD
operations they perform, since a process need only inform its target
(with a PUT or GET operation) when it is, in some sense, "done".
(3) AVAIL and IAVAIL have been added. AVAIL acts like a combination of
OFFER and ACCEPT with a count of 1 -- i.e. it can satisfy exactly one
GET or PUT, optionally preceded by GETHOLDs and PUTHOLDs respectively.
This is added to simplify the construction of customized agents.
(4) The beginning of the discussion section now explicitly mentions
functionality that this proposal does not provide. This unsupported
functionality includes 3rd-party communication and any guarantee that a
PUT or GET will be satisfied before the operation on the target tries
to complete the matching operation.
(5) The axiomatic semantics have been dropped until they can be added
in their complete and correct form.
Syntax (i.e. Fortran bindings)
==============================
MPI_RMA_MALLOC(base, size)
OUT base Address of allocated buffer
IN size Size of buffer to allocate, in bytes
MPI_PUT(origin_addr, origin_count, origin_datatype, target_rank,
target_disp, target_count, target_datatype, tag, comm)
MPI_PUTHOLD(origin_addr, origin_count, origin_datatype, target_rank,
target_disp, target_count, target_datatype, tag, comm)
MPI_PUTC(origin_addr, origin_count, origin_datatype, target_rank,
target_disp, target_count, target_datatype, tag, comm)
IN origin_addr Data to be put
IN origin_count Number of data elements at origin_addr
IN origin_datatype Datatype of each element at origin_addr
IN target_rank Rank of target
IN target_disp Displacement in target, in units of the target's disp_unit
IN target_count Number of elements in target
IN target_datatype Datatype of elements in target
IN tag Tag
IN comm Communicator
MPI_GET(origin_addr, origin_count, origin_datatype, target_rank,
target_disp, target_count, target_datatype, tag, comm)
MPI_GETHOLD(origin_addr, origin_count, origin_datatype, target_rank,
target_disp, target_count, target_datatype, tag, comm)
MPI_GETC(origin_addr, origin_count, origin_datatype, target_rank,
target_disp, target_count, target_datatype, tag, comm)
Same arguments as MPI_PUT, except
OUT origin_addr Data to be retrieved
MPI_ACCEPT(base, size, disp_unit, tag, comm, count)
MPI_IACCEPT(base, size, disp_unit, tag, comm, count)
MPI_ACCEPTC(base, size, disp_unit, tag, comm)
MPI_IACCEPTC(base, size, disp_unit, tag, comm)
INOUT base Buffer made available to PUT requests
IN size Size of base in bytes
IN disp_unit Scale factor for target_disp in requests
IN tag Tag
IN comm Communicator
IN count Number of requests to service
MPI_OFFER(base, size, disp_unit, tag, comm, count)
MPI_IOFFER(base, size, disp_unit, tag, comm, count)
MPI_OFFERC(base, size, disp_unit, tag, comm)
MPI_IOFFERC(base, size, disp_unit, tag, comm)
Same arguments as MPI_ACCEPT, except
INOUT base Buffer made available to GET requests
MPI_AVAIL(base, size, disp_unit, tag, comm)
MPI_IAVAIL(base, size, disp_unit, tag, comm)
Same arguments as MPI_ACCEPT, except
INOUT base Buffer made available to PUT and GET requests
Operational Semantics
=====================
(This first paragraph is a general description of non-blocking operations in
MPI, and probably belongs -- or already exists -- in some other chapter.)
MPI Non-Blocking Rule: The operation Iop will always complete.
If a WAIT operation is performed on the request returned from an
Iop operation, or a TEST operation is performed on the request
and the TEST returns a "completed" status, then the combination
of the Iop and the WAIT or TEST will have identical semantics to
an op operation with the same arguments as the Iop operation,
except that other operations in the same thread may execute
after inception and before the completion of the combined
operation. The behavior of an Iop operation that is followed by neither
a matching WAIT nor a matching TEST returning a "completed" status is
undefined.
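To make the rule concrete, here is a minimal Python model. It assumes a hypothetical Request object (not an MPI binding) whose deferred work runs when WAIT is called; for simplicity the sketch makes progress only in WAIT. It shows that an Iop followed by WAIT has the same effect and result as the blocking op with the same arguments, while other code may run between inception and completion.

```python
class Request:
    """Hypothetical request handle returned by an Iop (not an MPI binding)."""

    def __init__(self, work):
        self._work = work      # deferred body of the blocking op
        self._done = False
        self._result = None

    def test(self):
        # This sketch makes progress only inside wait(), so test()
        # reports False until wait() has run.
        return self._done

    def wait(self):
        # Run the deferred work once; later calls return the same result.
        if not self._done:
            self._result = self._work()
            self._done = True
        return self._result


def op(buf, value):
    """Stand-in blocking operation: append value and return the new length."""
    buf.append(value)
    return len(buf)


def iop(buf, value):
    """Non-blocking form: capture the arguments now, perform op at WAIT."""
    return Request(lambda: op(buf, value))
```

Calling `r = iop(b, 7)` and later `r.wait()` leaves `b` in the same state, and returns the same value, as calling `op(b, 7)` directly.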
PUT, ACCEPT, PUTHOLD, and IACCEPT
Requesting: A PUT operation will always complete, and results in
the issuance of a PUT request. For brevity of description, the
arguments of the PUT operation issuing a PUT request will be
described as belonging to the request itself.
Matching: Each PUT request will be serviced at most once, and
only by an ACCEPT (or IACCEPT) or AVAIL (or IAVAIL) operation
with matching comm and tag arguments, and executing in the process
designated by the comm and target_rank arguments of the PUT request.
Multiple PUT requests issued from the same process (thread) and with
identical comm, tag, and target_rank arguments will be serviced
in the order in which they are issued.
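The Matching rule can be modelled with one FIFO queue per (comm, tag, origin, target_rank) combination. The following Python sketch assumes a hypothetical PutQueue class; because the proposal does not say which origin is served first when several have pending requests, it uses an arbitrary policy (lowest origin rank first).

```python
from collections import defaultdict, deque


class PutQueue:
    """Hypothetical model of PUT-request matching (not an MPI API).

    One FIFO per (comm, tag, origin, target_rank) keeps requests from the
    same origin with identical comm, tag, and target_rank in issue order;
    popping a request models servicing it, so each is serviced at most once.
    """

    def __init__(self):
        self._fifos = defaultdict(deque)

    def put(self, origin, target_rank, tag, comm, payload):
        self._fifos[(comm, tag, origin, target_rank)].append(payload)

    def service_one(self, me, tag, comm):
        """Service one request for an ACCEPT running in process `me`.

        Returns (origin, payload), or None if nothing matches.
        """
        candidates = sorted(
            key for key, fifo in self._fifos.items()
            if fifo and key[0] == comm and key[1] == tag and key[3] == me
        )
        if not candidates:
            return None
        key = candidates[0]  # arbitrary policy: lowest origin rank first
        return key[2], self._fifos[key].popleft()
```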
Servicing: When a given PUT request is serviced by a given ACCEPT or
AVAIL operation, "origin_count" data items of datatype "origin_datatype",
starting at location "origin_addr" in the process on which the PUT
was issued, will be transferred to the process on which the ACCEPT
or AVAIL operation executes, at the location obtained by multiplying the
"target_disp" arg of the PUT request with the "disp_unit" arg of
the ACCEPT or AVAIL operation, and adding this to the "base" arg of the
ACCEPT or AVAIL operation, then interpreting this location to be the
beginning of "target_count" data items of datatype "target_datatype".
The servicing of a request is not necessarily an atomic action. A
PUT request will be called "fully serviced" if the entire data
transfer is complete. [Note: In the end, this should work the
same as in Marc's original proposal.]
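The address arithmetic in the Servicing paragraph can be sketched as follows. This hypothetical Python helper models the ACCEPT buffer as a bytearray and, as a simplifying assumption, treats origin_datatype and target_datatype as the same contiguous type (a struct format code) with origin_count equal to target_count; the proposal itself allows general datatypes.

```python
import struct


def service_put(window, disp_unit, target_disp, origin_values, target_count, fmt="d"):
    """Hypothetical sketch of servicing one PUT into an ACCEPT buffer.

    window      -- bytearray standing in for the ACCEPT's base/size buffer
    disp_unit   -- the ACCEPT's scale factor, in bytes
    target_disp -- the PUT's displacement, in units of disp_unit
    fmt         -- struct code standing in for a contiguous target_datatype
    Returns the byte offset from base at which the data was stored.
    """
    offset = target_disp * disp_unit           # base + target_disp * disp_unit
    layout = f"<{target_count}{fmt}"           # target_count items, standard sizes
    nbytes = struct.calcsize(layout)
    if len(origin_values) != target_count:
        # This sketch only handles matching origin and target counts.
        raise ValueError("origin and target element counts differ")
    if offset < 0 or offset + nbytes > len(window):
        raise ValueError("PUT falls outside the accepted buffer")
    struct.pack_into(layout, window, offset, *origin_values)
    return offset
```

For example, with an 80-byte buffer and a disp_unit of 8, a PUT with target_disp 3 lands at byte offset 24.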
Completion: An ACCEPT operation will complete if and only if all
of the PUT requests issued by the process executing the ACCEPT,
and having the same tag and comm, have been fully serviced and
(*) the ACCEPT has serviced exactly count PUT requests.
Non-conformance: Local references to addresses in the range
specified by the base and size arguments of an ACCEPT operation
are not permitted during the execution of that operation. Local
references to addresses in the range specified by the
origin_addr, origin_datatype, and origin_count arguments of a
PUT operation are not permitted between that operation and the
following ACCEPT operation having the same tag and comm
arguments. [Is this too restrictive?]
Holding: A PUTHOLD operation is identical to a PUT operation
in every respect, except that the starred ("*") portion of the
"Completion" paragraph, above, does not apply -- i.e. the completion
of an ACCEPT operation is independent of the number of PUTHOLD
operations which it has serviced.
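The Completion and Holding paragraphs together amount to a simple predicate. In this hypothetical Python sketch, AcceptState tracks how many PUT and PUTHOLD requests one ACCEPT has serviced; only PUTs count toward the count argument. The caller supplies how many of the ACCEPTing process's own PUT or PUTHOLD requests with the same tag and comm are not yet fully serviced.

```python
from dataclasses import dataclass


@dataclass
class AcceptState:
    """Hypothetical bookkeeping for one ACCEPT (illustration only)."""
    count: int               # the ACCEPT's count argument
    puts_serviced: int = 0   # PUT requests serviced (these count)
    holds_serviced: int = 0  # PUTHOLD requests serviced (never counted)

    def service(self, kind):
        if kind == "PUT":
            if self.puts_serviced == self.count:
                raise RuntimeError("ACCEPT may not service more than count PUTs")
            self.puts_serviced += 1
        elif kind == "PUTHOLD":
            self.holds_serviced += 1
        else:
            raise ValueError(f"unknown request kind: {kind}")

    def can_complete(self, own_pending_requests):
        # Own PUT/PUTHOLD requests with the same tag and comm must be fully
        # serviced, and exactly `count` PUT requests must have been serviced.
        return own_pending_requests == 0 and self.puts_serviced == self.count
```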
IACCEPT obeys the MPI Non-Blocking Rule.
PUTC, ACCEPTC, and IACCEPTC
The semantics for PUTC, ACCEPTC, and IACCEPTC are identical to
those for PUT, ACCEPT, and IACCEPT, after deleting the "Holding"
paragraph, replacing all PUT with PUTC, ACCEPT with ACCEPTC, and
IACCEPT with IACCEPTC, and replacing the last clause in the "Completion"
paragraph, marked with a (*), with the following:
if m-1 total ACCEPTCs and IACCEPTCs with this tag and
communicator have executed in this process prior to this one,
then m total ACCEPTCs and IACCEPTCs with the same tag and
communicator have completed (or will complete) in each of the
other processes belonging to the communicator.
GET, OFFER, GETHOLD, and IOFFER:
The semantics for GET, OFFER, GETHOLD, and IOFFER are identical to
those for PUT, ACCEPT, PUTHOLD and IACCEPT, after replacing all PUT
with GET, ACCEPT with OFFER, PUTHOLD with GETHOLD, and IACCEPT with
IOFFER, and replacing the phrase "transferred to the process" in the
"Servicing" paragraph with the phrase "transferred from the process".
GETC, OFFERC, and IOFFERC:
The semantics for GETC, OFFERC, and IOFFERC are identical to those
for PUTC, ACCEPTC, and IACCEPTC, after replacing all PUTC with
GETC, ACCEPTC with OFFERC, and IACCEPTC with IOFFERC, and
replacing the phrase "transferred to the process" in the
"Servicing" paragraph with the phrase "transferred from the
process".
AVAIL and IAVAIL:
If an AVAIL operation services a PUTHOLD request, it will subsequently
service only further PUTHOLD requests and a single PUT request.
If an AVAIL operation services a GETHOLD request, it will subsequently
service only further GETHOLD requests and a single GET request.
Completion: An AVAIL operation will complete if and only if all
of the PUT, GET, PUTHOLD, and GETHOLD requests issued by the process
executing the AVAIL, and having the same tag and comm, have been fully
serviced and the AVAIL has serviced exactly one PUT or GET request.
IAVAIL obeys the MPI Non-Blocking Rule.
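The restriction on which requests an AVAIL may service can be expressed as a small state machine. This hypothetical Python sketch models only that restriction and the "exactly one PUT or GET" part of the Completion paragraph; the requirement that the process's own requests be fully serviced is left out.

```python
class Avail:
    """Hypothetical model of which requests one AVAIL may still service.

    An AVAIL starts undecided. A PUTHOLD or PUT commits it to the PUT side,
    a GETHOLD or GET to the GET side; the first PUT or GET ends it.
    """

    _side = {"PUT": "PUT", "PUTHOLD": "PUT", "GET": "GET", "GETHOLD": "GET"}

    def __init__(self):
        self.side = None       # None, "PUT", or "GET"
        self.finished = False  # True once exactly one PUT or GET is serviced

    def accepts(self, kind):
        if self.finished:
            return False
        return self.side is None or self._side[kind] == self.side

    def service(self, kind):
        if not self.accepts(kind):
            raise RuntimeError(f"AVAIL cannot service {kind} now")
        self.side = self._side[kind]
        if kind in ("PUT", "GET"):
            self.finished = True
```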
Discussion
==========
These semantics allow a servicing operation (an ACCEPT, IACCEPT, OFFER,
IOFFER, AVAIL, or IAVAIL, or a collective variant) to delay servicing
every PUT, PUTHOLD, GET, or GETHOLD request until the servicing operation
completes or tries to complete. This is expected to be the case on
some architectures, specifically those in which the originator does not
have direct access to the target's memory (i.e. "window").
This proposal does not directly support 3rd-party communications. That
is, in order for two processes to communicate using the memory of a
third process, the third process must be involved, partly because of the
property mentioned in the previous paragraph. However, programs which
require the functionality of 3rd-party communications can be implemented
by having all parties perform two-party communications with a process
built by the user specifically to hold the data (sometimes known as a
custom agent or monitor).
Because the completion condition for ACCEPTC and OFFERC depends on
operations in other processes, TEST is free to always return
"notcompleted" for any request returned by an IACCEPTC or IOFFERC. To
do otherwise may require TEST to engage in expensive communication.
(I)ACCEPT(C) and (I)OFFER(C) could be given origin_rank and status
arguments with the same purpose and general meaning as the same
arguments in MPI_RECV. However, since these operations can
service multiple requests, the status argument, at least, seems
less useful than for MPI_RECV.
It is tempting to believe that GETC and PUTC can be eliminated,
since they have virtually identical syntax and semantics to GET and PUT.
However, it is important from both a programming standpoint and an
implementation standpoint that the GET or PUT operations specify
whether they can be satisfied by a collective ACCEPTC or OFFERC
operation, or by a non-collective ACCEPT or OFFER operation. This
is justified below on two counts:
1. Ease of programming: If a few GETs or PUTs intended for one
type of operation (collective or non-collective) are instead
matched by the other type of operation, the programmer may
never know: all of the GET or PUT requests will be
serviced, though by an unintended servicer.
2. Ease of implementation: The collective operations
are implemented most efficiently by having each process count the
number of requests destined for each target, and then, at
completion of the ACCEPTC or OFFERC, totalling these counts across
all processes (O(log n) time) and broadcasting the resultant vector
to all processes (O(log n) time), allowing each process to know how
many requests it needs to service before completing. If some of
the requests are serviced by non-collective operations, then
either each process will be required to keep a complete history of
the number of requests it has ever serviced for every
tag/communicator, or else a much less efficient method of handling
collective operations must be employed.
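The counting scheme in point 2 can be sketched as below. This hypothetical Python function computes sequentially what a real implementation would obtain with a log-depth reduction followed by a broadcast (the equivalent of an MPI_ALLREDUCE with MPI_SUM over each process's vector of per-target counts).

```python
def requests_to_service(per_process_counts):
    """Hypothetical sketch of the counting scheme for ACCEPTC/OFFERC.

    per_process_counts[p][t] is the number of PUTC (or GETC) requests that
    process p issued to target t since the previous collective operation
    with this tag and comm. Returns totals, where totals[t] is the number
    of requests process t must service before its ACCEPTC can complete.
    """
    nprocs = len(per_process_counts)
    if any(len(row) != nprocs for row in per_process_counts):
        raise ValueError("each process needs one counter per target")
    # Element-wise sum across processes (the reduction step); in a real
    # implementation every process would then receive this vector.
    return [sum(row[t] for row in per_process_counts) for t in range(nprocs)]
```

Because the totals count only PUTC/GETC requests, they are exact only if none of those requests can be serviced by a non-collective ACCEPT or OFFER, which is the point of keeping the two kinds of operations distinct.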
Similarly, the number of operations added to MPI could be
(artificially) decreased by removing ACCEPTC and OFFERC, and
stating that a special value of the count argument to ACCEPT or
OFFER would make them collective operations. This would allow
simple program errors that miscalculate the count to turn
non-collective operations into collective ones, and would
sometimes make it more difficult to determine the intended
servicer of a GET or PUT operation syntactically. (It would not
affect ease of implementation in any way.)
It is also tempting to drop ACCEPT and OFFER, and add a count
argument to AVAIL so that it can do their jobs. However, adding a count
to AVAIL would tempt users to believe that they could pass data through
a third party during a single AVAIL operation in the third party, and
in fact, they could do so on some architectures unless significant
memory and time overhead were added to preclude it. Labeling
such programs "non-conforming" would only partially solve the problem,
because to make such a program conform, the user would be required not
only to split the AVAIL operation into two separate AVAIL operations
(one for GETs and one for PUTs), but also to ensure that
the proper operations were serviced by the proper AVAIL, which would
probably require a separate tag for each. Thus, more work may be required
of the user if these operations are merged. (This may be a somewhat weak
argument.)
On non-cache-coherent architectures, where remote operations
access main memory but not cache, an OFFER or ACCEPT operation can
begin by performing a cache flush of the memory range (to ensure
that remote processes will see or update the freshest values).
Conforming programs will not bring any of these addresses into
cache by referencing them for the duration of the operation.
[What about a PUT or GET which targets one's own process?]
However, the implementor may need to perform additional work in
the cases where false sharing exists -- i.e. where cache lines
may contain data from both inside and outside the public region.
===============================================================================
David C. DiNucci | MRJ, Inc., Rsrch Scntst |USMail: NASA Ames Rsrch Ctr
dinucci@nas.nasa.gov| NAS (Num. Aerospace Sim.)| M/S T27A-2
(415)604-4430 | Parallel Tools Group | Moffett Field, CA 94035