I read this as saying that the invisible headers are sufficient to allow
MPI_RMA_INIT to determine what type of segment has been passed to it. However,
MPI_RMA_INIT may have been passed a pointer to an area not allocated by
MPI_RMA_MALLOC, in which case it will not have the invisible header. Thus,
this is not a robust solution to the problem of determining the memory type
(page 4 line 38).
The separation of MPI_RMA_MALLOC into two parts, MALLOC followed by INIT,
complicates the implementation and requires MPI to maintain an internal list
of the segments (for FREE and INIT). In contrast, a single MPI_RMA_MALLOC
that both allocates the memory and performs the INIT function can cache the
memory address in the created communicator. The allocated memory can then
easily be freed when the communicator is destroyed. This also eliminates the
need for MPI_RMA_FREE.
Page 4 line 12
We need to make it clear that the displacement is scaled by the target
displacement unit. All of the processes may have opened windows, each with its
own displacement unit. The displacement for a get or put to a target window
must be scaled by the displacement unit specified by the target.
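As a minimal sketch of the scaling rule (not proposed API: the window_t
descriptor and window_resolve are hypothetical names used only for
illustration), the target-side address computation might look like the
following. The displacement unit is taken from the target's window, never
from the origin's.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for the window exposed by a target process. */
typedef struct {
    char  *base;       /* start of the exposed memory */
    size_t size;       /* window length in bytes */
    int    disp_unit;  /* scaling factor chosen by the target */
} window_t;

/* Resolve an origin-supplied displacement against the TARGET's window.
   Returns NULL if the displacement falls outside the window. */
void *window_resolve(const window_t *target, int disp)
{
    assert(target->disp_unit > 0);
    if (disp < 0)
        return NULL;
    size_t offset = (size_t)disp * (size_t)target->disp_unit;
    if (offset >= target->size)
        return NULL;
    return target->base + offset;
}
```

With a target window of doubles opened with a displacement unit of
sizeof(double), displacement 3 resolves to the fourth double. The same
displacement into a window opened with a displacement unit of 1 resolves to
the fourth byte.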
Page 6 line 9
The words "interfere in any way" seem too all-encompassing. Some (in the
real-time world) might read this to mean that gets and puts use a different
hardware mechanism to ensure that they don't interfere with the performance of
the other MPI operations.
Page 6 line 29
Actually these datatypes are portable because they contain no explicit
displacements. All displacements are derived by the MPI implementation, which
can thus use its knowledge of the remote machine architecture to derive the
correct displacements. In a NOW implementation, only the type signature needs
to be transmitted with the data, not the type map.
Page 18 line 9
Why do we wish to discuss implementations which support atomic access to
larger elements? If one is writing a portable program (the intent of MPI), one
must take the worst case assumption, which is that the hardware only supports
byte atomicity. We shouldn't encourage the user (or vendors) to use/support
anything more.
Page 18 line 18
I believe we are intentionally disallowing an implementation which uses only
callbacks to implement the RMA agent. We need some more definitive examples to
hammer this home, e.g.:
MPI_Put ()
MPI_Fence ()
must not deadlock irrespective of the actions of the target process, in
particular, the target process need not make any MPI calls.
Process 1                 Process 2
-------------------------------------------
                          a = 0;
MPI_Barrier();            MPI_Barrier();
MPI_Put ();               while (a == 0);
/* MPI_Put writes a nonzero value to a */
Assuming "a" is a byte within the target window, the above is legal, and
should not deadlock. No MPI calls are made on the receiver from which the RMA
agent can be called, therefore the agent must execute from an interrupt, from
a signal, or on a separate processor.
We also need to address resource limitations. Similar to the standard mode
send, MPI_Put calls should be automatically throttled to ensure that resources
are not exhausted. Thus, while puts may run ahead for a while, they must
eventually block and synchronize with the remote agent. Non-blocking puts may
fail due to resource limits.
We may wish to make a statement about the fairness (or lack of it) of put calls.
Page 22 line 24
The example should be MPI_Type_recv, not MPI_Recv
Page 24
This example assumes that "comm" refers to a window communicator, for which
the window is presumably each process's entire local address space. Presumably
in such a case the displacement unit would be one. This should be made
clearer.
To ease communication of pointers, I propose the following extension. This
allows local pointers into a window to be automatically converted into
displacements into the window and vice versa.
Define a new MPI datatype: MPI_POINTER. The type matching rules for
MPI_POINTER are:
1) MPI_POINTER matches MPI_INT
2) MPI_POINTER matches MPI_POINTER only if the origin and target are the same
process
3) On the target process: void* matches MPI_POINTER
4) If the origin and target are the same process, void* matches MPI_POINTER
on the origin process
The semantics of MPI_POINTER when used for get are as follows:
1) A void* is read from the target buffer
2) The base address of the window is subtracted from the pointer read in 1)
3) The result of 2) is divided by the window displacement unit
4) The displacement from 3) above is sent to the origin process, and received
as an int.
The pointer must point to a location within the local window.
The semantics for put are the reverse of the above.
MPI_POINTER can also be used in a send/recv operation with a window
communicator. The semantics are the same as above.
We also define a new MPI constant "MPI_NULL_PTR" which is an integer
displacement corresponding to a NULL pointer. Clearly this cannot be zero,
since a zero displacement into a window is legitimate. It would probably have
to be something like -1.
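The conversion steps above can be sketched in plain C. This is only an
illustration of the arithmetic the target side would perform: ptr_to_disp,
disp_to_ptr and NULL_PTR_DISP are hypothetical names (NULL_PTR_DISP stands in
for the proposed MPI_NULL_PTR, assumed here to be -1), and nothing below is
existing MPI API.

```c
#include <assert.h>
#include <stddef.h>

#define NULL_PTR_DISP (-1)  /* assumed value for the proposed MPI_NULL_PTR */

/* Get direction: convert a local pointer into a window displacement.
   The pointer must be NULL or point inside the window, on a
   displacement-unit boundary. */
int ptr_to_disp(const char *base, size_t size, int disp_unit, const void *p)
{
    assert(disp_unit > 0);
    if (p == NULL)
        return NULL_PTR_DISP;
    ptrdiff_t bytes = (const char *)p - base;      /* step 2: subtract base */
    assert(bytes >= 0 && (size_t)bytes < size);
    assert(bytes % disp_unit == 0);
    return (int)(bytes / disp_unit);               /* step 3: divide by unit */
}

/* Put direction: the reverse, a displacement back into a local pointer. */
void *disp_to_ptr(char *base, int disp_unit, int disp)
{
    assert(disp_unit > 0);
    if (disp == NULL_PTR_DISP)
        return NULL;
    return base + (ptrdiff_t)disp * disp_unit;
}
```

Note that a pointer to the first element of the window converts to
displacement 0, which is why the null displacement has to be some value other
than zero.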
An example which allows remote access to local structures is then:
/* Definition of a public node in the memory of the target process */
typedef struct {
    void *next;
    double content;
} PNODE;

MPI_Datatype PNODE_TYPE;
int pblens[] = {1, 1};
MPI_Datatype ptypes[] = {MPI_POINTER, MPI_DOUBLE};
MPI_Type_contiguous_struct (2, pblens, ptypes, &PNODE_TYPE);
MPI_Type_commit (&PNODE_TYPE);

/* Equivalent node after it has been read into origin memory */
typedef struct {
    int next;
    double content;
} RNODE;

MPI_Datatype RNODE_TYPE;
int rblens[] = {1, 1};
MPI_Datatype rtypes[] = {MPI_INT, MPI_DOUBLE};
MPI_Type_contiguous_struct (2, rblens, rtypes, &RNODE_TYPE);
MPI_Type_commit (&RNODE_TYPE);
void insert (double content, void *head, MPI_Comm comm)
{
    PNODE *pnode;
    int rank;
    MPI_Aint displ;

    pnode = (PNODE*)malloc(sizeof(PNODE));
    pnode->content = content;
    MPI_Comm_rank (comm, &rank);
    MPI_Address (head, &displ);
    /* The assumption here is that the comm window is the entire address
       space of the local process, and the displacement unit is 1. Thus, the
       displacement generated by MPI_Address is the correct displacement to
       use within the window. */
    MPI_Rmw ((void **)&pnode, &(pnode->next), 1, MPI_POINTER, rank, displ,
             MPI_SWAP, 0, comm);
    return;
}
/* Local node for list on origin process */
typedef struct lnode {
    double content;
    struct lnode *next;
} LNODE;
void remote2local_list_copy (int rank, MPI_Comm comm, LNODE **head)
{
    RNODE rnode;
    LNODE *tail;
    int rhead;

    /* Assume displacement 0 contains the head of the list */
    MPI_Get (&rhead, 1, MPI_INT, rank, 0, 1, MPI_POINTER, comm);
    if (rhead == MPI_NULL_PTR) {
        *head = NULL;
        return;
    }
    else {
        *head = (LNODE*)malloc(sizeof(LNODE));
        MPI_Get (&rnode, 1, RNODE_TYPE, rank, rhead, 1, PNODE_TYPE, comm);
        /* If we overlay the local pointer and remote displacement field, we
           could probably avoid copying content. Instead we could overwrite
           the remote displacement after we are finished with it. */
        (*head)->content = rnode.content;
        tail = *head;
        while (rnode.next != MPI_NULL_PTR) {
            tail->next = (LNODE*)malloc(sizeof(LNODE));
            tail = tail->next;
            MPI_Get (&rnode, 1, RNODE_TYPE, rank, rnode.next, 1, PNODE_TYPE, comm);
            tail->content = rnode.content;
        }
        tail->next = NULL;
    }
    return;
}
The problem with 64 bit to 32 bit addresses is somewhat avoided since the
target processor is (conceptually) responsible for converting a pointer into a
displacement. Of course this displacement may be greater than 32 bits, but
this problem exists already since window displacements use ints. In essence,
our semantics for windows already limits 32 bit processors so that they can
only access windows less than 2^31 displacement units in length. If a 64 bit
processor wishes to open its entire address space to a 32 bit remote process
there is a
problem. Changing window sizes and displacements to MPI_Aint doesn't solve the
problem, since an Aint is driven by the size of a local pointer, and not the
largest pointer in the system.
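To make the limit concrete, here is a minimal sketch of the check a target
would have to make before handing a displacement to an origin that receives
it as an int. The function name offset_to_int_disp is hypothetical, and the
byte offset is held in a uint64_t purely for illustration.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Convert a byte offset within a (possibly 64 bit) window into an int
   displacement. Returns 1 and stores the displacement on success, or 0 if
   the displacement cannot be represented as an int. */
int offset_to_int_disp(uint64_t byte_offset, uint64_t disp_unit, int *disp)
{
    assert(disp != NULL);
    if (disp_unit == 0)
        return 0;
    uint64_t units = byte_offset / disp_unit;
    if (units > (uint64_t)INT_MAX)
        return 0;               /* would overflow an int displacement */
    *disp = (int)units;
    return 1;
}
```

A larger displacement unit extends the reachable range, but any fixed unit
still leaves part of a full 64 bit address space unreachable through an int
displacement.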
Lloyd Lewins
Hughes Aircraft Co.,
llewins@msmail4.hac.com