For MPI_Start, MPI_Startall:
The following fields from the Request: request-type, tag, comm,
message length (bytes_as_contig), destination rank (either global or local)
For MPI_Wait*, MPI_Test*:
The following fields from the Request: request-type, comm
For MPI_Send, MPI_Isend, MPI_Ssend, MPI_Bsend, MPI_Recv:
The analogous fields (tag, comm, message length, destination or source
rank), all of which are available directly from the argument list (see
the sketch below).
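As an illustration, a wrapper written against the profiling (PMPI)
interface can capture those fields at the point of call. This is only
a sketch; log_event is a hypothetical logging helper, not part of any
implementation:

    #include <mpi.h>

    /* Hypothetical helper that appends one record to the trace. */
    void log_event(const char *op, int bytes, int peer, int tag,
                   MPI_Comm comm);

    int MPI_Send(const void *buf, int count, MPI_Datatype datatype,
                 int dest, int tag, MPI_Comm comm)
    {
        int size;

        MPI_Type_size(datatype, &size);   /* message length in bytes */
        log_event("send", count * size, dest, tag, comm);
        return PMPI_Send(buf, count, datatype, dest, tag, comm);
    }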
Several functions also need an efficient way to translate local ranks to
global ranks. Note: MPI_Group_translate_ranks uses a linear search in MPICH,
and therefore doesn't meet the efficiency requirements!!
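One way to meet that requirement without changing MPICH is to pay the
linear cost once per communicator instead of once per logged event:
translate every rank of the communicator's group at creation time and
cache the resulting table. A sketch (build_rank_table is an invented
name, not part of any MPI implementation):

    #include <mpi.h>
    #include <stdlib.h>

    /* Build, once per communicator, a table mapping local ranks to
       ranks in MPI_COMM_WORLD; per-event lookups are then O(1). */
    int *build_rank_table(MPI_Comm comm)
    {
        MPI_Group group, world_group;
        int size, i;
        int *local, *global;

        MPI_Comm_size(comm, &size);
        MPI_Comm_group(comm, &group);
        MPI_Comm_group(MPI_COMM_WORLD, &world_group);

        local  = malloc(size * sizeof(int));
        global = malloc(size * sizeof(int));
        for (i = 0; i < size; i++)
            local[i] = i;

        /* One linear-cost translation at setup time, instead of one
           per logged event. */
        MPI_Group_translate_ranks(group, size, local, world_group,
                                  global);

        MPI_Group_free(&group);
        MPI_Group_free(&world_group);
        free(local);
        return global;       /* global[local_rank] == global rank */
    }

For MPI_COMM_WORLD itself the table is simply the identity, which is
the special case suggested below.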
The context logged is the integer context generated in the usual
"non-globally unique" way. It is disambiguated by the fact that global
ranks are logged alongside it.
Currently, I "peek" into the MPICH opaque objects to implement my
profiling. This is obviously not portable to other implementations,
but it meets my immediate needs.
While it might not be possible to completely standardize the above
functions (some implementations may take exception to integer
contexts, for example), we should at least be able to get those
implementations that can support the above features to provide a
common interface.
Efficient request caching would meet most of the above needs. An
optimized version of MPI_Group_translate_ranks which treats
MPI_COMM_WORLD as a special case would remove the need for a separate
fast rank-translation mechanism. This leaves context as the major
sticking point. However, I remain concerned about efficiency.
Profiling must always try to minimize the perturbation of the
timeline, and inefficiency cannot be tolerated.
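For concreteness, the request caching could be as simple as a hash
table keyed on the request handle: the MPI_Isend wrapper records the
fields listed above when the request is created, and the MPI_Wait
wrapper retrieves them when it completes. The following is only a
sketch (ReqInfo, cache_put, and cache_take are invented names); it
uses open addressing with tombstones so that removal stays correct:

    #include <mpi.h>
    #include <string.h>

    /* Power of two; must exceed the number of outstanding requests.
       A production version would bound the probe length or resize. */
    #define CACHE_SIZE 1024

    enum { EMPTY = 0, USED, DELETED };

    typedef struct {
        int         state;        /* EMPTY / USED / DELETED */
        MPI_Request req;          /* key */
        int         request_type, tag, bytes, peer;
        MPI_Comm    comm;
    } ReqInfo;

    static ReqInfo req_cache[CACHE_SIZE];  /* zero-init: all EMPTY */

    static unsigned req_hash(MPI_Request r)
    {
        /* Hash the handle's bytes, so the scheme works whether
           MPI_Request is an int (MPICH) or a pointer. */
        unsigned char b[sizeof r];
        unsigned h = 0, i;
        memcpy(b, &r, sizeof r);
        for (i = 0; i < sizeof r; i++)
            h = h * 31u + b[i];
        return h & (CACHE_SIZE - 1);
    }

    /* Called from the MPI_Isend (etc.) wrapper when the request is
       created. */
    static void cache_put(MPI_Request r, int type, int tag, int bytes,
                          int peer, MPI_Comm comm)
    {
        unsigned i = req_hash(r);
        while (req_cache[i].state == USED)         /* linear probe */
            i = (i + 1) & (CACHE_SIZE - 1);
        req_cache[i].state        = USED;
        req_cache[i].req          = r;
        req_cache[i].request_type = type;
        req_cache[i].tag          = tag;
        req_cache[i].bytes        = bytes;
        req_cache[i].peer         = peer;
        req_cache[i].comm         = comm;
    }

    /* Called from the MPI_Wait (etc.) wrapper; removes the entry and
       returns 0 if the request was never cached. */
    static int cache_take(MPI_Request r, ReqInfo *out)
    {
        unsigned i = req_hash(r);
        while (req_cache[i].state != EMPTY) {
            if (req_cache[i].state == USED && req_cache[i].req == r) {
                *out = req_cache[i];
                req_cache[i].state = DELETED;  /* tombstone keeps the
                                                  probe chain intact */
                return 1;
            }
            i = (i + 1) & (CACHE_SIZE - 1);
        }
        return 0;
    }

Both operations are expected constant time, which keeps the
perturbation of the timeline small.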
Hughes Aircraft Co.