MPI Process Topologies - discussion?
Dear Group,
We at NEC have recently worked on the topology functionality for the
MPI/SX implementation. The topology chapter of the standard leaves
some points open for clarification or discussion; they are listed below.
Best regards
Jesper & Hubert & Falk
P.S.: Included are comments from Rolf Hempel, which he asked me to forward
to the group with this mail.
=== specific points (page numbers refer to the printed MPI book, vol 1) ===
MPI_Graph_create:
* Is an empty graph allowed? If not, why not? (it seems correct in this
case to return MPI_COMM_NULL as the new communicator)
* It is not explicitly said whether self loops (i.e. an edge from node u
to node u) are allowed in the graph. They should be. Example 6.4, p. 335,
does not have self loops, but example 6.5, p. 338, does.
* It is not explicitly said whether multi-edges (several edges from u
to v) are allowed. Probably they are not (but could be, and could be
used to model particularly intensive communication between certain
nodes)
* It is not said whether all processes must give the same graph in the
call to MPI_Graph_create/MPI_Graph_map. Assuming that it is so (!!),
this should be mentioned, and it should be made clear whether exactly
the same index/edges arrays must be provided, or if isomorphic
representations suffice.
* What is the use of the graph inquiry functions MPI_Graphdims_get and
MPI_Graph_get? These return nothing but the information given in the
call to MPI_Graph_create??
MPI_Cart_create:
* Is a 0-dimensional grid allowed? The standard says "Cartesian
structures of arbitrary dimension"; but the MPICH implementation
forbids 0-dimensional grids. It seems more correct to return
MPI_COMM_SELF to one of the calling processes.
MPI_Cart_coords:
p.327, line -7: "length of vector {\sf coord}..." should be
"length of vector {\sf coords}..."
=== general questions for discussion ===
Why was it decided that each process give the whole graph as input to
MPI_Graph_create/MPI_Graph_map? Wouldn't it have been more user-friendly,
and more "scalable" just to require the list of neighbors for each
process?
Are there other MPI implementations doing anything non-trivial with the
topology functionality? At some point HP had an implementation of at least
the graph functionality.
Why is there no collective for permutation routing in the MPI standard?
This could be helpful for data redistribution when a new communicator has
been created, for instance by the topology creation functions?
=== Remarks from Rolf Hempel ===
> MPI_Graph_create:
>
> * Is an empty graph allowed? If not, why not? (it seems correct in this
> case to return MPI_COMM_NULL as the new communicator)
It is not forbidden in the standard, so it should be allowed. To return
MPI_COMM_NULL everywhere seems to me the obvious implementation.
> * It is not explicitly said whether self loops (i.e. an edge from node u
> to node u) are allowed in the graph. They should be. Example 6.4, p. 335,
> does not have self loops, but example 6.5, p. 338, does.
This topic was never discussed explicitly at the MPI forum, but my opinion is
that it should be allowed (in particular since, as you mentioned, an example
in the official document already does it).
> * It is not explicitly said whether multi-edges (several edges from u
> to v) are allowed. Probably they are not (but could be, and could be
> used to model particularly intensive communication between certain
> nodes)
For reasons similar to the above, I would say that they are allowed. However,
it should not be expected that the implementation gives that edge a greater
weight! The issue of edge weighting was discussed at length, and the result
was that it should not be included in the standard. (On the other hand, an
implementation that bases weighting on the edge count would of course be
as valid as any other one.)
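
To make the two cases concrete, here is a minimal sketch that counts self
loops and repeated edges in a graph description. It assumes the index/edges
layout of MPI_Graph_create (index[u] is the total number of neighbors of
nodes 0..u, and edges is the concatenation of the neighbor lists). The helper
names are hypothetical and not part of MPI.

```c
#include <assert.h>

/* Position in edges[] where the neighbor list of node u begins. */
static int first_edge(const int *index, int u) {
    return u == 0 ? 0 : index[u - 1];
}

/* Hypothetical helper: number of self loops (entries u -> u). */
int count_self_loops(int nnodes, const int *index, const int *edges) {
    assert(nnodes >= 0);
    int count = 0;
    for (int u = 0; u < nnodes; u++)
        for (int k = first_edge(index, u); k < index[u]; k++)
            if (edges[k] == u)
                count++;
    return count;
}

/* Hypothetical helper: number of entries that repeat an earlier
   neighbor of the same node (i.e. the extra copies of a multi-edge). */
int count_repeated_edges(int nnodes, const int *index, const int *edges) {
    assert(nnodes >= 0);
    int count = 0;
    for (int u = 0; u < nnodes; u++) {
        int start = first_edge(index, u);
        for (int k = start; k < index[u]; k++) {
            for (int j = start; j < k; j++) {
                if (edges[j] == edges[k]) {
                    count++;
                    break; /* count each repeated entry once */
                }
            }
        }
    }
    return count;
}
```

An implementation that chooses to ignore multiplicity could use such a check
to drop the repeated entries; one that bases weighting on the edge count would
keep them.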
> * It is not said whether all processes must give the same graph in the
> call to MPI_Graph_create/MPI_Graph_map. Assuming that it is so (!!),
> this should be mentioned, and it should be made clear whether exactly
> the same index/edges arrays must be provided, or if isomorphic
> representations suffice.
This is a good point! Of course, it is assumed that every node provides the
same information. From a logical point of view, an isomorphic representation
would do, but it would perhaps confuse consistency checking routines in simple
implementations. Therefore, my advice would be to add a clarification saying
that every node must pass the same index/edges arrays.
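
The difference between the two requirements can be sketched as follows. This
takes one narrow reading of "isomorphic representation" as an assumption:
every node has the same neighbors, but possibly listed in a different order.
Relabeling of nodes is not covered. The function names are hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static int cmp_int(const void *a, const void *b) {
    int x = *(const int *)a, y = *(const int *)b;
    return (x > y) - (x < y);
}

/* Strict check: the index and edges arrays are element-wise identical. */
int same_arrays(int nnodes, const int *index_a, const int *edges_a,
                const int *index_b, const int *edges_b) {
    assert(nnodes >= 0);
    if (nnodes == 0) return 1;
    if (memcmp(index_a, index_b, (size_t)nnodes * sizeof(int)) != 0) return 0;
    size_t nedges = (size_t)index_a[nnodes - 1];
    return memcmp(edges_a, edges_b, nedges * sizeof(int)) == 0;
}

/* Relaxed check: each node has the same neighbors, in any order.
   Returns 1 if equal, 0 if not, -1 if memory allocation fails. */
int same_neighbors_any_order(int nnodes,
                             const int *index_a, const int *edges_a,
                             const int *index_b, const int *edges_b) {
    assert(nnodes >= 0);
    if (nnodes == 0) return 1;
    if (memcmp(index_a, index_b, (size_t)nnodes * sizeof(int)) != 0) return 0;
    size_t nedges = (size_t)index_a[nnodes - 1];
    if (nedges == 0) return 1;

    int *a = malloc(nedges * sizeof *a);
    int *b = malloc(nedges * sizeof *b);
    if (a == NULL || b == NULL) { free(a); free(b); return -1; }
    memcpy(a, edges_a, nedges * sizeof *a);
    memcpy(b, edges_b, nedges * sizeof *b);

    /* Sort each node's neighbor list so that order no longer matters. */
    int start = 0;
    for (int u = 0; u < nnodes; u++) {
        size_t deg = (size_t)(index_a[u] - start);
        qsort(a + start, deg, sizeof(int), cmp_int);
        qsort(b + start, deg, sizeof(int), cmp_int);
        start = index_a[u];
    }
    int equal = memcmp(a, b, nedges * sizeof(int)) == 0;
    free(a);
    free(b);
    return equal;
}
```

The strict check is a plain memory comparison, which is why it is the easier
rule for a simple consistency checker to enforce.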
> * What is the use of the graph inquiry functions MPI_Graphdims_get and
> MPI_Graph_get? These return nothing but the information given in the
> call to MPI_Graph_create??
True! The reason is just that the graph definition and inquiry might be done
in different parts of the program (and written by different people). The
inquiry function is just a convenience for the software developers (and easy
to implement :-)) ).
> MPI_Cart_create:
>
> * Is a 0-dimensional grid allowed? The standard says "Cartesian
> structures of arbitrary dimension"; but the MPICH implementation
> forbids 0-dimensional grids. It seems more correct to return
> MPI_COMM_SELF to one of the calling processes.
I agree, although at the MPI forum this question was never discussed. Your
implementation seems to me the only correct one. Perhaps a clarification would
help.
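
One way to see why a single process is the natural result: the number of
positions in a Cartesian grid is the product of its extents, and the product
over zero dimensions is 1. A minimal sketch, with a hypothetical helper name:

```c
#include <assert.h>

/* Hypothetical helper: number of processes in a Cartesian grid with the
   given extents. With ndims == 0 the loop does not run and the result is
   the empty product, 1. */
long cart_grid_size(int ndims, const int *dims) {
    assert(ndims >= 0);
    long size = 1;
    for (int d = 0; d < ndims; d++) {
        assert(dims[d] > 0);
        size *= dims[d];
    }
    return size;
}
```

So a 0-dimensional grid holds exactly one process, and the remaining callers
are left out of the grid, just as when the grid is smaller than the group.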
> MPI_Cart_coords:
>
> p.327, line -7: "length of vector {\sf coord}..." should be
> "length of vector {\sf coords}..."
Right! Just a typo.
> === general questions for discussion ===
>
> Why was it decided that each process give the whole graph as input to
> MPI_Graph_create/MPI_Graph_map? Wouldn't it have been more user-friendly,
> and more "scalable" just to require the list of neighbors for each
> process?
As far as I remember, scalability to really large process numbers was not
regarded as a real issue by the HW vendors in the MPI forum. At that time,
machines with more than a few hundred processors no longer seemed to be an
issue. Remember that ASCI was not yet on the horizon, and even the Connection
Machines were moving towards more powerful single processors. Therefore,
providing the whole map everywhere did not seem to be a burden but a
convenience to the implementer (the whole implementation could be done by
replication, without any communication).
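
The trade-off can be sketched with the index/edges layout of
MPI_Graph_create. With the whole graph replicated, a process finds its own
neighbors by a local lookup, but every process has to supply nnodes + nedges
integers instead of just its own degree. The helper names below are
hypothetical and not part of MPI.

```c
#include <assert.h>

/* Number of neighbors of node `rank`. */
int graph_degree(const int *index, int rank) {
    assert(rank >= 0);
    return rank == 0 ? index[0] : index[rank] - index[rank - 1];
}

/* Pointer to the first neighbor of node `rank` inside edges[];
   the list has graph_degree(index, rank) entries. */
const int *graph_neighbors(const int *index, const int *edges, int rank) {
    assert(rank >= 0);
    return edges + (rank == 0 ? 0 : index[rank - 1]);
}

/* Integers every process must pass when the whole graph is replicated. */
int whole_graph_ints(int nnodes, const int *index) {
    assert(nnodes >= 0);
    return nnodes == 0 ? 0 : nnodes + index[nnodes - 1];
}
```

For a 4-node graph with 6 edge entries, each process passes 10 integers under
the replicated scheme, while a node of degree 2 would only need to pass 2
under a neighbor-list scheme.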
> Are there other MPI implementations doing anything non-trivial with the
> topology functionality? At some point HP had an implementation of at least
> the graph functionality.
That matches what I know. I don't know of any other non-trivial
implementation. All the more, I'm pleased to see that NEC finally did it!
> Why is there no collective for permutation routing in the MPI standard?
> This could be helpful for data redistribution when a new communicator has
> been created, for instance by the topology creation functions?
Yes. The reason is simply that nobody asked for it during the standardization
process and gathered enough support for it among the forum members. I'm afraid
that now it's too late.
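
Without such a collective, a user has to build the routing by hand, for
example with point-to-point calls. Each process then needs to know both where
it sends (dest) and from whom it receives (the inverse permutation). A minimal
sketch of that bookkeeping, with a hypothetical helper name and no MPI calls:

```c
#include <assert.h>

/* Hypothetical helper: given dest[r], the rank that process r sends to,
   fill source[] so that source[dest[r]] == r.
   Returns 0 on success, or -1 if dest is not a permutation of 0..n-1
   (in which case source[] is left partially filled). */
int permutation_sources(int n, const int *dest, int *source) {
    assert(n >= 0);
    for (int r = 0; r < n; r++)
        source[r] = -1; /* -1 marks "no sender found yet" */
    for (int r = 0; r < n; r++) {
        int d = dest[r];
        if (d < 0 || d >= n || source[d] != -1)
            return -1; /* out of range, or two senders for one target */
        source[d] = r;
    }
    return 0;
}
```

With dest = {2, 0, 1}, process 0 sends to 2 and receives from 1, so
source = {1, 2, 0}.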
Best wishes,
Rolf
P.s.: Feel free to include my comments in your email to the mail reflector.