I've been reading the RT chapter again and again, and the subsection
about priorities still confuses me: it seems like an unnecessarily
complex model for a relatively simple feature. This is a suggestion for a
new model for priorities. Please send me your comments/improvements!
At the end, I suggest a few other things that can also be modeled in a
similar fashion. Comments?
Best regards,
Bjarne Geir Herland
---
Paragon Systems Engineer, Parallel processing laboratory
Dept. of Informatics, N-5020 Bergen, Norway
phone: +47 55 54 41 66   fax: +47 55 54 41 99
Bjarne.Herland@ii.uib.no   http://www.ii.uib.no/~bjarne/
---
Priorities in MPI
===================
A priority in MPI consists of two integer values attached to a communicator:
1) the communicator's maximum priority (MAXPRI), and
2) its current active priority (CURPRI).
A higher value means a higher priority. By default, MPI_COMM_WORLD has MAXPRI set to the highest priority, and CURPRI is undefined (the reason for leaving it undefined becomes clearer below).
When communicating in a communicator, the operation gets a priority equal to the communicator's CURPRI, or to its MAXPRI if CURPRI is undefined.
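When creating a new communicator, the values for the new communicator(s) are set as follows: MAXPRI is set to the parent's CURPRI (or to the parent's MAXPRI if the parent's CURPRI is undefined, as for the copy of MPI_COMM_WORLD described below), and CURPRI is undefined. If two communicators with different CURPRI are merged, the lower value is used.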
We provide a function MPI_COMMUNICATOR_PRI_SET() that allows the user to set a communicator's CURPRI to a value <= MAXPRI. This value can only be set once! (I.e. the value can only be set while CURPRI is undefined.)

By enforcing this, we give the programmer control over which parts of the program can use high priorities:
- If the module/subroutine is trusted to handle this itself, the programmer just makes a copy of MPI_COMM_WORLD, avoids setting CURPRI, and passes the communicator to the module/subroutine, which then has the full range of priorities available.
- If the module/subroutine is considered to be low-priority, the programmer just creates a new communicator, sets the desired CURPRI, and passes the communicator to the module/subroutine.
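The rules above can be checked with a small toy model. This is only a sketch in Python, not an MPI binding: the `Comm` class, its method names, the `merge` helper, and the value `HIGHEST = 7` are all hypothetical. Two points are assumptions of the sketch rather than part of the proposal text: a child of a communicator with undefined CURPRI inherits the parent's MAXPRI, and a merge uses each side's effective priority when its CURPRI is undefined.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Comm:
    """Toy stand-in for a communicator; not an MPI type."""
    maxpri: int
    curpri: Optional[int] = None  # None models "undefined"

    def effective_pri(self) -> int:
        # Operations run at CURPRI, or at MAXPRI while CURPRI is undefined.
        return self.maxpri if self.curpri is None else self.curpri

    def pri_set(self, value: int) -> None:
        # Models MPI_COMMUNICATOR_PRI_SET: value <= MAXPRI, and only once.
        if self.curpri is not None:
            raise ValueError("CURPRI is already set")
        if value > self.maxpri:
            raise ValueError("CURPRI may not exceed MAXPRI")
        self.curpri = value

    def derive(self) -> "Comm":
        # New communicator: MAXPRI comes from the parent's CURPRI
        # (assumption: from the parent's MAXPRI if CURPRI is undefined).
        return Comm(maxpri=self.effective_pri())


def merge(a: Comm, b: Comm) -> Comm:
    # The lower of the two priorities wins. Falling back to the effective
    # priority when a CURPRI is undefined is an assumption of this sketch.
    return Comm(maxpri=min(a.effective_pri(), b.effective_pri()))


HIGHEST = 7  # hypothetical highest priority offered by the environment
world = Comm(maxpri=HIGHEST)

trusted = world.derive()  # CURPRI left unset: full range still available
low = world.derive()
low.pri_set(2)            # capped before being handed to a module
inner = low.derive()      # anything the module creates is capped as well
```

In this model the low-priority module cannot escape its cap: `inner.maxpri` is 2, so `inner.pri_set(3)` raises, and a second `low.pri_set(...)` raises because CURPRI is already defined.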
In addition to this, we provide the following functions for housekeeping:

MPI_COMMUNICATOR_NUM_PRI()   [ number of priorities supported by the env. ]
MPI_COMMUNICATOR_PRI_GET()   [ both CURPRI and MAXPRI ]

meaning that we only add three functions to MPI for handling priorities.
Summary :
===========
The suggested mechanisms allow the programmer to
a) use communicators to prioritize messages, and thereby avoid violating MPI's ordering rule
b) pass a communicator to a module/subroutine and completely control the priority of the messages this module/subroutine is allowed to use
c) check that the environment supports multiple priorities (by using MPI_COMMUNICATOR_NUM_PRI) and choose a strategy based on this
d) if c) holds, find out how many priorities are available (by calling MPI_COMMUNICATOR_PRI_GET with MPI_COMM_WORLD) and decide how to use them
For implementors, this means that
1) the three extra functions are easy to implement if priorities are not supported in the implementation
2) in order to enable priorities, only small changes need to be made to the communicators and the routines for manipulating them
3) of course, to actually implement the priorities, more work must be done, but a simple approach *could* be to service communicators in prioritized order [ this needs more thinking. any comments? can we implement overtaking messages in a simple way? ]
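One way to picture "servicing communicators in prioritized order" is a priority queue keyed on the priority of the communicator a message was posted in. The following Python sketch is only an illustration under that assumption, not a proposed implementation: `PriorityService`, `post`, and `next_message` are hypothetical names, and a real implementation would also have to deal with matching, buffering, and progress.

```python
import heapq
from itertools import count


class PriorityService:
    """Toy dispatcher: the highest-priority pending message is served first."""

    def __init__(self):
        self._heap = []
        self._seq = count()  # global posting order, used as a tie-breaker

    def post(self, priority, comm_id, payload):
        # heapq is a min-heap, so the priority is negated to make a higher
        # value come out first. The sequence number keeps messages of equal
        # priority (and hence of the same communicator) in posting order.
        heapq.heappush(self._heap, (-priority, next(self._seq), comm_id, payload))

    def next_message(self):
        _, _, comm_id, payload = heapq.heappop(self._heap)
        return comm_id, payload


svc = PriorityService()
svc.post(2, "low", "a1")
svc.post(7, "high", "b1")
svc.post(2, "low", "a2")
svc.post(7, "high", "b2")
order = [svc.next_message() for _ in range(4)]
```

Here `order` comes out as b1, b2, a1, a2: messages in the high-priority communicator overtake those in the low-priority one, while messages within each communicator keep their original order, so the ordering rule is not violated inside a communicator.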
Questions :
============
Is this what we want/need for priorities in RT-MPI? Is it sufficient? Is it *useful* to programmers ???
Other issues :
================
Several other attributes can be attached to communicators in a similar way :
- time-out for operations
- guaranteed bandwidth / latency-for-zero-message
- maxlength for message / guaranteed bufferspace available