149. Communicator Constructors


The following are collective functions that are invoked by all processes in the group or groups associated with comm, with the exception of MPI_COMM_CREATE_GROUP, which is invoked only by the processes in the group of the new communicator being constructed.


Rationale.

Note that there is a chicken-and-egg aspect to MPI in that a communicator is needed to create a new communicator. The base communicator for all MPI communicators is predefined outside of MPI, and is MPI_COMM_WORLD. This model was arrived at after considerable debate, and was chosen to increase "safety" of programs written in MPI. ( End of rationale.)
This chapter presents the following communicator construction routines: MPI_COMM_CREATE, MPI_COMM_DUP, MPI_COMM_IDUP, MPI_COMM_DUP_WITH_INFO, and MPI_COMM_SPLIT can be used to create both intracommunicators and intercommunicators; MPI_COMM_CREATE_GROUP and MPI_INTERCOMM_MERGE (see Section Inter-communicator Operations) can be used to create intracommunicators; and MPI_INTERCOMM_CREATE (see Section Inter-communicator Operations) can be used to create intercommunicators.

An intracommunicator involves a single group while an intercommunicator involves two groups. Where the following discussions address intercommunicator semantics, the two groups in an intercommunicator are called the left and right groups. A process in an intercommunicator is a member of either the left or the right group. From the point of view of that process, the group that the process is a member of is called the local group; the other group (relative to that process) is the remote group. The left and right group labels give us a way to describe the two groups in an intercommunicator that is not relative to any particular process (as the local and remote groups are).

MPI_COMM_DUP(comm, newcomm)
IN comm communicator (handle)
OUT newcomm copy of comm (handle)

int MPI_Comm_dup(MPI_Comm comm, MPI_Comm *newcomm)

MPI_Comm_dup(comm, newcomm, ierror)
TYPE(MPI_Comm), INTENT(IN) :: comm
TYPE(MPI_Comm), INTENT(OUT) :: newcomm
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
MPI_COMM_DUP(COMM, NEWCOMM, IERROR)
INTEGER COMM, NEWCOMM, IERROR

MPI_COMM_DUP duplicates the existing communicator comm with associated key values, topology information, and info hints. For each key value, the respective copy callback function determines the attribute value associated with this key in the new communicator; one particular action that a copy callback may take is to delete the attribute from the new communicator. The call returns in newcomm a new communicator with the same group or groups, same topology, same info hints, and any copied cached information, but a new context (see Section Functionality).


Advice to users.

This operation is used to provide a parallel library with a duplicate communication space that has the same properties as the original communicator. This includes any attributes (see below), topologies (see Chapter Process Topologies), and associated info hints (see Section Communicator Info). This call is valid even if there are pending point-to-point communications involving the communicator comm. A typical call might involve an MPI_COMM_DUP at the beginning of the parallel call, and an MPI_COMM_FREE of that duplicated communicator at the end of the call. Other models of communicator management are also possible.

This call applies to both intra- and inter-communicators. ( End of advice to users.)

Advice to implementors.

One need not actually copy the group information, but only add a new reference and increment the reference count. Copy on write can be used for the cached information. ( End of advice to implementors.)

MPI_COMM_DUP_WITH_INFO(comm, info, newcomm)
IN comm communicator (handle)
IN info info object (handle)
OUT newcomm copy of comm (handle)

int MPI_Comm_dup_with_info(MPI_Comm comm, MPI_Info info, MPI_Comm *newcomm)

MPI_Comm_dup_with_info(comm, info, newcomm, ierror)
TYPE(MPI_Comm), INTENT(IN) :: comm
TYPE(MPI_Info), INTENT(IN) :: info
TYPE(MPI_Comm), INTENT(OUT) :: newcomm
INTEGER, OPTIONAL, INTENT(OUT) :: ierror

MPI_COMM_DUP_WITH_INFO(COMM, INFO, NEWCOMM, IERROR)
INTEGER COMM, INFO, NEWCOMM, IERROR

MPI_COMM_DUP_WITH_INFO behaves exactly as MPI_COMM_DUP except that the info hints associated with the communicator comm are not duplicated in newcomm. The hints provided by the argument info are associated with the output communicator newcomm instead.


Rationale.

It is expected that some hints will only be valid at communicator creation time. However, for legacy reasons, most communicator creation calls do not provide an info argument. One may associate info hints with a duplicate of any communicator at creation time through a call to MPI_COMM_DUP_WITH_INFO. ( End of rationale.)

MPI_COMM_IDUP(comm, newcomm, request)
IN comm communicator (handle)
OUT newcomm copy of comm (handle)
OUT request communication request (handle)

int MPI_Comm_idup(MPI_Comm comm, MPI_Comm *newcomm, MPI_Request *request)

MPI_Comm_idup(comm, newcomm, request, ierror)
TYPE(MPI_Comm), INTENT(IN) :: comm
TYPE(MPI_Comm), INTENT(OUT), ASYNCHRONOUS :: newcomm
TYPE(MPI_Request), INTENT(OUT) :: request
INTEGER, OPTIONAL, INTENT(OUT) :: ierror

MPI_COMM_IDUP(COMM, NEWCOMM, REQUEST, IERROR)
INTEGER COMM, NEWCOMM, REQUEST, IERROR

MPI_COMM_IDUP is a nonblocking variant of MPI_COMM_DUP. The semantics of MPI_COMM_IDUP are as if MPI_COMM_DUP were executed at the time that MPI_COMM_IDUP is called. For example, attributes changed after MPI_COMM_IDUP will not be copied to the new communicator. All restrictions and assumptions for nonblocking collective operations (see Section Nonblocking Collective Operations) apply to MPI_COMM_IDUP and the returned request.

It is erroneous to use the communicator newcomm as an input argument to other MPI functions before the MPI_COMM_IDUP operation completes.


Rationale.

This functionality is crucial for the development of purely nonblocking libraries (see [36]). ( End of rationale.)

MPI_COMM_CREATE(comm, group, newcomm)
IN comm communicator (handle)
IN group group, which is a subset of the group of comm (handle)
OUT newcomm new communicator (handle)

int MPI_Comm_create(MPI_Comm comm, MPI_Group group, MPI_Comm *newcomm)

MPI_Comm_create(comm, group, newcomm, ierror)
TYPE(MPI_Comm), INTENT(IN) :: comm
TYPE(MPI_Group), INTENT(IN) :: group
TYPE(MPI_Comm), INTENT(OUT) :: newcomm
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
MPI_COMM_CREATE(COMM, GROUP, NEWCOMM, IERROR)
INTEGER COMM, GROUP, NEWCOMM, IERROR

If comm is an intracommunicator, this function returns a new communicator newcomm with communication group defined by the group argument. No cached information propagates from comm to newcomm. Each process must call MPI_COMM_CREATE with a group argument that is a subgroup of the group associated with comm; this could be MPI_GROUP_EMPTY. The processes may specify different values for the group argument. If a process calls with a non-empty group, then all processes in that group must call the function with the same group as argument, that is, the same processes in the same order. Otherwise, the call is erroneous. This implies that the set of groups specified across the processes must be disjoint. If the calling process is a member of the group given as group argument, then newcomm is a communicator with group as its associated group. In the case that a process calls with a group to which it does not belong, e.g., MPI_GROUP_EMPTY, then MPI_COMM_NULL is returned as newcomm. The function is collective and must be called by all processes in the group of comm.


Rationale.

The interface supports the original mechanism from MPI-1.1, which required the same group in all processes of comm. It was extended in MPI-2.2 to allow the use of disjoint subgroups in order to allow implementations to eliminate unnecessary communication that MPI_COMM_SPLIT would incur when the user already knows the membership of the disjoint subgroups. ( End of rationale.)

Rationale.

The requirement that the entire group of comm participate in the call stems from the following considerations:


( End of rationale.)

Advice to users.

MPI_COMM_CREATE provides a means to subset a group of processes for the purpose of separate MIMD computation, with separate communication space. newcomm, which emerges from MPI_COMM_CREATE, can be used in subsequent calls to MPI_COMM_CREATE (or other communicator constructors) to further subdivide a computation into parallel sub-computations. A more general service is provided by MPI_COMM_SPLIT, below. ( End of advice to users.)

Advice to implementors.

When calling MPI_COMM_DUP, all processes call with the same group (the group associated with the communicator). When calling MPI_COMM_CREATE, the processes provide the same group or disjoint subgroups. For both calls, it is theoretically possible to agree on a group-wide unique context with no communication. However, local execution of these functions requires use of a larger context name space and reduces error checking. Implementations may strike various compromises between these conflicting goals, such as bulk allocation of multiple contexts in one collective operation.

Important: If new communicators are created without synchronizing the processes involved then the communication system must be able to cope with messages arriving in a context that has not yet been allocated at the receiving process. ( End of advice to implementors.)
If comm is an intercommunicator, then the output communicator is also an intercommunicator where the local group consists only of those processes contained in group (see Figure 14). The group argument should only contain those processes in the local group of the input intercommunicator that are to be a part of newcomm. All processes in the same local group of comm must specify the same value for group, i.e., the same members in the same order. If either group does not specify at least one process in the local group of the intercommunicator, or if the calling process is not included in the group, MPI_COMM_NULL is returned.


Rationale.

In the case where either the left or right group is empty, a null communicator is returned instead of an intercommunicator with MPI_GROUP_EMPTY because the side with the empty group must return MPI_COMM_NULL. ( End of rationale.)

Figure 14: Intercommunicator creation using MPI_COMM_CREATE extended to intercommunicators. The input groups are those in the grey circle.


Example The following example illustrates how the first node in the left side of an intercommunicator could be joined with all members on the right side of an intercommunicator to form a new intercommunicator.

        MPI_Comm  inter_comm, new_inter_comm;
        MPI_Group local_group, group;
        int       rank = 0; /* rank on left side to include in
                               new inter-comm */
        int       on_left_side; /* placeholder flag: nonzero on processes
                                   of the left group; must be set by the
                                   application */

        /* Construct the original intercommunicator: "inter_comm" */
        ...

        /* Construct the group of processes to be in new
           intercommunicator */
        if (on_left_side) {
          /* Left side: keep only the process with the given rank */
          MPI_Comm_group ( inter_comm, &local_group );
          MPI_Group_incl ( local_group, 1, &rank, &group );
          MPI_Group_free ( &local_group );
        }
        else
          /* Right side: keep the whole local group */
          MPI_Comm_group ( inter_comm, &group );

        MPI_Comm_create ( inter_comm, group, &new_inter_comm );
        MPI_Group_free( &group );

MPI_COMM_CREATE_GROUP(comm, group, tag, newcomm)
IN comm intracommunicator (handle)
IN group group, which is a subset of the group of comm (handle)
IN tag tag (integer)
OUT newcomm new communicator (handle)

int MPI_Comm_create_group(MPI_Comm comm, MPI_Group group, int tag, MPI_Comm *newcomm)

MPI_Comm_create_group(comm, group, tag, newcomm, ierror)
TYPE(MPI_Comm), INTENT(IN) :: comm
TYPE(MPI_Group), INTENT(IN) :: group
INTEGER, INTENT(IN) :: tag
TYPE(MPI_Comm), INTENT(OUT) :: newcomm
INTEGER, OPTIONAL, INTENT(OUT) :: ierror

MPI_COMM_CREATE_GROUP(COMM, GROUP, TAG, NEWCOMM, IERROR)
INTEGER COMM, GROUP, TAG, NEWCOMM, IERROR

MPI_COMM_CREATE_GROUP is similar to MPI_COMM_CREATE; however, MPI_COMM_CREATE must be called by all processes in the group of comm, whereas MPI_COMM_CREATE_GROUP must be called by all processes in group, which is a subgroup of the group of comm. In addition, MPI_COMM_CREATE_GROUP requires that comm is an intracommunicator. MPI_COMM_CREATE_GROUP returns a new intracommunicator, newcomm, for which the group argument defines the communication group. No cached information propagates from comm to newcomm. Each process must provide a group argument that is a subgroup of the group associated with comm; this could be MPI_GROUP_EMPTY. If a non-empty group is specified, then all processes in that group must call the function, and each of these processes must provide the same arguments, including a group that contains the same members with the same ordering. Otherwise the call is erroneous. If the calling process is a member of the group given as the group argument, then newcomm is a communicator with group as its associated group. If the calling process is not a member of group, e.g., group is MPI_GROUP_EMPTY, then the call is a local operation and MPI_COMM_NULL is returned as newcomm.


Rationale.

Functionality similar to MPI_COMM_CREATE_GROUP can be implemented through repeated MPI_INTERCOMM_CREATE and MPI_INTERCOMM_MERGE calls that start with the MPI_COMM_SELF communicators at each process in group and build up an intracommunicator with group group [16]. Such an algorithm requires the creation of many intermediate communicators; MPI_COMM_CREATE_GROUP can provide a more efficient implementation that avoids this overhead. ( End of rationale.)

Advice to users.

An intercommunicator can be created collectively over processes in the union of the local and remote groups by creating the local communicator using MPI_COMM_CREATE_GROUP and using that communicator as the local communicator argument to MPI_INTERCOMM_CREATE. ( End of advice to users.)
The tag argument does not conflict with tags used in point-to-point communication and is not permitted to be a wildcard. If multiple threads at a given process perform concurrent MPI_COMM_CREATE_GROUP operations, the user must distinguish these operations by providing different tag or comm arguments.


Advice to users.

MPI_COMM_CREATE may provide lower overhead than MPI_COMM_CREATE_GROUP because it can take advantage of collective communication on comm when constructing newcomm. ( End of advice to users.)

MPI_COMM_SPLIT(comm, color, key, newcomm)
IN comm communicator (handle)
IN color control of subset assignment (integer)
IN key control of rank assignment (integer)
OUT newcomm new communicator (handle)

int MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm)

MPI_Comm_split(comm, color, key, newcomm, ierror)
TYPE(MPI_Comm), INTENT(IN) :: comm
INTEGER, INTENT(IN) :: color, key
TYPE(MPI_Comm), INTENT(OUT) :: newcomm
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
MPI_COMM_SPLIT(COMM, COLOR, KEY, NEWCOMM, IERROR)
INTEGER COMM, COLOR, KEY, NEWCOMM, IERROR

This function partitions the group associated with comm into disjoint subgroups, one for each value of color. Each subgroup contains all processes of the same color. Within each subgroup, the processes are ranked in the order defined by the value of the argument key, with ties broken according to their rank in the old group. A new communicator is created for each subgroup and returned in newcomm. A process may supply the color value MPI_UNDEFINED, in which case newcomm returns MPI_COMM_NULL. This is a collective call, but each process is permitted to provide different values for color and key.

With an intracommunicator comm, a call to MPI_COMM_CREATE(comm, group, newcomm) is equivalent to a call to MPI_COMM_SPLIT(comm, color, key, newcomm), where processes that are members of their group argument provide color = number of the group (based on a unique numbering of all disjoint groups) and key = rank in group, and all processes that are not members of their group argument provide color = MPI_UNDEFINED.

The value of color must be non-negative or MPI_UNDEFINED.


Advice to users.

This is an extremely powerful mechanism for dividing a single communicating group of processes into k subgroups, with k chosen implicitly by the user (by the number of colors asserted over all the processes). The resulting communicators will be non-overlapping. Such a division could be useful for defining a hierarchy of computations, such as for multigrid, or linear algebra.

For intracommunicators, MPI_COMM_SPLIT provides a capability similar to that of MPI_COMM_CREATE for splitting a communicating group into disjoint subgroups. MPI_COMM_SPLIT is useful when some processes do not have complete information about the other members of their group, but all processes know (the color of) the group to which they belong. In this case, the MPI implementation discovers the other group members via communication. MPI_COMM_CREATE is useful when all processes have complete information about the members of their group. In this case, MPI can avoid the extra communication required to discover group membership. MPI_COMM_CREATE_GROUP is useful when all processes in a given group have complete information about the members of their group and synchronization with processes outside the group can be avoided.

Multiple calls to MPI_COMM_SPLIT can be used to overcome the requirement that any call have no overlap of the resulting communicators (each process is of only one color per call). In this way, multiple overlapping communication structures can be created. Creative use of the color and key in such splitting operations is encouraged.

Note that, for a fixed color, the keys need not be unique. It is MPI_COMM_SPLIT's responsibility to sort processes in ascending order according to this key, and to break ties in a consistent way. If all the keys are specified in the same way, then all the processes in a given color will have the same relative rank order as they did in their parent group.

Essentially, making the key value zero for all processes of a given color means that one does not really care about the rank-order of the processes in the new communicator. ( End of advice to users.)
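
The ordering rule for keys can be modeled serially, without MPI, as a sort on (key, old rank). The sketch below is only a model of the rule for the processes of one color; the type and function names are hypothetical, and it says nothing about how an implementation actually performs the split.

```c
#include <stdlib.h>

/* One entry per process of a single color (serial model, no MPI). */
struct proc {
    int old_rank;   /* rank in the parent group */
    int key;        /* key argument passed to MPI_COMM_SPLIT */
};

/* Ascending by key; ties are broken by rank in the parent group. */
static int by_key_then_old_rank(const void *a, const void *b)
{
    const struct proc *p = a, *q = b;

    if (p->key != q->key)
        return (p->key > q->key) - (p->key < q->key);
    return (p->old_rank > q->old_rank) - (p->old_rank < q->old_rank);
}

/* After the call, procs[i] describes the process that would receive
   rank i in the new communicator. */
void assign_new_ranks(struct proc *procs, size_t n)
{
    qsort(procs, n, sizeof procs[0], by_key_then_old_rank);
}
```

When every process of a color passes the same key, for example zero, the comparison falls through to old_rank, so the model reproduces the relative order of the parent group.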

Rationale.

color is restricted to be non-negative, so as not to conflict with the value assigned to MPI_UNDEFINED. ( End of rationale.)
The result of MPI_COMM_SPLIT on an intercommunicator is that those processes on the left with the same color as those processes on the right combine to create a new intercommunicator. The key argument describes the relative rank of processes on each side of the intercommunicator (see Figure 15). For those colors that are specified only on one side of the intercommunicator, MPI_COMM_NULL is returned. MPI_COMM_NULL is also returned to those processes that specify MPI_UNDEFINED as the color.
Advice to users.

For intercommunicators, MPI_COMM_SPLIT is more general than MPI_COMM_CREATE. A single call to MPI_COMM_SPLIT can create a set of disjoint intercommunicators, while a call to MPI_COMM_CREATE creates only one. ( End of advice to users.)

Figure 15: Intercommunicator construction achieved by splitting an existing intercommunicator with MPI_COMM_SPLIT extended to intercommunicators.


Example (Parallel client-server model). The following client code illustrates how clients on the left side of an intercommunicator could be assigned to a single server from a pool of servers on the right side of an intercommunicator.

        /* Client code */ 
        MPI_Comm  multiple_server_comm; 
        MPI_Comm  single_server_comm; 
        int       color, rank, num_servers; 
         
        /* Create intercommunicator with clients and servers:  
           multiple_server_comm */ 
        ... 
         
        /* Find out the number of servers available */ 
        MPI_Comm_remote_size ( multiple_server_comm, &num_servers ); 
         
        /* Determine my color */ 
        MPI_Comm_rank ( multiple_server_comm, &rank ); 
        color = rank % num_servers; 
         
        /* Split the intercommunicator */ 
        MPI_Comm_split ( multiple_server_comm, color, rank,  
                         &single_server_comm ); 
The following is the corresponding server code:
        /* Server code */ 
        MPI_Comm  multiple_client_comm; 
        MPI_Comm  single_server_comm; 
        int       rank; 
 
        /* Create intercommunicator with clients and servers:  
           multiple_client_comm */ 
        ... 
         
        /* Split the intercommunicator for a single server per group 
           of clients */ 
        MPI_Comm_rank ( multiple_client_comm, &rank ); 
        MPI_Comm_split ( multiple_client_comm, rank, 0,  
                         &single_server_comm );   

MPI_COMM_SPLIT_TYPE(comm, split_type, key, info, newcomm)
IN comm communicator (handle)
IN split_type type of processes to be grouped together (integer)
IN key control of rank assignment (integer)
IN info info argument (handle)
OUT newcomm new communicator (handle)

int MPI_Comm_split_type(MPI_Comm comm, int split_type, int key, MPI_Info info, MPI_Comm *newcomm)

MPI_Comm_split_type(comm, split_type, key, info, newcomm, ierror)
TYPE(MPI_Comm), INTENT(IN) :: comm
INTEGER, INTENT(IN) :: split_type, key
TYPE(MPI_Info), INTENT(IN) :: info
TYPE(MPI_Comm), INTENT(OUT) :: newcomm
INTEGER, OPTIONAL, INTENT(OUT) :: ierror
MPI_COMM_SPLIT_TYPE(COMM, SPLIT_TYPE, KEY, INFO, NEWCOMM, IERROR)
INTEGER COMM, SPLIT_TYPE, KEY, INFO, NEWCOMM, IERROR

This function partitions the group associated with comm into disjoint subgroups, based on the type specified by split_type. Each subgroup contains all processes of the same type. Within each subgroup, the processes are ranked in the order defined by the value of the argument key, with ties broken according to their rank in the old group. A new communicator is created for each subgroup and returned in newcomm. This is a collective call; all processes must provide the same split_type, but each process is permitted to provide different values for key. An exception to this rule is that a process may supply the type value MPI_UNDEFINED, in which case newcomm returns MPI_COMM_NULL.

The following type is predefined by MPI:

MPI_COMM_TYPE_SHARED: this type splits the communicator into subcommunicators, each of which can create a shared memory region.



Advice to implementors.

Implementations can define their own types, or use the info argument, to assist in creating communicators that help expose platform-specific information to the application. ( End of advice to implementors.)



(Unofficial) MPI-3.1 of June 4, 2015
HTML Generated on June 4, 2015