Re: alternate simplified proposal #1 (adding extra comm argument to every call)

Richard Frost (frost@sdsc.edu)
Wed, 19 Jun 1996 09:43:28 -0700 (PDT)

Hi Parkson,

Although I would like to support the split of a communication group
after an open, you have made some reasonable arguments to the contrary.
I can see how to overcome them, but perhaps this is asking too much
of implementors.

My alternative is to drop the extra comm argument from these calls.
Instead, we ask the user to first split their communicator, then perform
multiple open statements.
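
A rough sketch of the split-then-open idea follows. It is a toy model in plain Python, not MPI code: a comm group is just a list of ranks, comm_split only imitates the colour partitioning of MPI_Comm_split (it ignores the key argument), and ToyFile and the file name are hypothetical stand-ins for an MPI-IO file handle.

```python
from collections import defaultdict


def comm_split(ranks, color_of):
    """Partition ranks by colour, in the spirit of MPI_Comm_split (toy model)."""
    groups = defaultdict(list)
    for rank in sorted(ranks):
        groups[color_of(rank)].append(rank)
    return dict(groups)


class ToyFile:
    """Hypothetical file handle that remembers the group that opened it."""

    def __init__(self, name, group):
        self.name = name
        self.group = frozenset(group)

    def read(self, rank):
        # No comm argument here: the group was fixed at open time.
        if rank not in self.group:
            raise PermissionError(f"rank {rank} did not take part in the open")
        return f"rank {rank} read {self.name}"


world = range(4)
# Step 1: split the communicator first.
subgroups = comm_split(world, lambda rank: rank % 2)
# Step 2: one open per sub-group.
handles = {color: ToyFile("data.out", members)
           for color, members in subgroups.items()}
```

In this model each handle is tied to exactly one group for its whole lifetime, so the read call never has to ask which group is calling.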

- Richard

---------- Forwarded message ----------
Date: Fri, 14 Jun 1996 09:16:30 -0700 (PDT)
From: Parkson Wong <parkson@nas.nasa.gov>
To: frost richard <frost@SDSC.EDU>
Cc: mpi-io@mcs.anl.gov
Subject: Re: alternate simplified proposal #1 (adding extra comm argument to every call)

In the proposal, comm is added as an argument to all the read, write,
close and fcontrol calls.

The given argument for this is the added flexibility of allowing a
subset of the processes from the original comm group that did the open
to perform the I/O operation. The user does not have to remember
the layout when he decides that a subset of the processes will do the I/O.
With a new open, he needs to do all the layout calls again. That is
what I could come up with; maybe Richard should speak for himself.

My counterarguments are:
1) an extra argument on every call (users are already complaining
about the number of arguments)

2) Close and fcontrol with a different comm group are difficult to
implement, and their semantics are hard to define. This is orthogonal to
the read/write and should be discussed separately.
Say processes 1, 2, 3 and 4 opened the file, and a subset of 1 and 2 closed
the file. Is the subset 2, 3, 4 still allowed to access the file?
Or is only the subset of 3 and 4 allowed to access the file?
This makes the checking of which comm group is allowed to access
the file a lot more complicated.

I really question the usefulness of this. Why would a user want to
do a close for a subset of the processes? He could always wait
till everybody is done, and do a close at the end.

3) Implementation difficulties (read/write):
a) The original comm group passed in to the open needs to be cached
to enforce that the comm group passed in the read/write
call contains a subset of the processes.

MPI_Comm_compare only returns MPI_IDENT, MPI_CONGRUENT,
MPI_SIMILAR and MPI_UNEQUAL. It won't tell me that a comm
group is a subset of another. This function needs to be
enhanced to return MPI_SUBSET and MPI_SUPERSET; otherwise
MPI-IO is not layerable.

This could be an expensive operation because it requires
global communication. Actually, I don't know how it could
work, since the other processes not in the subset are not
participating.

This will slow down every read/write call.

b) In order to do optimization, I will need to cache the comm
used in the last read/write, and compare it with the new one. If they
are the same, I could use the old infrastructure; if they are not, I
rebuild the infrastructure. The collective buffering algorithm needs
to know who is participating, what the file layouts are, etc.
If the comm group changes, all these things change.
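
The relation that item 3a) asks for can be written down as a small model. This is only a sketch in plain Python, treating a comm group as an unordered set of ranks; it ignores rank order and communication context, which MPI_Comm_compare does distinguish, and it says nothing about how the answer would be obtained across processes. The result strings are illustrative: MPI_SUBSET and MPI_SUPERSET are the values proposed above, not existing MPI constants.

```python
def classify(opened, requested):
    """Relate the group passed to read/write to the group that did the open."""
    opened, requested = frozenset(opened), frozenset(requested)
    if requested == opened:
        return "same"
    if requested < opened:      # proper subset: the case the proposal allows
        return "subset"
    if requested > opened:      # proper superset
        return "superset"
    return "unrelated"          # overlapping or disjoint groups
```

For example, with processes 1 to 4 having opened the file, classify({1, 2, 3, 4}, {1, 2}) gives "subset", while classify({1, 2, 3, 4}, {4, 5}) gives "unrelated".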

So, in the general case where the comm group remains the same, we are
paying for 1) a comparison of the original comm group and the new comm
group, and 2) a comparison of the new comm group and the last comm group
used, plus all the extra logic to deal with the fact that the comm group
could change.
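
The per-call overhead described above can be made concrete with a toy model. This is a sketch in plain Python under the same assumption as before (a comm group is a set of ranks); CollectiveIO, write and _rebuild are hypothetical names, and the rebuild step is only a counter standing in for recomputing participants, file layouts and buffers.

```python
class CollectiveIO:
    """Toy file handle that accepts a comm group on every write."""

    def __init__(self, opened):
        self.opened = frozenset(opened)   # cached group from the open
        self.last_group = None            # cached group from the last call
        self.rebuilds = 0

    def _rebuild(self, group):
        # Stand-in for rebuilding the collective buffering infrastructure.
        self.last_group = group
        self.rebuilds += 1

    def write(self, group):
        group = frozenset(group)
        # Comparison 1: new group against the group that did the open.
        if not group <= self.opened:
            raise ValueError("group is not a subset of the opening group")
        # Comparison 2: new group against the group used last time.
        if group != self.last_group:
            self._rebuild(group)
        return self.rebuilds


handle = CollectiveIO({1, 2, 3, 4})
handle.write({1, 2, 3, 4})   # first call builds the infrastructure
handle.write({1, 2, 3, 4})   # same group: both comparisons run, no rebuild
handle.write({1, 2})         # group changed: rebuild
```

Even the second call, where nothing changed, still performs both comparisons before it can reuse the cached state.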

Once upon a time, Sam half-jokingly proposed that fcntl should allow
changing the comm group. Maybe it is not such a bad idea after all.

--
Parkson Wong			Address: Numerical Aerodynamic Simulation
MRJ, Inc.				 NASA Ames Research Center M/S 258-6
Supercomputer Applications Segment	 Moffett Field, CA  94035-1000
e-mail: parkson@nas.nasa.gov	Phone: (415)604-3988	Fax: (415)966-8669