From: William Saphir <wcs>
Date: Thu, 29 May 1997 12:57:47 -0700
To: Dick Treumann <treumann@kgn.ibm.com>
Subject: Re: comments on first 80 pages
Cc: raja@tbag.rsn.hp.com, salo@sgi.com, wcs@nersc.gov
On May 29, 3:13pm, Dick Treumann wrote:
> Subject: Re: comments on first 80 pages
> Bill Saphir wrote:
> >
> > > 63:3-4 Since MPI states that interaction between MPI and non-MPI
> > > modes of interprocess communication is undefined, can we
> > > imply that MPI_COMM_JOIN is OK between processes which
> > > share a MPI_COMM_WORLD or MPI parent/child relationship?
> > > I think this should say: The processes must not have a
> > > preexisting MPI connection.
> >
> > This was discussed when we were designing the function. I don't
> > remember exactly why, but the intent was that *any* MPI processes
> > could use it, even if they already had an MPI connection.
>
> Do you mean this was discussed in the open Forum or only in the Dynamic
> subcommittee? I do not remember the discussion or the reasoning.
Sorry, I meant in private discussion before the proposal - you
would not have seen this.
>
> > Also, I don't think the fact that the interaction of MPI and
> > non-MPI communication is undefined has any bearing here. Are
> > you thinking of anything specific?
> >
> To me it seems clear that the establishment of a SOCK_STREAM socket
> involves communication which is not MPI communication. Also, the
> communication hidden under the MPI_JOIN call is arguably NOT MPI
> communication. (no communicator, no context). To my mind, MPI_Join's
> legitimacy is quite closely tied to a requirement that there not be
> concurrent MPI communication linking the processes involved. If it is
> legal for tasks A and B to be members of some communicator and to have
> outstanding MPI communications on that communicator at the moment they
> decide to construct a socket connection and do an MPI_Join through it
> then you have accepted the responsibility to define the interaction
> between socket communication and MPI communication (including impacts on
> the MPI progress rule).
I understood this, but had convinced myself that since this
was a blocking collective call, there was no issue with
the interaction of MPI/non-MPI communication. What I needed
was a specific example to show the problem, but I've
now thought of one and it's pretty simple.
process 1              process 2
mpi_isend(dest=2)
                       mpi_recv(src=1)
mpi_join()
                       mpi_join
mpi_wait()
----------
Unless mpi_join implicitly makes progress on process 1's outstanding
mpi_isend, the mpi_recv in process 2 may not return. And it is easy to
imagine a situation where mpi_join will not be able to make progress
(e.g., it is in a blocking read()).
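To make the hang concrete, here is a toy model of the two-process example
above. It is only a sketch and not real MPI. It assumes a rendezvous-style
send whose data moves only while the sender is inside a call that drives
progress. It also assumes the join completes only once both processes have
entered it. The names run and join_makes_progress are made up for the
illustration.

```python
def run(join_makes_progress):
    """Return True if both processes finish, False if they hang."""
    # Call sequences from the example: process 1 and process 2.
    programs = {1: ["isend", "join", "wait"], 2: ["recv", "join"]}
    pc = {1: 0, 2: 0}        # index of the call each process is currently in
    send_posted = False      # process 1 has called isend
    recv_posted = False      # process 2 has entered recv
    delivered = False        # the message has actually been transferred
    entered_join = set()     # processes that have entered join

    changed = True
    while changed:           # iterate until no process can advance
        changed = False
        for p in (1, 2):
            if pc[p] == len(programs[p]):
                continue     # this process has finished
            op = programs[p][pc[p]]

            # Assumed progress rule: the sender moves the data only while
            # it is inside a call that drives the progress engine.
            drives = op == "wait" or (op == "join" and join_makes_progress)
            if p == 1 and drives and send_posted and recv_posted and not delivered:
                delivered = True
                changed = True

            if op == "isend":
                send_posted = True
                done = True                      # nonblocking, returns at once
            elif op == "recv":
                if not recv_posted:
                    recv_posted = True
                    changed = True
                done = delivered                 # blocks until data arrives
            elif op == "join":
                if p not in entered_join:
                    entered_join.add(p)
                    changed = True
                done = len(entered_join) == 2    # blocks until both have entered
            else:                                # "wait"
                done = delivered

            if done:
                pc[p] += 1
                changed = True

    return all(pc[p] == len(programs[p]) for p in programs)


print(run(join_makes_progress=False))  # False: recv and join wait on each other
print(run(join_makes_progress=True))   # True: join completes the pending send
```

In the model, process 1 sits in the join waiting for process 2, while
process 2 sits in the recv waiting for data that only process 1 can move.
Whether a real implementation behaves like the first case depends on how it
implements the send and the join.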
So I agree we need to say something.
I would like not to make it erroneous for connected processes,
but mention the problem and say the result is undefined for
connected processes. Does that go far enough?
Bill
---End of forwarded mail from "William Saphir" <wcs>