
Re: [mpi-21] Proposal EH2: add const keyword to the C bindings




On Jan 10, 2008, at 6:37 AM, Dries Kimpe wrote:


* Richard Treumann <treumann@xxxxxxxxxx> [2008-01-09 18:16:29]:
If the "subset of processes within a communicator" is reasonably stable you
can do better by making an additional communicator containing only those
processes and doing MPI_Bcast on this subset communicator. If the subset
keeps changing then making the subset communicator may be too costly
compared to the savings.

Everybody keeps assuming *exactly* the same buffer is sent to the other
ranks. What if you're sending different parts of the same data structure to
different ranks, but some of the parts overlap?

I have another use case, which has come up within the HDF5 library. We have an API call "H5Dwrite" that accepts a const pointer to a buffer of data elements to write to an HDF5 file. Since the MPI_File_write* calls take non-const pointers to their buffers, I have to cast away the 'constness' of the buffer before passing it to them.


I definitely won't let this single, optional chance that an application wants to write data with MPI (as opposed to the other forms of I/O that HDF5 can perform) make me change the buffer parameter to H5Dwrite() to be non-const, but I hate casting away the constness of the pointer when I pass it to MPI.

This may be "wrong" from the MPI standard's point of view, but our library is not solely dependent on using MPI for I/O and I want to assert to applications that call H5Dwrite() that we won't modify their buffer when we perform I/O on it. In a sense, I'm carrying over the semantics of the POSIX write() call (which takes a const pointer to its buffer). So far, no MPI implementation has bitten me on this... (as Dries says below, etc.)
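
To illustrate the cast, here is a minimal sketch that does not use MPI or HDF5 at all. `legacy_write` is a hypothetical stand-in for any call whose buffer parameter is declared non-const even though it only reads from it, and `lib_write` is a hypothetical stand-in for a library entry point that promises not to modify the caller's buffer.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for an API that takes a non-const buffer but
 * only reads from it. It copies the bytes into `sink`. */
static size_t legacy_write(void *buf, size_t nbytes, unsigned char *sink)
{
    memcpy(sink, buf, nbytes);
    return nbytes;
}

/* Hypothetical library entry point: the const qualifier tells callers
 * that their buffer will not be modified. */
size_t lib_write(const void *buf, size_t nbytes, unsigned char *sink)
{
    /* The cast drops const only to satisfy the callee's prototype. It is
     * safe only as long as the callee never writes through the pointer. */
    return legacy_write((void *)buf, nbytes, sink);
}
```

The cast keeps the promise in the public prototype, but the compiler can no longer check it on the far side of the call.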

	Quincey


A common 'pattern' to do this is to create the datatypes and a mapping
of (rank, datatype), then post an Isend for every (rank, datatype) pair
and call Waitall at the end.


Why not use alltoall/scatter? Well, if every rank is only communicating with
a limited number of other ranks (one that does not grow with universe size),
using collectives serializes the transfer. (nonblocking collectives? ;-)


Although I wouldn't have the user program read from an in-use send buffer,
I am certainly a sinner for having implemented the scheme described above.


I agree with Gregor:
- I've done this, and consequently am among those (few??) that do write
wrong MPI programs.
- The program ran (and runs) without problems on 3 different architectures
and at least 4 different MPI implementations.
- Not being able to do this forces me to copy every time and reduces
performance.


Maybe it is reasonable to allow the send buffer to be used multiple times,
while not allowing the user program to touch it? This wouldn't
prohibit byte-swapping tricks, and if there is an MPI implementation that
has severe restrictions on send buffers, it might do a little extra
bookkeeping (if it doesn't do it already) and handle the case of
overlapping send buffers...


Implementations that don't care (most current implementations it seems)
won't need to do anything special.


 Greetings,
 Dries

