>> To: mpi-core@mcs.anl.gov
>> Subject: Re: non-blocking ops vs. threads
>> Mime-Version: 1.0
>> Content-Type: text/plain; charset=us-ascii
>> Sender: owner-mpi-core@mcs.anl.gov
>> Precedence: bulk
>>
>> On Jul 11, 8:03pm, Eric Salo wrote:
>> > Subject: Re: non-blocking ops vs. threads
>> > > We at PNL need remote data access.
>> > >
>> > > So how about it, developers -- what is the cost of including
>> > > remote data access, and if it's high, then why, and how might it
>> > > be reduced?
>>
>> One issue that can be difficult is maintaining the coherence of the cache on a
>> processing node. In other words, when data gets "put" in your local memory,
>> you will have to flush any associated cache lines.
>>
This is true only if the remote "put agent" is not cache coherent with the local processor(s).
While this may be the case for a Cray T3D, it is unlikely to be the case on most present and
future systems. If put is implemented by software executed by the computation processor, the
problem does not occur. If put is implemented by software on a cache-coherent communication
coprocessor (Meiko, Paragon), the problem does not occur. If put is implemented by a local
controller with a cache-coherent DMA engine, the problem does not occur.
>> This also implies that you will have to implement a memory consistency model
>> across the members of the communicator, with some sort of barrier to keep
>> everyone in sync.
No, one does not need global memory coherence across all members of the group. One only needs the
local agent that implements the put/get (main processor, communication coprocessor, DMA engine, ...)
to be coherent with the local computation processor.
>>
>> > At the last MPI Forum meeting, I made the suggestion that since "puts" are
>> > often so much messier than "gets", perhaps we ought to consider only adding
>> > the "gets" to the standard and just hope that would be sufficient for most
>> > real user needs.
On a machine where the remote put/get engine is not cache coherent, get is also a problem: the
remote processor must flush its cache before the get is executed. But the remote processor is not
supposed to be aware that a get is being executed!
>>
>> Puts are required for active messages, in which many people in this group
>> (myself included) are very interested.
I don't understand this statement. I know of a fair number of systems that support active messages
but do not support puts, and vice versa. In any case, the issue is not which basic communication
mechanisms the underlying machine provides; the issue is what MPI makes visible to the user. We can
decide to provide an active message interface, a put/get interface, or both. Implementers may, at
their convenience, use one to implement the other.
>>
>> > P.S. Remote data access on our machines is dirt cheap. :-)
>>
>> So how exactly do you build a bus-based share-memory machine with more than 24
>> processors? :)
>>
>> Robert George
>> Army Research Laboratory
>>
>>