Embedded systems are characterized by their sensitivity to cost ($, weight,
power, size, etc.). Generalized, cache-coherent distributed shared memory is
still relatively expensive to implement. But a restricted shared memory model
can provide useful performance and idiomatic advantages.
The following are some examples of hardware restrictions I have seen:
In some systems, remote memory accesses are allowed only to part of the local
memory. This memory may be associated closely with the network adaptor. The
network adaptor may not be able to access general program memory. This may
allow a much faster interface to the general memory, as it doesn't have to
deal with access/contention by the network adaptor.
In some systems, remote memory accesses are not performed coherently with the
remote processor's cache. This eliminates the need for cache coherence
hardware. Memory which is shared must be marked explicitly as non-cacheable,
or the cache must be flushed before remote accesses.
In some systems, special "shared" memory may be available. For example, a
shared memory card on a VME bus. Local processor memory itself is not
remotely accessible.
Because of these types of restrictions, the OS may provide a restricted shared
memory model (like System V). For example, on the Mercury RACE architecture,
the OS provides a system call to create and attach to a "Shared Memory Buffer
(SMB)". This memory is provided by the system, and a pointer is returned to
the application. An SMB allows remote get/put operations; however, the target
cannot be an arbitrary memory location.
These may seem like archaic systems to many of you. However, they are common
in the embedded world. I certainly see the current movement away from such
restricted systems in the HPC world. It is typically harder to program a
system in which the different types of memory are exposed, and the result may
be a non-portable program. So perhaps MPI-2 should be forward looking, and
provide abstractions which will work well on the latest and next generation of
machines.
As Eric pointed out, the idea of MPI is to provide application portability. It
must (necessarily) be restricted to those abstractions which can be
reasonably implemented on the widest variety of machines. I.e., it's a subset,
not a superset.
The existence of an MPI_SHMALLOC function would probably allow designers of
embedded systems to work around their problems. But I remain concerned about
performance portability (i.e., a set of abstractions which don't exhibit
performance inversions on different architectures), unless suitable wording is
clearly visible in the standard.
Hughes Aircraft Co.