Distributed/heterogeneous computing and MPI

Thomas M. Eidson (teidson@htc-tech.com)
Fri, 29 Dec 1995 14:37:54 -0500

To: MPI Forum
From: Tom Eidson
Date: December 28, 1995

At the MPI Meeting at Supercomputing '95, I expressed concern that the
MPI2 proposal would not be particularly useful for distributed,
heterogeneous programming. I understand some of the reasons why the
MPI2 proposal is limited, but I also have some confusion. The purpose
of this note is to elicit some feedback. I will try to briefly list
the requirements that I feel would be useful in developing distributed
computing systems. These are based on several projects where I have
built such systems (using PVM and my own socket-based libraries).

While I am presenting these suggestions to the MPI Forum, it is not
obvious that the MPI standard is the proper place for the requirements
that I believe are needed. The problem is that other people seem to
believe that MPI2 will provide the needed functionality for message
passing in heterogeneous environments. Heterogeneous computing is
being repeatedly touted as important to the future. Standards would seem
more important in this arena, but it appears that little is being
done. Therefore, it should be decided and made clear whether the MPI
standard will remain limited to basic message passing or will be
extended to include all distributed computing requirements.

Needs of Distributed Computing in Heterogeneous Environments
------------------------------------------------------------

1. Ability to set up a virtual network of computers in a
reasonably portable, flexible and straightforward-to-program manner.
(machine control)

2. Process control primitives that allow flexible control
of remote executables.

3. Task control primitives that are synergetic with
good programming styles.

4. Ability to communicate between processes that are loaded with
MPI (or other communication) libraries developed by different
vendors.

Discussion
----------

The ability to set up a virtual network of processes in a dynamic
manner seems to be the primary goal of the MPI2 Dynamic Process
proposal. The proposal leaves it to the user to determine which
machines are available, file system details, executable locations,
priority and other details involved in using a heterogeneous
computing environment. This obviously provides no portability
assistance to the user and requires each user to write his own
computer and process management utilities.

In my discussions with MPI Forum participants over the past months,
I have been given various reasons why including such detail
in a proposal is difficult. It seems that the primary concern is that
since an MPI function would have to interact with the operating system
and since operating systems are so diverse, a good standard that
meets all needs would be almost impossible to put together. In general,
I am not convinced. All MPI functionality could be viewed in this
manner. In the following, I will try to outline some ideas that
allow the user to express appropriate information to an MPI-based
library in a reasonable manner.

I think that the basic difficulty in previous proposals (and even
with PVM) is that too much functionality is packed into the "spawn"
function. By separating machine, process, and task control, I believe
the generation of a good standard will be more straightforward.

i) Machine control would involve the generation of a pool of machines
to select from, the testing or requesting of machine availability, the
determination of machine status, the determination of file system
structure, the determination of various machine properties, etc.
I assume that the length of the above list is one of the concerns.
It seems clear that a "standard function" would be difficult to
define with arguments for all the above information. It actually
would be even more difficult since other items could be added.

A reasonable proposal however could be based on the following
ideas. The "core virtual machine definition function" would only
return a list of id's for the potentially available computers and
their availability status if requested. This list would include
machines associated with the initial job (those already in
"MPI_COMM_WORLD") as well as machines read from a user-supplied and/or a
system-supplied configuration file. The communication library
would read the configuration files.

These configuration files would provide core information as well as
optional information that might be specific to different
computer/OS architectures. The format of the configuration
files would be a list of "keyword = value" entries associated
with each computer or group of computers. Associated library
functions would be needed to query and set the value associated
with a keyword.
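
As a rough illustration of the configuration-file idea above, the
following Python sketch parses per-machine "keyword = value" entries
and offers query/set helpers. Everything in it is an assumption made
for illustration: the bracketed section-per-machine layout, the
keywords (arch, base_dir, available), and the function names are
hypothetical and are not part of any MPI or PVM interface.

```python
import configparser

# Hypothetical configuration text: one section per machine, each holding
# "keyword = value" entries. The keywords are illustrative assumptions.
SAMPLE_CONFIG = """
[alpha]
arch = sparc
base_dir = /home/user/work
available = yes

[beta]
arch = rs6000
base_dir = /u/user/work
available = no
"""


def load_machines(text):
    """Parse configuration text into {machine_id: {keyword: value}}."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return {name: dict(parser[name]) for name in parser.sections()}


def get_value(machines, machine_id, keyword, default=None):
    """Query the value stored for a keyword on one machine."""
    return machines.get(machine_id, {}).get(keyword, default)


def set_value(machines, machine_id, keyword, value):
    """Set (or add) a keyword value for one machine."""
    machines.setdefault(machine_id, {})[keyword] = value


def available_machines(machines):
    """Stand-in for the 'core' call: ids of machines marked available."""
    return [name for name, props in machines.items()
            if props.get("available") == "yes"]
```

The point of the split is that the core call only returns machine
ids, while architecture-specific details stay behind the generic
keyword lookup, so new keywords can be added without changing any
function signature.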

ii) Process control would involve starting, stopping and checking the
status of any remote processes. The information needed from the user
is the executable name and argument list, the execution directory
(relative to a base directory which is machine specific), the directory
where the executable is located, and the machine id (or id's). A
process could be a normal executable or a task server (see below).
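
The process-control information listed above could be gathered into
a single record, separate from the start/stop/status calls. The
Python sketch below is only a local stand-in: it launches a process
on the current machine with the standard subprocess module, since
genuinely remote start-up would depend on the underlying library.
The names ProcessSpec, resolve_paths, start, status and stop are
hypothetical.

```python
import os
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass
class ProcessSpec:
    """User-supplied description of a process to start (hypothetical)."""
    executable: str
    args: list = field(default_factory=list)
    exec_dir: str = "."   # relative to the machine-specific base directory
    bin_dir: str = ""     # directory where the executable is located
    machine_ids: list = field(default_factory=list)


def resolve_paths(spec, base_dir):
    """Combine the spec with a machine's base directory."""
    program = os.path.join(spec.bin_dir, spec.executable)
    workdir = os.path.normpath(os.path.join(base_dir, spec.exec_dir))
    return program, workdir


def start(spec, base_dir):
    """Start the process locally; a real library would start it remotely."""
    program, workdir = resolve_paths(spec, base_dir)
    return subprocess.Popen([program, *spec.args], cwd=workdir)


def status(proc):
    """Report whether the process is still running."""
    code = proc.poll()
    return "running" if code is None else f"exited({code})"


def stop(proc):
    """Terminate the process if it is still running, then reap it."""
    if proc.poll() is None:
        proc.terminate()
    proc.wait()


if __name__ == "__main__":
    # Usage example: run a short child process in the current directory.
    demo = ProcessSpec(sys.executable, ["-c", "print('hello from child')"],
                       machine_ids=["local"])
    child = start(demo, os.getcwd())
    child.wait()
    print(status(child))
```

Keeping the machine's base directory out of ProcessSpec mirrors the
separation argued for here: the same spec can be reused on several
machines whose file systems are laid out differently.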

iii) I am distinguishing task control from process control as follows.
Frequently, it is convenient to run the same segment of code (or task)
repeatedly, or to run different tasks on the same data set. One does not
wish to start and restart each task instance as a new process
because of the overhead considerations, especially when large data
sets are involved. It is thus convenient to program a remote
process as a task server to which messages can be sent to request
the execution of a task. The task request functions can be
modeled after subroutine calls. A communication library could
contain task request functions that make task requests to a "main
program stub" in a remote process and also send/receive
arguments. A user would load his subroutines with a stub supplied
as part of a communication library. Thus the user writes a host
program or programs that contain task calls along with subroutine
segments which are loaded with the library stubs. This would make
it much easier for many users to generate distributed codes.
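
A minimal sketch of the task-server idea, under loud assumptions: a
thread and in-memory queues stand in for the remote process and the
message-passing layer, and the names TaskServer, request_task and
shutdown are hypothetical. The serve loop plays the role of the
library-supplied "main program stub"; the user only registers
subroutines and issues task requests that look like subroutine calls.

```python
import queue
import threading


class TaskServer:
    """Stand-in for the library stub: loops on incoming task requests."""

    def __init__(self):
        self.tasks = {}                # task name -> user subroutine
        self.requests = queue.Queue()  # stands in for incoming messages

    def register(self, name, func):
        """Load a user subroutine under a task name."""
        self.tasks[name] = func

    def serve(self):
        """Run requested tasks until a shutdown request arrives."""
        while True:
            name, args, reply = self.requests.get()
            if name is None:  # shutdown request
                break
            try:
                reply.put(("ok", self.tasks[name](*args)))
            except Exception as exc:  # report failures to the caller
                reply.put(("error", repr(exc)))


def request_task(server, name, *args):
    """Host side: send a task request and wait for its result."""
    reply = queue.Queue()
    server.requests.put((name, args, reply))
    outcome, value = reply.get()
    if outcome != "ok":
        raise RuntimeError(value)
    return value


def shutdown(server):
    """Ask the server loop to exit."""
    server.requests.put((None, (), None))


if __name__ == "__main__":
    # Usage example: the server stays up across several task requests.
    srv = TaskServer()
    srv.register("scale", lambda data, k: [k * x for x in data])
    srv.register("total", sum)
    worker = threading.Thread(target=srv.serve)
    worker.start()
    print(request_task(srv, "total", request_task(srv, "scale", [1, 2, 3], 2)))
    shutdown(srv)
    worker.join()
```

Because the server stays alive between requests, the cost of
starting a process is paid once rather than per task, which is the
overhead concern raised above.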

iv) As distributed, heterogeneous computing increases, more diverse
computer/network architectures will be encountered. Already, a
simple host/node (SPMD) program can make use of ethernet/sockets
coupled with a high-performance vendor network/protocol. For
a message passing standard to be useful in these environments,
there must be a method of communicating over different
underlying systems for at least some of the nodes in the total
system. Even a reduced set of functionality would be acceptable.

The above is not intended to be a complete proposal. My intent is to
express some of the requirements of heterogeneous computing based on
my experience and to suggest an approach to packaging the needed
functionality. I would also like to elicit some feedback on the
problems of incorporating this functionality into a standard.
Furthermore, I will reiterate that it is not obvious that MPI is the right
place for the above functionality.

However, if MPI is not the appropriate place then there is a need
to determine where it should be. Efforts such as CORBA could be the
right place, but it is not yet clear (at least to me) if CORBA is
appropriate for those wishing a simple environment for distributed
computing and message passing.