On Wednesday in Sub-committee, I am going to propose that we review the
division of real-time discussions, proposals, and standardization into two
parts, immediate action and later action, and that fault tolerance, in
terms of specific add-on features to the chapter, be considered only after
we have working prototypes of MPI/RT in the field, so that we do not
dilute our current efforts. Since we are not under the same deadline
constraint as MPI-2, this allows us to separate concerns usefully and get
a working MPI/RT of limited scope. We can still return to this issue this
year, just later in the year (e.g., 4th quarter).
I am concerned that fault tolerance of any kind will totally diffuse our
effort right now. Since it is clear that fault tolerance and high
performance are often mutually exclusive, we need to think carefully
before jumping off in this direction. Ideally, in the MPI/RT as a
"standard within MPI Standard or Addendum within MPI standard", we can
have multiple chapters, one of which addresses FT and accepts different
operational assumptions and requirements than the base RT.
I suggest that weak FT issues be addressed in MPI-2 and MPI/RT, but we
need to be careful to address the primary audiences of MPI/RT (hard and
soft real-time systems with expectations of high performance) first.
We are not nearly through with that work as yet.
This thinking is analogous to the limitations the MPI Forum set on MPI-1,
where certain classes of features were deferred.
-Tony
Anthony Skjellum, PhD, Associate Professor of Computer Science;
Mississippi State University, Department of Computer Science & NSF ERC
Butler, Rm 300, PO Box 9637, Corner of Perry&Barr, Mississippi State,MS 39762
(601)325-8435 FAX: (601)325-8997; http://www.erc.msstate.edu/~tony;
"Persistence is fertile." ; e-mail: tony@cs.msstate.edu; Try MPI!