A few thoughts on Chapter 8.
Section 8.2.2
-------------------
A time provider can broadcast its clock value or update a shared
variable known to all its local users. Users read the variable as their
needs require, so readers can access the clock with no delay. [1]
This is another clock definition that may be useful.
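To make the shared-variable form of this clock concrete, here is a minimal sketch. It is not taken from [1] or from the spec: the names SharedClock, publish, read and time_provider are hypothetical, and a Python thread stands in for the time provider.

```python
import threading
import time


class SharedClock:
    """Hypothetical shared clock variable: one writer, many readers."""

    def __init__(self):
        self._value = time.monotonic()

    def publish(self, now):
        # A single attribute assignment; readers never wait on the writer.
        self._value = now

    def read(self):
        # Returns the most recently published value without blocking.
        return self._value


def time_provider(clock, stop, period=0.001):
    """Republish the local clock every `period` seconds until stopped."""
    while not stop.is_set():
        clock.publish(time.monotonic())
        stop.wait(period)


clock = SharedClock()
stop = threading.Event()
provider = threading.Thread(target=time_provider, args=(clock, stop))
provider.start()

first = clock.read()   # local users read whenever they need the time
time.sleep(0.05)
second = clock.read()

stop.set()
provider.join()
```

The trade-off in this sketch is that a reader pays no delay but may see a value that is up to one update period old.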
Section 8.3 Instrument Events
-------------------------------
Events can be categorized as follows [2]:
- Task events: state changes of an executing task
- Run-time system events: predefined changes in the run-time system
- Label events
- Transition events
- Event histories
- Synchronous and asynchronous events
- Timing violations
- Bounded event histories
These categories can serve as a formal framework for monitoring run-time constraints.
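As an illustration of how a bounded event history could support checking a timing constraint, here is a small sketch. It is only one possible reading of the categories above, not the formalism of [2]; the class name, the labels and the microsecond timestamps are assumptions.

```python
from collections import deque


class BoundedEventHistory:
    """Keeps only the most recent `capacity` (label, timestamp) events."""

    def __init__(self, capacity):
        self._events = deque(maxlen=capacity)

    def record(self, label, timestamp_us):
        # When the history is full, the oldest event is dropped.
        self._events.append((label, timestamp_us))

    def latest(self, label):
        # Search from newest to oldest for the given label.
        for seen_label, timestamp_us in reversed(self._events):
            if seen_label == label:
                return timestamp_us
        return None


def timing_violation(history, start_label, end_label, max_delay_us):
    """True if the latest end event came more than max_delay_us after the latest start."""
    start = history.latest(start_label)
    end = history.latest(end_label)
    if start is None or end is None or end < start:
        return False  # not decidable from the retained events
    return end - start > max_delay_us


history = BoundedEventHistory(capacity=4)
history.record("task_start", 0)
history.record("task_end", 12_000)
late = timing_violation(history, "task_start", "task_end", 10_000)
on_time = timing_violation(history, "task_start", "task_end", 15_000)
```

Bounding the history keeps memory use fixed, at the cost that a constraint spanning more than `capacity` events can no longer be checked.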
[1] Real-Time System Design (Levi & Agrawala)
[2] Advances in Real-Time Systems (S. H. Son, ed.)
Section 8.3.4 Fault Tolerance
-------------------------------
- Updates to the clock server, or synchronization of the local time
provider, can provide fault tolerance.
- Another recovery scheme:
The node that finds the fault notifies all the nodes of the
communicator. Fault detection is by consensus of all the members
(including the parent).
The parent kills the faulty node, etc., as currently described in the spec.
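The recovery scheme above could be sketched as follows. This is an assumption-laden illustration, not spec text: "consensus" is read here as unanimity among all members other than the suspect, and the function names and the sees_fault callback are hypothetical.

```python
def fault_confirmed(votes):
    """votes maps each voter's rank to True if it also sees the fault."""
    return bool(votes) and all(votes.values())


def recover(members, parent, suspect, sees_fault):
    """Return the rank the parent should kill, or None if there is no consensus."""
    # Step 1: the detecting node notifies every node of the communicator.
    # Each notified node, the parent included, answers with a vote.
    voters = (set(members) | {parent}) - {suspect}
    votes = {rank: sees_fault(rank, suspect) for rank in voters}
    # Step 2: the fault counts as detected only if every voter agrees
    # (assumption: consensus means unanimity, not a majority).
    if fault_confirmed(votes):
        return suspect  # stands in for "the parent kills the faulty node"
    return None


members = [0, 1, 2, 3]
agreed = recover(members, parent=0, suspect=2,
                 sees_fault=lambda rank, suspect: True)
disputed = recover(members, parent=0, suspect=2,
                   sees_fault=lambda rank, suspect: rank != 3)
```

In the first call every voter confirms the fault, so rank 2 is returned; in the second, rank 3 disagrees and no node is killed.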
Section 8.5.2 Time Driven MPI/RT
----------------------------------
After a timeout has occurred, the details of what happens next should be
left to the implementor. The purpose of an activity interval is to ensure
that the system resources required to satisfy the operation are
available WITHIN the specified interval. Leaving the timeout behavior
open makes the implementation more flexible and accounts for the case
where the system resources fail and go away before the interval is over,
so that the operation does not complete.
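A minimal sketch of this idea, assuming a polling model: the operation is retried until it completes or the interval ends, and the timeout action is a callback supplied by the implementor. The names run_in_activity_interval, try_step and on_timeout are hypothetical and are not part of MPI/RT.

```python
import time


def run_in_activity_interval(try_step, deadline, on_timeout, clock=time.monotonic):
    """Call try_step() until it returns True or the interval ends."""
    while clock() < deadline:
        if try_step():
            return True
    on_timeout()  # what happens here is deliberately left to the implementor
    return False


# Deterministic demonstration with a fake clock that advances one tick per read.
ticks = iter(range(100))


def fake_clock():
    return next(ticks)


timeouts = []

# The resources never become available, so the operation cannot complete.
completed = run_in_activity_interval(
    try_step=lambda: False,
    deadline=5,
    on_timeout=lambda: timeouts.append("timeout"),
    clock=fake_clock,
)
```

Passing the timeout action in as a callback keeps the interval check itself independent of whatever recovery policy an implementation chooses.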