Re: Mixed-model admission tests

Arkady Kanevsky by way of Kelvin Nilsen (arkady@linus.mitre.orgkelvin@newmonics.com)
Mon, 16 Dec 1996 09:36:25 -0600

Kelvin,

These are a very good comemnts.
Many of them we had discussed in the past many times.
I will provide my feedback to them.

The main difficulty comes from the underlying assumptions of the
CPU priority scheduling models. Communication that we are "trying"
to schedule is not a single resource. For one type of platform, distributed
shared memory, scheduling may require to schedule: application processor
CPU (sender and receiver), data bus (sender and receiver),
memory (sender and receiver), communication co-processor (sender and
receiver),
network interface chip (sender and receiver),
underlying physical network that can have various topologies (like
ring, mesh, cross bars). The schedulers (if they exist) for all these
components of a platform potencially use different technologies that are
most suitable for their own design. The RMA is very hard to adopt to analyze
all these resource that need to be schedule to provide "predictable"
real-time behavior for communication. This is probably the reason there
is no theoretical work that extended RMA to parallel platforms.

The communication scheduling problem need to address scheduling of many
heterogeneous resources and scheduling algorithms need to be flexible
enough to handle scalability issues that are in the heart of parallel
processing. Ever since the beginning of the MPI/RT meetings I advocated
the position that the standard should not dictate how to schedule
communication.
Instead it should provide an API that supports user perspective for
timing (and fault-tolerance). Then implementors can map user requests
onto the scheduling primitives that their platforms use: priority, calendars,
events, signals and so on.

A simple mapping that Kelvin presents for priority is a good starting point
and can be included as advice to implementors. But this example
also illustrates the difficulties of scheduling communication. The worst
case available utilization (WCAU) preseneted only covers single resource -
CPU.
A simple extentions of WCAU to handle multiple resources is very pessimistic
with very low utilization which most of the high-performance users
will not accept. MITRE looked into this problem and decided that for
a large class of signal processing applications it is easier to take
advantage of time-driven scheduling utilizing synchronized clocks that
most high-performance platforms for real-time applications provide.
That also allows MPI/RT implementation to utilize multiple routes to avoid
hot spots at the channel creation time to provide maximal support for
real-time users.

The MPI/RT implementation also releave users from the necessity of figuring
out "right" priorities and doing the "scheduling" and its analysis
and allows then to concentrate on the "application". However, MPI/RT
allows users to "play" with detail scheduling if they so desire.
My vision that MPI/RT implementors know a lot more about the platform
and hence should be able to provide real-time predictable behavior while
maximizing "platform utilization" (not just individual resources like CPU)
better than the users. MPI/RT API allows to provide application information
detailed enough for an implementation to "utilize" the platform.

Notice that for some platforms, like shared memory, the simple single resource
model is sufficient and hence Kelvin's suggestion will work well.
But in general, to support scalability of applications a more comprehensive
analysis is needed.

Arkady

> From mpi-realtime-human@mcs.anl.gov Mon Dec 9 23:52:11 1996
> X-Sender: kelvin@newmonics.com
> X-Mailer: Windows Eudora Pro Version 3.0 (32)
> Date: Mon, 09 Dec 1996 22:40:54 -0600
> To: mpi-realtime@mcs.anl.gov
> From: Kelvin Nilsen <kelvin@newmonics.com>
> Subject: Mixed-model admission tests
> Cc: kelvin@newmonics.com
> Mime-Version: 1.0
> Content-Type> : > text/plain> ; > charset="us-ascii">
> Sender: owner-mpi-core@mcs.anl.gov
> Content-Length: 7798
>
>
> I attended the MPI/RT meeting on Dec. 5 and 6. This was my first real
> involvement with MPI/RT. Since there are a lot of similarities in the
> objectives and approaches of MPI/RT and my work with a real-time variant of
> Java (PERC), I am hopeful that our two groups can learn from each other...
> (To participate in the real-time-java mailing list, send a "subscribe"
> message to real-time-java-request@iastate.edu. To review past discussions,
> refer to www.newmonics.com:Technologies:Real-Time Java.) I'm going to use
> the PERC name from this point forward (rather than real-time Java) because
> I do not have Sun's permission to use their trademarked name in this
context.
>
> Both PERC and MPI/RT propose to accept responsibility for reliable
> execution of dynamic real-time workloads. In both environments, a workload
> is proposed to the system, the system analyzes the proposed workload, and
> the system either accepts or rejects the workload. Once a workload has
> been accepted, real-time constraints are "guaranteed" to be satisfied.
>
> In comments describing PERC implementation techniques, we point out that
> real-time threads can be scheduled either through the use of a static
> cyclic schedule, or a dynamic static priority schedule, or a hybrid between
> the two. (Dynamic priority implementations would also be acceptable.) The
> current MPI/RT draft supports two modes of operation: time-driven and
> event-driven. (I'm intentionally ignoring priority-driven for now).
>
> In the MPI/RT meeting, Arkady suggested that his hope was that we would
> come to some sort of agreement as to how each mode of operation should be
> handled separately at first, and that we would later tackle the problem of
> mixing them together. I told him individually, and now in public, that I'd
> like to see some thought given to how the modes would eventually mix. (I
> have a sense that one reason it may be difficult to reach consensus on
> certain design issues is because we don't have a good understanding of the
> inherent implementation complexities associated with particular API
features.)
>
> These are my thoughts regarding one possible implementation of a
> mixed-model admissions test. I welcome feedback and suggestions for
> improvement.
>
> Assumptions:
> 1. A static cyclic schedule has already been constructed
> 2. The static schedule was constructed so as to minimize the durations
> of CPU time that are allocated to static tasks. e.g. the static
> cyclic scheduler prefers:
> _______ ________
> ________|___A___|________|____B___|_________
> 0 10 20 30 40 50
>
> over:
> _______ ________
> ________|___A___|____B___|__________________
> 0 10 20 30 40 50
>
> 3. Our job, given an existing static schedule and a proposed workload
> of dynamic tasks, is to decide whether to accept responsibility for
> execution of the dynamic tasks.
>
> The analysis is based on the techniques used by Liu-Layland to derive the
> standard RMA result. To illustrate the analysis, consider the following
> static cyclic schedule, of duration 70
> ____ ____ ________
> [__A_|____|_B__|_______|____C___|__]
> 0 10 20 30 45 65
>
> 1. At the time the static schedule is constructed, we precompute some
> tables which characterize the ways in which this static schedule
> might interfere with dynamic tasks. In particular, we consider that
> the Liu-Layland critical instant might occur at time 0, time 20, or
> time 45 so we build interference graphs for each of those cases:
>
> Available CPU Time for Critical Point A
>
> 30 | .
> | ____/
> 20 | /
> | /
> 10 | __/
> | /
> 0 |_/
> |_._._._._._._.
> 0 20 40 60
> 10 30 50 70
>
> Available CPU Time for Critical Point B
>
> 30 | .
> | /
> 20 | __/
> | ____/
> 10 | /
> | /
> 0 |_/
> |_._._._._._._.
> 0 20 40 60
> 10 30 50 70
>
> Available CPU Time for Critical Point C
>
> 30 | .
> | /
> 20 | /
> | __/
> 10 | /
> | __/
> 0 |___/
> |_._._._._._._.
> 0 20 40 60
> 10 30 50 70
>
> Now, what we really want to know is the worst-case
> interference imposed by the existing static schedule on execution
> of dynamic real-time tasks. In this case, the worst-case interference
> is represented by the third graph. In more complicated cases, it is
> possible that the worst-case interference graph would need to be
> computed by merging the above graphs. The general technique is to
> define the function as the minimum of the values of the individual
> graphs for each possible value of the x coordinate. Note that this
> chart can be repeated to allow analysis of dynamic workloads with
> task periods that exceed the length of the cyclic schedule.
>
> Worst-Case Available CPU Time
>
> 30 | .
> | /
> 20 | /
> | __/
> 10 | /
> | __/
> 0 |___/
> |_._._._._._._.
> 0 20 40 60
> 10 30 50 70
>
> 2. Note that, in terms of the "Worst-Case Available CPU Time" illustrated
> above, the dynamic tasks will, at worst, be able to run only during
> times represented by upward slanting graph segments. Suppose
> a certain dynamic workload can be sheduled in Y time in the absence of
> interference from statically scheduled tasks. On the above graph,
> find the rightmost x coordinate that maps to the given Y and call this
> X. The worst-case amount by which this event will be delayed by
> interference from statically scheduled tasks is X-Y.
>
> 3. Sort the proposed dynamic workload in terms of decreasing execution
> frequency. In order from highest to lowest priority, examine each
> task:
>
> a) Compute its "raw" completion time in terms of its WCET and the
> WCET and frequencies of all higher "priority" tasks.
>
> If its completion time adjusted for static schedule interference
> does not satisfy the task's latency requirement, reject the
> workload as unschedulable.
>
> b) Add this task into the "interference" workload that is used to
> compute completion times for all lower priority tasks.
>
> Some observations:
>
> 1. This assumes tasks (message transmissions) can be preempted at arbitrary
> times. If this is not the case, more detailed bookkeeping is required.
>
> 2. This ignores the overhead of distributed dynamic scheduling. If
multiple
> nodes desire to transmit dynamic messages during time slots not reserved
> in the static schedule, there is a certain (in some cases
> nondeterministic)
> overhead associated with contention resolution protocols.
>
> 3. This analysis can be easily modified to support deadlies shorter than
> periods, such as is the case with so-called "emergency messages."
>
> I'd like discussion and comments on this... I do not propose this as the
> ultimate solution.
>
>
> --
> Kelvin Nilsen, President voice: 515-296-0897
> NewMonics Inc. fax: 515-296-9910
> 2501 N. Loop Dr., Suite 614A http://www.newmonics.com
> Ames, IA 50010 email: kdn@newmonics.com
>