Arkady
>I agree with Dennis that MPI should provide generic resource bandwidths (or
>quality or a "slice" of service) and then the application or other elements
>should deal with the allocation and scheduling decisions.
>
>I also agree with Kelvin's technical observations on message scheduling
>(can we perhaps include these in the user-defined profiles as we discussed,
>as one typical scenario?).
>
>I also have a minor observation that one of the reasons RMA does not extend
>in the same provable framework to parallel machines is that it is a
>scheduling-theoretic technique for perfect uniprocessors. In that sense, it
>is well-known that proofs do not scale up (provably) to multiple processors
>or any system with at least two blocking resources.
>
>Alex.
>-------------------------------------------------------------------------
>Dr. Alexander D. Stoyen
>President & CEO
>21st Century Systems, Inc.
>420 Hardscrabble Road
>Chappaqua, NY 10514-3030
>(914) 769-2939 (office)
>(914) 769-0949 (fax)
>Email: alex@21csi.com
>
>----------
>> From: Arkady Kanevskyby way of Kelvin Nilsen <kelvin@newmonics.com>
><arkady@linus.mitre.org>
>> To: mpi-realtime@mcs.anl.gov
>> Subject: Re: Mixed-model admission tests
>> Date: Monday, December 16, 1996 3:36 PM
>>
>> Kelvin,
>>
>> These are a very good comemnts.
>> Many of them we had discussed in the past many times.
>> I will provide my feedback to them.
>>
>> The main difficulty comes from the underlying assumptions of the
>> CPU priority scheduling models. Communication that we are "trying"
>> to schedule is not a single resource. For one type of platform,
>distributed
>> shared memory, scheduling may require to schedule: application processor
>> CPU (sender and receiver), data bus (sender and receiver),
>> memory (sender and receiver), communication co-processor (sender and
>> receiver),
>> network interface chip (sender and receiver),
>> underlying physical network that can have various topologies (like
>> ring, mesh, cross bars). The schedulers (if they exist) for all these
>> components of a platform potencially use different technologies that are
>> most suitable for their own design. The RMA is very hard to adopt to
>analyze
>> all these resource that need to be schedule to provide "predictable"
>> real-time behavior for communication. This is probably the reason there
>> is no theoretical work that extended RMA to parallel platforms.
>>
>> The communication scheduling problem need to address scheduling of many
>> heterogeneous resources and scheduling algorithms need to be flexible
>> enough to handle scalability issues that are in the heart of parallel
>> processing. Ever since the beginning of the MPI/RT meetings I advocated
>> the position that the standard should not dictate how to schedule
>> communication.
>> Instead it should provide an API that supports user perspective for
>> timing (and fault-tolerance). Then implementors can map user requests
>> onto the scheduling primitives that their platforms use: priority,
>calendars,
>> events, signals and so on.
>>
>> A simple mapping that Kelvin presents for priority is a good starting
>point
>> and can be included as advice to implementors. But this example
>> also illustrates the difficulties of scheduling communication. The worst
>> case available utilization (WCAU) preseneted only covers single resource
>-
>> CPU.
>> A simple extentions of WCAU to handle multiple resources is very
>pessimistic
>> with very low utilization which most of the high-performance users
>> will not accept. MITRE looked into this problem and decided that for
>> a large class of signal processing applications it is easier to take
>> advantage of time-driven scheduling utilizing synchronized clocks that
>> most high-performance platforms for real-time applications provide.
>> That also allows MPI/RT implementation to utilize multiple routes to
>avoid
>> hot spots at the channel creation time to provide maximal support for
>> real-time users.
>>
>> The MPI/RT implementation also releave users from the necessity of
>figuring
>> out "right" priorities and doing the "scheduling" and its analysis
>> and allows then to concentrate on the "application". However, MPI/RT
>> allows users to "play" with detail scheduling if they so desire.
>> My vision that MPI/RT implementors know a lot more about the platform
>> and hence should be able to provide real-time predictable behavior while
>> maximizing "platform utilization" (not just individual resources like
>CPU)
>> better than the users. MPI/RT API allows to provide application
>information
>> detailed enough for an implementation to "utilize" the platform.
>>
>> Notice that for some platforms, like shared memory, the simple single
>resource
>> model is sufficient and hence Kelvin's suggestion will work well.
>> But in general, to support scalability of applications a more
>comprehensive
>> analysis is needed.
>>
>> Arkady
>>
>> > From mpi-realtime-human@mcs.anl.gov Mon Dec 9 23:52:11 1996
>> > X-Sender: kelvin@newmonics.com
>> > X-Mailer: Windows Eudora Pro Version 3.0 (32)
>> > Date: Mon, 09 Dec 1996 22:40:54 -0600
>> > To: mpi-realtime@mcs.anl.gov
>> > From: Kelvin Nilsen <kelvin@newmonics.com>
>> > Subject: Mixed-model admission tests
>> > Cc: kelvin@newmonics.com
>> > Mime-Version: 1.0
>> > Content-Type> : > text/plain> ; > charset="us-ascii">
>> > Sender: owner-mpi-core@mcs.anl.gov
>> > Content-Length: 7798
>> >
>> >
>> > I attended the MPI/RT meeting on Dec. 5 and 6. This was my first real
>> > involvement with MPI/RT. Since there are a lot of similarities in the
>> > objectives and approaches of MPI/RT and my work with a real-time
>variant of
>> > Java (PERC), I am hopeful that our two groups can learn from each
>other...
>> > (To participate in the real-time-java mailing list, send a "subscribe"
>> > message to real-time-java-request@iastate.edu. To review past
>discussions,
>> > refer to www.newmonics.com:Technologies:Real-Time Java.) I'm going to
>use
>> > the PERC name from this point forward (rather than real-time Java)
>because
>> > I do not have Sun's permission to use their trademarked name in this
>> context.
>> >
>> > Both PERC and MPI/RT propose to accept responsibility for reliable
>> > execution of dynamic real-time workloads. In both environments, a
>workload
>> > is proposed to the system, the system analyzes the proposed workload,
>and
>> > the system either accepts or rejects the workload. Once a workload has
>> > been accepted, real-time constraints are "guaranteed" to be satisfied.
>> >
>> > In comments describing PERC implementation techniques, we point out
>that
>> > real-time threads can be scheduled either through the use of a static
>> > cyclic schedule, or a dynamic static priority schedule, or a hybrid
>between
>> > the two. (Dynamic priority implementations would also be acceptable.)
>The
>> > current MPI/RT draft supports two modes of operation: time-driven and
>> > event-driven. (I'm intentionally ignoring priority-driven for now).
>> >
>> > In the MPI/RT meeting, Arkady suggested that his hope was that we would
>> > come to some sort of agreement as to how each mode of operation should
>be
>> > handled separately at first, and that we would later tackle the problem
>of
>> > mixing them together. I told him individually, and now in public, that
>I'd
>> > like to see some thought given to how the modes would eventually mix.
>(I
>> > have a sense that one reason it may be difficult to reach consensus on
>> > certain design issues is because we don't have a good understanding of
>the
>> > inherent implementation complexities associated with particular API
>> features.)
>> >
>> > These are my thoughts regarding one possible implementation of a
>> > mixed-model admissions test. I welcome feedback and suggestions for
>> > improvement.
>> >
>> > Assumptions:
>> > 1. A static cyclic schedule has already been constructed
>> > 2. The static schedule was constructed so as to minimize the
>durations
>> > of CPU time that are allocated to static tasks. e.g. the static
>> > cyclic scheduler prefers:
>> > _______ ________
>> > ________|___A___|________|____B___|_________
>> > 0 10 20 30 40 50
>> >
>> > over:
>> > _______ ________
>> > ________|___A___|____B___|__________________
>> > 0 10 20 30 40 50
>> >
>> > 3. Our job, given an existing static schedule and a proposed workload
>
>> > of dynamic tasks, is to decide whether to accept responsibility
>for
>> > execution of the dynamic tasks.
>> >
>> > The analysis is based on the techniques used by Liu-Layland to derive
>the
>> > standard RMA result. To illustrate the analysis, consider the
>following
>> > static cyclic schedule, of duration 70
>> > ____ ____ ________
>> > [__A_|____|_B__|_______|____C___|__]
>> > 0 10 20 30 45 65
>> >
>> > 1. At the time the static schedule is constructed, we precompute some
>
>> > tables which characterize the ways in which this static schedule
>> > might interfere with dynamic tasks. In particular, we consider
>that
>> > the Liu-Layland critical instant might occur at time 0, time 20,
>or
>> > time 45 so we build interference graphs for each of those cases:
>> >
>> > Available CPU Time for Critical Point A
>> >
>> > 30 | .
>> > | ____/
>> > 20 | /
>> > | /
>> > 10 | __/
>> > | /
>> > 0 |_/
>> > |_._._._._._._.
>> > 0 20 40 60
>> > 10 30 50 70
>> >
>> > Available CPU Time for Critical Point B
>> >
>> > 30 | .
>> > | /
>> > 20 | __/
>> > | ____/
>> > 10 | /
>> > | /
>> > 0 |_/
>> > |_._._._._._._.
>> > 0 20 40 60
>> > 10 30 50 70
>> >
>> > Available CPU Time for Critical Point C
>> >
>> > 30 | .
>> > | /
>> > 20 | /
>> > | __/
>> > 10 | /
>> > | __/
>> > 0 |___/
>> > |_._._._._._._.
>> > 0 20 40 60
>> > 10 30 50 70
>> >
>> > Now, what we really want to know is the worst-case
>> > interference imposed by the existing static schedule on execution
>> > of dynamic real-time tasks. In this case, the worst-case
>interference
>> > is represented by the third graph. In more complicated cases, it
>is
>> > possible that the worst-case interference graph would need to be
>> > computed by merging the above graphs. The general technique is to
>> > define the function as the minimum of the values of the individual
>> > graphs for each possible value of the x coordinate. Note that
>this
>> > chart can be repeated to allow analysis of dynamic workloads with
>> > task periods that exceed the length of the cyclic schedule.
>> >
>> > Worst-Case Available CPU Time
>> >
>> > 30 | .
>> > | /
>> > 20 | /
>> > | __/
>> > 10 | /
>> > | __/
>> > 0 |___/
>> > |_._._._._._._.
>> > 0 20 40 60
>> > 10 30 50 70
>> >
>> > 2. Note that, in terms of the "Worst-Case Available CPU Time"
>illustrated
>> > above, the dynamic tasks will, at worst, be able to run only
>during
>> > times represented by upward slanting graph segments. Suppose
>> > a certain dynamic workload can be sheduled in Y time in the
>absence of
>> > interference from statically scheduled tasks. On the above graph,
>
>> > find the rightmost x coordinate that maps to the given Y and call
>this
>> > X. The worst-case amount by which this event will be delayed by
>> > interference from statically scheduled tasks is X-Y.
>> >
>> > 3. Sort the proposed dynamic workload in terms of decreasing
>execution
>> > frequency. In order from highest to lowest priority, examine each
>
>> > task:
>> >
>> > a) Compute its "raw" completion time in terms of its WCET and the
>
>> > WCET and frequencies of all higher "priority" tasks.
>> >
>> > If its completion time adjusted for static schedule
>interference
>> > does not satisfy the task's latency requirement, reject the
>> > workload as unschedulable.
>> >
>> > b) Add this task into the "interference" workload that is used to
>
>> > compute completion times for all lower priority tasks.
>> >
>> > Some observations:
>> >
>> > 1. This assumes tasks (message transmissions) can be preempted at
>arbitrary
>> > times. If this is not the case, more detailed bookkeeping is
>required.
>> >
>> > 2. This ignores the overhead of distributed dynamic scheduling. If
>> multiple
>> > nodes desire to transmit dynamic messages during time slots not
>reserved
>> > in the static schedule, there is a certain (in some cases
>> > nondeterministic)
>> > overhead associated with contention resolution protocols.
>> >
>> > 3. This analysis can be easily modified to support deadlies shorter
>than
>> > periods, such as is the case with so-called "emergency messages."
>> >
>> > I'd like discussion and comments on this... I do not propose this as
>the
>> > ultimate solution.
>> >
>> >
>> > --
>> > Kelvin Nilsen, President voice: 515-296-0897
>> > NewMonics Inc. fax: 515-296-9910
>> > 2501 N. Loop Dr., Suite 614A http://www.newmonics.com
>> > Ames, IA 50010 email: kdn@newmonics.com
>> >
>>
>
>