Attendees Thursday:
Rob Bjornson (SCA)
Alan D. Brunelle (Alacron)
Darrell Copeland (SKY)
Zhenqian Cui (MSU)
Richard Games (MITRE)
Robert George (MSU)
Arkady Kanevsky (MITRE, Hosting the meeting)
Clayton Keller (CSPI)
Leonard Monk (MITRE)
Tom Nelson (Mercury)
Anthony Skjellum (MSU)
Gérard Vichniac (Mercury)
Attendees Friday:
Robert Babb (DU)
Rob Bjornson (SCA)
Zhenqian Cui (MSU)
Nathan Doss (Sanders)
Richard Games (MITRE)
Robert George (MSU)
Arkady Kanevsky (MITRE)
Clayton Keller (CSPI)
Anna Roenbehler (SKY)
Anthony Skjellum (MSU)
Gérard Vichniac (Mercury)
Thursday, Sept. 19
------------------
Quality of service was discussed at length. It was proposed and accepted
to limit quality of service to "application-level" QOS, and to relegate
network-level QOS to advice to implementors. As a result of this
decision, Arkady's QOS proposal was promoted in the section on this topic,
and Robert George's proposal was converted into advice to implementors.
Application-level QOS was "beefed up" and appears in the revised chapter
(as of 19th Sept.).
The possibility of utilizing a common, low-level time quantum framework
was discussed with regard to unifying the realtime models. This will
be revisited in the future.
Models of realtime were discussed, and specific presentations were offered
by Mercury (Nelson/Vichniac) and Sky (Copeland). This discussion led to
desultory comments about the relationship of the low-level services to
implementations of MPI. We agreed to emphasize portable QOS
specifications in the RT specification, and to restrict vendor-specific
specifications to be done outside (e.g., ACL/Telaris or environment variables).
It was felt that the comprehensive, collective channel interface would
offer a means for the system to utilize underlying resources effectively,
diminishing the need to specify special routes and other one-shot optimizations
common in one-channel initialization schemes, as presently used.
Profiles were accepted, and it was decided that all profiles would be
required of all implementations. These profiles are to include
1) "embedded" (renamed resource constrained on Friday)
2) "embedded" plus RT
3) MPI 1.2 plus RT
4) MPI 2.0 plus RT
We defined the meaning of portability in the MPI/RT context. See chapter.
We refined and modified the initialization and termination concepts for
RT.
Summary of key issues addressed on Thursday:
* Introduction updated
* Clarification of the contents of QOS parameters for
time-driven, event-driven, priority-driven
* All lower-level QOS is merely advice to implementors
* Definition of profiles (4):
    Embedded (new clear definition)
    Embedded + RT
    MPI 1.2 + RT
    MPI 2 + RT
* Clarification of the meaning of portability of MPI/RT programs
Thursday Evening.
An updated version of the chapter was prepared and distributed
on the reflector.
Friday, Sept. 20
----------------
Instrumentation was discussed at length. Clarification of the different
"partitions" was made. The following issues were discussed:
* RT monitoring for heuristics vs. profiling
* MPI specification vs. application specification
* impact on benchmarking
* Does instrumentation need to be portable?
* Does the number of missed deadlines provide a useful metric?
* Separation of instrumentation by paradigm
* Support by vendor of certain profiling features
* Should we have a constructor/destructor for monitors? Yes.
The action item was to think about these issues further for the October
meeting. Anna R. will write a new instrumentation proposal ASAP.
We renamed "embedded" to be "resource constrained." We defined "tiny" as
a synonym for "resource constrained."
Buffers were discussed at length, and significant improvements were made
to the API. The benefits and liabilities of BUFFERS_TEST and BUFFERS_WAIT
were discussed. We discussed what should be put in the strategy field.
Should it be part of the construction of channels, or part of the buffer
object? We decided that strategy exists at two levels: user and MPI.
We renamed BUFFERS to BUFFER POOL (BUFPOOL). We decided to make the strategy
argument part of the BUFPOOL construction. We decided to utilize one
datatype and count for all buffers in a BUFPOOL, and that the datatype
would be manifestly RELATIVE.
We decided to use a constructor for the BUFPOOL that specifies all of the
buffers at once. We decided to provide a callback on deletion for the
BUFPOOL.
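The BUFPOOL decisions above can be pictured with a minimal Python sketch.
It is an illustration only: the class name BufPool, its arguments, and the
strategy string are hypothetical and are not taken from the MPI/RT chapter.
It shows one datatype and count shared by all buffers, a strategy fixed at
construction, all buffers specified at once, and a callback on deletion.

```python
class BufPool:
    """Hypothetical model of a BUFPOOL; not the MPI/RT API."""

    def __init__(self, datatype, count, strategy, num_buffers, on_delete=None):
        self.datatype = datatype    # one datatype shared by every buffer
        self.count = count          # one element count shared by every buffer
        self.strategy = strategy    # strategy is fixed at construction time
        self.on_delete = on_delete  # optional user callback run on deletion
        # All buffers are specified at once; none are added later.
        self.buffers = [[0] * count for _ in range(num_buffers)]

    def delete(self):
        # Invoke the deletion callback before releasing the buffers.
        if self.on_delete is not None:
            self.on_delete(self)
        self.buffers = []


# Usage: record which pool was deleted via the callback.
deleted = []
pool = BufPool("RELATIVE_DOUBLE", count=4, strategy="fifo", num_buffers=3,
               on_delete=lambda p: deleted.append(p.strategy))
```

Calling pool.delete() would run the callback once and then drop the buffers.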
Fault tolerance was discussed. A request for references for the ideas,
esp. the Byzantine Generals algorithm, was made.
We decided that fault tolerance is a deep, broad area that merits
consideration above and beyond realtime. Specific things we
discussed included:
* Level of checking with respect to: channel down, node down,
timeout, communicator broken
* Error handling vs. fault tolerance
* Operability assessment, fault tolerance, and fault handling
* We discussed fault-safe measures for MPI calls. That is, would
we put hooks into an extended API for MPI calls to support
fault-tolerant applications?
We confirmed our intention to add timeouts to all MPI calls (i.e., comm. type
calls) in the language binding for MPI/RT.
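As a rough illustration of what a timeout on a communication call means for
the caller, here is a minimal Python sketch. It uses a standard-library queue
as a stand-in for a channel; the function name and the status values are
hypothetical and do not come from the MPI/RT binding.

```python
import queue

OK = "OK"                # hypothetical status codes, not MPI/RT constants
TIMED_OUT = "TIMED_OUT"


def recv_with_timeout(channel, timeout_s):
    """Blocking receive that gives up after timeout_s seconds."""
    try:
        # queue.Queue.get raises queue.Empty if nothing arrives in time.
        return OK, channel.get(timeout=timeout_s)
    except queue.Empty:
        return TIMED_OUT, None
```

The caller always gets control back within the stated bound and must check
the status before using the payload.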
We decided to make the fault tolerant work a separate chapter, and address
it later.
Mercury's representative asked that fault tolerance be restricted to
application/user, without support from vendor, insofar as possible.
Channels. We revised the API for building collective channels to avoid
enumerating a lot of empty channels. This has several good side effects:
* It is easier to build the data structures for simple channel
circumstances
* Bi-directional channels can be made with one call
* Channels from a series of calls can be intermixed, and modified
or deleted in subgroups independent of the set in which they
were created
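The side effects listed above can be sketched in Python as follows. This is
a toy model under stated assumptions, not the revised API: the ChannelTable
class and its methods are hypothetical, and a channel is reduced to a
directed (source rank, destination rank) pair.

```python
class ChannelTable:
    """Hypothetical sparse channel set; only real channels are listed."""

    def __init__(self):
        self.channels = set()  # directed (source_rank, dest_rank) pairs

    def add(self, pairs, bidirectional=False):
        # Only the channels that exist are enumerated; no empty entries.
        for src, dst in pairs:
            self.channels.add((src, dst))
            if bidirectional:
                # One call yields both directions.
                self.channels.add((dst, src))

    def remove(self, pairs):
        # Deletion works on any subgroup, regardless of which add() call
        # originally created each channel.
        self.channels.difference_update(pairs)


table = ChannelTable()
table.add([(0, 1)], bidirectional=True)  # first call: 0->1 and 1->0
table.add([(1, 2), (2, 3)])              # second call: two more channels
table.remove([(1, 0), (1, 2)])           # subgroup spanning both calls
```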
We decided to move the "BUFPOOL" variants of channel initiation to the
channel section, and offer different names for these calls, so that they
are not mutually exclusive with the simpler, non-BUFPOOL variants.
Event models. We discussed event models further, and discovered that more
work is needed in this area. In particular, persistent channels without
buffers were discussed as a mechanism for providing global events.
Event-classes are to be the mechanism for distributing events over groups.
Soft realtime QOS. We agreed to consider best-effort QOS at a future
meeting.
We discussed the possibility of events external to MPI, and the possibility
of integrating them with the MPI event-driven model, such as by providing
an analogy to MPI I/O, or faked inter-communicators.
We reviewed our schedule for the coming months, which we offer in a
separate message.
Submitted: A. Roenbehler, A. Kanevsky, A. Skjellum, Sept. 20, 5:45pm EST