Meeting Notes

December 15, 2025 - December 17, 2025

Notes generated by Zoom AI model

Meeting summary

Quick recap

The MPI Forum reviewed readings on the ULFM fault model and on session attributes, refining terminology and identifying documentation updates along the way. The group also addressed collective operations in the presence of failures, empty-status handling, and language binding considerations, and heard updates on the MPI ABI implementation and Linux packaging. The meeting closed with a review of a new side document for MPI Tools, a discussion of event categorization, and confirmation that the next meeting will be held in New York.

Next steps

  • Joseph: Merge changes to ULFM fault model text and prepare for no-no vote at next meeting
  • Wes: Schedule no-no vote for ULFM PR 816 at next meeting
  • Joseph: Update ULFM PR to add index entries for fault, failure, and error
  • Joseph: Update example in Sessions attributes PR to avoid using data types and use something else for the Hoover
  • Howard: Update Sessions attributes PR to change “certain types of faults” wording and address copy attribute function symmetry
  • Howard: Create side document for ABI SO naming conventions
  • Howard: Address issue about MPI_ABI_VERSION and MPI_ABI_SUBVERSION macros in non-ABI mpi.h
  • Joseph: Bring back non-failed collective discussion as a reading at next meeting after amending shrink and agree text
  • Joseph: Add footnote to Appendix A table distinguishing non-failed collectives
  • Joseph or Howard: Email Tony Skjellum and Bill Gropp about the empty-status contradictions to get historical context
  • Marc-André: Revisit MPI standard to check if MPI_T prefix can be reserved for side documents
  • Marc-André: Rework MPI_T event categories based on feedback, clarifying semantics without strict queue requirements
  • Marc-André: Get input from implementation layer experts on event categories and semantics
  • Wes and Martin: Update meeting page with March meeting logistics for New York

Summary

MPI Fault Model and Session Attributes

The Forum held two readings. First, Joseph presented a reading of the ULFM (User-Level Failure Mitigation) fault model description, which defines how MPI implementations should handle process failures. The group refined the terminology around “fail-stop” processes and clarified that implementations may choose to treat certain non-process faults as fail-stop faults. They agreed to hold a no-no vote on these changes at the next meeting. Second, Howard presented a reading of a proposal to add delete attribute functionality to sessions, similar to what exists for types, windows, and communicators. The group discussed the technical rationale for this addition, particularly for language bindings that need to manage session state.

MPI Chapter Text Updates

Howard reviewed changes to the MPI chapter, focusing on updates to caching and attribute handling for different MPI objects. He noted that the text needs to reflect the addition of sessions as a fourth type of MPI object that can carry attributes, and that redundant or confusing language should be removed. Victor suggested avoiding certain adjectives and empty phrases in the text. The discussion also covered the need to update the corresponding text for the other types of MPI objects so that the functions and their behavior are explained consistently.

MPI Session Attribute Documentation Update

The team discussed the MPI session attribute functionality and identified inconsistencies in the documentation, particularly around error handling. Howard and Christoph clarified that erroneous calls should return error codes rather than failing, and agreed that the documentation needs to be updated to reflect this. Howard planned to review the entire section for consistency and create a pull request with the necessary wording changes, while also addressing similar issues in the language bindings and Fortran sections. The team also noted that someone needs to fix an incorrect entry in the changelog.

ULFM Session Attributes and Terminology

The group discussed two main topics: a session attributes example and non-failed collective operations in ULFM. Howard presented a session attributes example that needed updating to match the new API, and the group agreed to create a separate pull request for general cleanup of attribute-related issues. Joseph proposed adding the term “non-failed collective” to the fault tolerance chapter to address the ambiguity of using the term “collective” for ULFM operations like shrink and agree, where not all processes participate due to failure. The group debated whether a new term was necessary, with Michael suggesting that the concept could be described in the text rather than introducing a new term, and Christoph preferring “non-failed” as it aligns with MPI’s negation conventions.

Fault-Tolerant Collective Operations Standardization

The group discussed how to handle collective operations in fault-tolerant scenarios, particularly the definition and usage of the term “collective” in the standard. They agreed to keep the term “collective” but clarify that it applies to non-failed processes. Joseph will update the text and bring it back as a reading item. Howard then presented an update on the MPI ABI implementation, highlighting successful cross-implementation testing. He raised concerns about shared-object (SO) naming conventions and library versioning, and suggested that these need either explicit standardization or a separate document covering software engineering and release management considerations.

MPI Documentation and Standard Updates

The group discussed Linux packaging for MPI and the need for a simplified documentation approach. They also debated the Fortran ABI support section in the standard. Howard proposed reworking it to make clear that it provides the functionality needed to implement external Fortran bindings, rather than suggesting cross-implementation compatibility. The discussion concluded with a debate about status handling in MPI: Joseph suggested amending the standard to allow status updates even when an empty status is returned, while Aurelien questioned the utility of setting an empty status if it is never read.

MPI Specification Clarification Discussion

The team discussed a contradiction in the MPI specification regarding empty status handling, particularly for functions like MPI_Testany and MPI_Status_Get. Joseph and Aurelien identified that the current text allows conflicting interpretations of whether the error field should be set to MPI_SUCCESS for empty statuses. While there was general agreement that the text needs clarification, Martin suggested consulting Bill and Tony to understand the historical rationale behind the current wording, as this issue dates back to MPI 1.1. The team decided to defer further discussion to the next hybrid meeting to gather more input and potentially make changes to the specification.

MPI Tools Side Document Development

The group discussed a new side document for MPI Tools, which is still in progress. Marc-André presented the current state of the document, highlighting the use of MPI_T categories to define entity sets and the all-or-nothing implementation approach. The group agreed that this direction was reasonable, though Martin raised a question about using the MPI_T prefix in side documents. The team also discussed categorizing events related to message matching and queue operations, with Howard suggesting a simplified approach that focuses on whether or not a message matched a posted request. Florent confirmed that the proposed events could be implemented in the Oblivious API. The group decided to rework the event categorization to better align with the intended semantics and agreed that diverging from the PERUSE specification to create a more general interface was acceptable.

The meeting ended with a brief discussion of the next meeting location, which was confirmed to be New York, and a reminder for participants to reach out if they need to schedule additional discussions before the March meeting.