Hybrid Programming in HPC – MPI+X

Europe/Berlin
Room 0.439 / Rühle Saal (HLRS, University of Stuttgart)


Nobelstraße 19 70569 Stuttgart, Germany
Maksym Deliyergiyev (HLRS, University of Stuttgart)
Description


Learn how to use and program Hunter, the HPC system at HLRS.

Most HPC systems are clusters of shared memory nodes. To use such systems efficiently, both memory consumption and communication time have to be optimized. Therefore, hybrid programming may combine distributed memory parallelization on the node interconnect (e.g., with MPI) with shared memory parallelization inside each node (e.g., with OpenMP or MPI-3.0 shared memory).
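To make the memory and communication argument concrete, the following C sketch counts the halo (ghost) cells that one node has to store and copy when its grid block is split among different numbers of MPI processes. The function names `halo_cells` and `node_halo_cells`, the square node-local grid, the one-cell halo width, and the process grids are illustrative assumptions, not material from the course. Corner cells are ignored, and the grid is assumed to divide evenly among the processes.

```c
/* Halo cells around one subdomain of nx-by-ny interior cells,
   assuming a one-cell-wide halo on all four sides (corners ignored). */
static long halo_cells(long nx, long ny) {
    return 2 * nx + 2 * ny;
}

/* Total halo cells on one node when its n-by-n block is split into
   px-by-py equal subdomains, one subdomain per MPI process.
   Assumes n is divisible by both px and py. */
static long node_halo_cells(long n, long px, long py) {
    return px * py * halo_cells(n / px, n / py);
}
```

Under these assumptions, a 4096-by-4096 block split into an 8-by-8 grid of processes (pure MPI on a hypothetical 64-core node) needs `node_halo_cells(4096, 8, 8)` = 131072 halo cells. A 2-by-2 grid of processes with 16 threads each needs `node_halo_cells(4096, 2, 2)` = 32768, a factor of four fewer, because threads of one process share its memory and need no halos between them.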

This course analyzes the strengths and weaknesses of several parallel programming models on clusters of SMP nodes. Multi-socket-multi-core systems in highly parallel environments are given special consideration. MPI-3.0 has introduced a new shared memory programming interface, which can be combined with inter-node MPI communication. It can be used for direct neighbor accesses similar to OpenMP or for direct halo copies, and enables new hybrid programming models. These models are compared with various hybrid MPI+OpenMP approaches and pure MPI.

Numerous case studies and micro-benchmarks demonstrate the performance-related aspects of hybrid programming. Hands-on sessions are included on all days. Tools for hybrid programming such as thread/process placement support and performance analysis are presented in a "how-to" section.

This course provides scientific training in Computational Science and, in addition, fosters scientific exchange among the participants.

This course is a joint training event of SIDE and EuroCC-Austria, the German and Austrian National Competence Centres for High-Performance Computing. It is organized by the HLRS in cooperation with the VSC Research Center, TU Wien and NHR@FAU.

    • 8:45 AM – 9:00 AM
      Welcome
    • 9:00 AM – 4:30 PM
      Day 1: Introduction to Hybrid Programming in HPC – MPI+X
      • 9:10 AM
        Hunter's hardware architecture and its programming models 50m

        Dr. Christian Simmendinger (HPE), Igor Pasichnyk (AMD), and Johanna Potyka (AMD)

      • 10:00 AM
        Coffee Break 15m
      • 10:15 AM
        Introduction to Hybrid Programming in HPC – MPI+X 30m
      • 10:45 AM
        Programming Models 5m
      • 10:50 AM
        Programming Models - MPI + OpenMP 55m
      • 11:45 AM
        Practical (how to compile and start) 45m
      • 12:30 PM
        Lunch 1h 30m
      • 2:00 PM
        MPI + OpenMP 45m
      • 2:45 PM
        Coffee Break 15m
      • 3:00 PM
        MPI + OpenMP 45m
      • 3:45 PM
        Practical (how to do pinning) 30m
      • 4:15 PM
        Q&A 15m
    • 8:45 AM – 4:30 PM
      Day 2: Overlapping Communication and Computation
      • 8:45 AM
        MPI + OpenMP 15m
      • 9:00 AM
        Case study: Simple 2D stencil smoother 30m
      • 9:30 AM
        Practical (hybrid through OpenMP parallelization) 1h 15m
      • 10:45 AM
        Coffee Break 15m
      • 11:00 AM
        Overlapping Communication and Computation 30m
      • 11:30 AM
        Practical (taskloops) 45m
      • 12:15 PM
        MPI + OpenMP Conclusions 15m
      • 12:30 PM
        Lunch 1h 30m
      • 2:00 PM
        MPI + Accelerators 1h
      • 3:00 PM
        Coffee Break 15m
      • 3:15 PM
        MPI + Accelerators 1h
      • 4:15 PM
        Q&A 15m
    • 8:45 AM – 4:30 PM
      Day 3: MPI Memory Models and Synchronization
      • 8:45 AM
        Programming Models (continued) 20m
      • 9:05 AM
        MPI + MPI-3.0 Shared Memory 55m
      • 10:00 AM
        Coffee Break 15m
      • 10:15 AM
        MPI Memory Models and Synchronization 45m
      • 11:00 AM
        Coffee Break 15m
      • 11:15 AM
        Optimized node to node communication 20m
      • 11:35 AM
        Recap - MPI Virtual Topologies 30m
      • 12:05 PM
        Lunch 1h 30m
      • 1:35 PM
        Topology Optimization 40m
      • 2:15 PM
        Conclusions 15m
      • 2:30 PM
        Coffee Break 15m
      • 2:45 PM
        Practical (replicated data) 1h 15m
      • 4:00 PM
        Q&A 30m