Introduction to GPU computing with PyTorch

Europe/Warsaw
online

Klemens Noga (ACC Cyfronet AGH), Konrad Klimaszewski (Narodowe Centrum Badań Jądrowych), Michał Obara (Narodowe Centrum Badań Jądrowych)
Description

The training provides a practical introduction to general-purpose computing on graphics processing units (GPGPU), with a strong focus on the PyTorch framework. The aim is to equip attendees with the skills needed to design, implement, and profile GPU-accelerated computations. By the end of the training, attendees will be able to:

  • use PyTorch tensors on GPU to implement basic numerical algorithms,
  • use PyTorch for linear algebra,
  • manage CPU–GPU memory transfers and reason about performance,
  • profile GPU code and spot the main bottlenecks,
  • write simple custom kernels in Triton and plug them into PyTorch workflows.
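The first two outcomes can be sketched in a few lines of device-agnostic tensor code. This is a minimal illustration, not course material; the CPU fallback is our assumption so the snippet runs on machines without a GPU:

```python
import torch

# Pick the GPU when one is available, otherwise fall back to the CPU,
# so the same script runs anywhere.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Create tensors directly on the chosen device.
a = torch.randn(1024, 1024, device=device)
b = torch.randn(1024, 1024, device=device)

# Matrix multiplication runs on the GPU when device is "cuda".
c = a @ b

# Move the result back to host memory (a no-op if already on the CPU)
# before handing it to CPU-only code such as NumPy.
result = c.cpu()
print(result.shape)  # torch.Size([1024, 1024])
```

Creating tensors with `device=...` up front avoids an extra CPU-to-GPU copy, one of the transfer costs discussed in the training.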

Target audience

The training is intended for users who would like to accelerate their numerical computations in Python using graphics processing units (GPUs).

Agenda

  1. Introduction to GPU Multiprocessing
    • GPGPU computing paradigm and typical application domains,
    • Overview of CUDA and hardware-agnostic approaches.
  2. Introduction to PyTorch
    • Tensors: creation, initialisation and parameters,
    • Aggregation and shape operations,
    • Indexing, slicing and broadcasting, boolean and masked tensors,
    • Matrix multiplication and elementwise math,
    • Linear algebra using PyTorch.
  3. GPU acceleration using PyTorch
    • Memory management in PyTorch,
    • Comparing GPU vs CPU performance on linear algebra workloads,
    • Motivation and basic principles of performance profiling,
    • Profilers: setup, tracing and visualisation.
  4. Custom GPU Kernels with Triton
    • Motivation for writing custom kernels (performance, flexibility),
    • Overview of Triton and its programming model,
    • Implementing basic kernels (e.g. vector operations, simple reductions),
    • Integration with PyTorch and comparison to built-in operations.
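The profiling step in item 3 can be previewed with `torch.profiler`, which the training uses to trace where time is spent. A minimal sketch, assuming a small matrix-multiply workload chosen only for illustration:

```python
import torch
from torch.profiler import profile, ProfilerActivity

# Always trace CPU activity; add CUDA only when a GPU is present.
activities = [ProfilerActivity.CPU]
if torch.cuda.is_available():
    activities.append(ProfilerActivity.CUDA)

x = torch.randn(512, 512)
y = torch.randn(512, 512)

# Trace a small linear-algebra workload.
with profile(activities=activities, record_shapes=True) as prof:
    for _ in range(10):
        z = x @ y

# Print a summary table of the most expensive operations.
print(prof.key_averages().table(sort_by="cpu_time_total", row_limit=5))
```

The same profiler can export a trace for visualisation (e.g. via `prof.export_chrome_trace`), which is the kind of tooling covered in the "setup, tracing and visualisation" bullet.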

Requirements

  • Basic programming proficiency in Python (control flow, functions, modules).
  • Familiarity with undergraduate-level mathematics: calculus and linear algebra (vectors, matrices, eigenvalues).
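The linear-algebra background above maps directly onto the `torch.linalg` module used in the training. A small sketch with an arbitrary example matrix (the specific values are our assumption):

```python
import torch

# A small symmetric matrix, chosen only for illustration.
A = torch.tensor([[2.0, 1.0],
                  [1.0, 2.0]])
b = torch.tensor([1.0, 0.0])

# Solve the linear system A x = b.
x = torch.linalg.solve(A, b)

# Eigenvalues of a symmetric matrix, returned in ascending order.
vals = torch.linalg.eigvalsh(A)

print(x)     # ≈ [0.6667, -0.3333]
print(vals)  # ≈ [1.0, 3.0]
```

The same calls accept tensors already resident on the GPU, which is how the training connects the mathematics prerequisite to GPU acceleration.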

Venue

The workshop will be held online via Zoom. The meeting link will be sent to registered participants.

Language

English

Duration

6 hours

Registration

Registration and the waiting list close automatically after 23 March 2026. Registration may close earlier if the participant limit is reached, but the waiting list will remain open until the deadline above.

    • 10:00 AM – 10:15 AM
      Introduction to GPU Multiprocessing 15m
      • GPGPU computing paradigm and typical application domains
      • Overview of CUDA and hardware-agnostic approaches
    • 10:15 AM – 11:45 AM
      Introduction to PyTorch 1h 30m
      • Tensors: creation, initialisation and parameters
      • Aggregation and shape operations
      • Indexing, slicing and broadcasting, boolean and masked tensors
      • Matrix multiplication and elementwise math
      • Linear algebra using PyTorch
    • 11:45 AM – 12:15 PM
      Break 30m
    • 12:15 PM – 1:45 PM
      GPU acceleration using PyTorch 1h 30m
      • Memory management in PyTorch
      • Comparing GPU vs CPU performance on linear algebra workloads
      • Motivation and basic principles of performance profiling
      • Profilers: setup, tracing and visualisation
    • 1:45 PM – 2:15 PM
      Break 30m
    • 2:15 PM – 3:45 PM
      Custom GPU Kernels with Triton 1h 30m
      • Motivation for writing custom kernels (performance, flexibility)
      • Overview of Triton and its programming model
      • Implementing basic kernels (e.g. vector operations, simple reductions)
      • Integration with PyTorch and comparison to built-in operations