High-Performance Data Analytics with Python

Europe/Warsaw
Online


Klemens Noga (ACC Cyfronet AGH), Leszek Grzanka (ACC Cyfronet AGH)
Description

 

This training introduces High-Performance Data Analytics (HPDA) through interactive Jupyter notebook exercises. You will learn how to use Python with libraries such as Pandas and Dask for efficient large-scale data analysis. The course explains the fundamentals of High-Performance Computing (HPC) and High-Throughput Computing (HTC), showing how both support scalable data processing and performance optimization.

Level

Target audience

The training is intended for researchers, engineers, and data scientists who want to accelerate their analytics workflows.

Requirements

No prior training courses are recommended.

No account required.

Working knowledge of Python and basic experience with Jupyter notebooks.

Venue

The workshop will be held online via Zoom. The meeting link will be sent to registered participants.

Language

Polish or English, depending on the registered participants.

Duration

2 x 4 hours

Registration

Registration and the waiting list close automatically after 12 June 2026. Registration may close earlier if the participant limit is reached, but the waiting list will remain open until the deadline. Please register using this link: https://events.plgrid.pl/e/2026-06-15-python-HPDA

    • 1
      Introduction to HPC & the Athena Supercomputer

      A brief overview of High-Performance Computing (HPC) concepts, logging in to the Athena supercomputer at ACC Cyfronet AGH, and the basics of accessing computational resources.

      Speaker: Leszek Grzanka (ACC Cyfronet AGH)
    • 2
      Environment Setup & Python Memory Model

      Initializing the JupyterHub environment, verifying dependencies, and exploring the performance differences between standard Python lists and NumPy arrays.

      Speaker: Leszek Grzanka (ACC Cyfronet AGH)
    • 12:15
      Coffee break
    • 3
      Introduction to Data Analysis with Pandas

      Hands-on introduction to Pandas using real-world weather datasets. Covers basic data manipulation, filtering, and preparation for parallel processing.

      Speaker: Leszek Grzanka (ACC Cyfronet AGH)
    • 13:45
      Coffee break
    • 4
      Scaling Up: First Steps with Dask

      Transitioning from Pandas to Dask DataFrames. An introduction to distributed data processing and executing our first operations on larger-than-memory datasets.

      Speaker: Leszek Grzanka (ACC Cyfronet AGH)
    • 5
      Dask Performance & Lazy Evaluation

      A condensed overview of how Dask works under the hood. Understanding task graphs, lazy evaluation mechanics, and benchmarking Pandas versus Dask performance.

      Speaker: Leszek Grzanka (ACC Cyfronet AGH)
    • 11:45
      Coffee break
    • 6
      Processing Scientific Data at Scale

      Analyzing real high-frequency physics datasets (Parquet format) from proton therapy beam measurements using single-node Dask clusters.

      Speaker: Leszek Grzanka (ACC Cyfronet AGH)
    • 13:15
      Coffee break
    • 7
      Multi-node Computing & Final Challenge

      Using dask-jobqueue and SLURMCluster to scale out computations across multiple Athena nodes. Plenty of hands-on time to tackle the open-ended multi-dataset challenge.

      Speaker: Leszek Grzanka (ACC Cyfronet AGH)
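
The Environment Setup & Python Memory Model session compares standard Python lists with NumPy arrays. The sketch below is not taken from the course notebooks; it shows one common way to illustrate the difference, assuming an arbitrary size of one million integers. A list holds pointers to individually boxed int objects processed by the interpreter loop, while a NumPy array holds a contiguous buffer of fixed-size values processed by a single vectorised call.

```python
import timeit

import numpy as np

N = 1_000_000  # illustrative size, chosen for this sketch only

py_list = list(range(N))
# Explicit int64 so squares up to (N - 1)**2 cannot overflow on any platform
np_array = np.arange(N, dtype=np.int64)


def square_list(values):
    # Pure-Python loop: the interpreter visits each boxed int object in turn
    return [v * v for v in values]


def square_array(values):
    # Vectorised NumPy operation over one contiguous buffer
    return values * values


t_list = timeit.timeit(lambda: square_list(py_list), number=5)
t_array = timeit.timeit(lambda: square_array(np_array), number=5)
print(f"list:  {t_list:.3f} s for 5 runs")
print(f"numpy: {t_array:.3f} s for 5 runs")

# The array stores raw 8-byte integers: N * 8 bytes in its data buffer
print("array buffer bytes:", np_array.nbytes)
```

Exact timings depend on the machine, so the sketch prints them rather than asserting a speed-up factor.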
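
The Introduction to Data Analysis with Pandas session covers manipulation, filtering, and preparation of weather data. As a rough illustration only, the sketch below uses a tiny made-up table; the column names `station`, `date`, and `temp_c` are hypothetical and do not describe the course datasets.

```python
import pandas as pd

# Synthetic stand-in for a weather dataset; all values are invented for illustration
weather = pd.DataFrame(
    {
        "station": ["KRK", "KRK", "WAW", "WAW", "GDN"],
        "date": pd.to_datetime(
            ["2024-07-01", "2024-07-02", "2024-07-01", "2024-07-02", "2024-07-01"]
        ),
        "temp_c": [24.1, 27.5, 22.3, None, 19.8],
    }
)

# Preparation: drop rows with a missing temperature reading
clean = weather.dropna(subset=["temp_c"])

# Filtering: boolean mask keeps only rows above 20 degrees C
warm = clean[clean["temp_c"] > 20.0]

# Aggregation: mean temperature per station
mean_by_station = clean.groupby("station")["temp_c"].mean()

print(warm)
print(mean_by_station)
```

The same filter and group-by expressions carry over almost unchanged to Dask DataFrames, which is what makes this kind of preparation useful before parallel processing.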
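
The Dask Performance & Lazy Evaluation session discusses task graphs and benchmarking. The sketch below uses `dask.delayed` to make the laziness visible: decorated calls only add nodes to a graph, and nothing runs until `.compute()`. The functions `load`, `total`, and `combine` are hypothetical stand-ins (the `time.sleep` imitates I/O), and comparing the `synchronous` and `threads` schedulers is one simple way to see the benefit of running independent tasks concurrently.

```python
import time

import dask


@dask.delayed
def load(i):
    # Hypothetical stand-in for reading one chunk of data (sleep imitates I/O)
    time.sleep(0.1)
    return list(range(i * 10, (i + 1) * 10))


@dask.delayed
def total(chunk):
    return sum(chunk)


@dask.delayed
def combine(parts):
    return sum(parts)


# Nothing executes here: each call only adds a task to the graph
partials = [total(load(i)) for i in range(4)]
grand_total = combine(partials)

# Benchmark: run the same graph one task at a time, then with a thread pool
start = time.perf_counter()
result = grand_total.compute(scheduler="synchronous")
t_sync = time.perf_counter() - start

start = time.perf_counter()
result_threads = grand_total.compute(scheduler="threads")
t_threads = time.perf_counter() - start

print(f"result={result}, synchronous={t_sync:.2f} s, threads={t_threads:.2f} s")
```

With four independent `load` tasks, the threaded run can overlap the sleeps, while the synchronous run executes them back to back; actual numbers vary by machine.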