High-Performance Data Analytics with Python

Name: High-Performance Data Analytics with Python
Start: 2026-06-15T11:00:00+02:00
End: 2026-06-16T15:00:00+02:00
Location: Online

15 Jun 2026, 11:00 → 16 Jun 2026, 15:00 Europe/Warsaw

Klemens Noga (ACC Cyfronet AGH), Leszek Grzanka (ACC Cyfronet AGH)

Description

This training introduces High-Performance Data Analytics (HPDA) through interactive Jupyter notebook exercises. You will learn how to use Python with libraries such as Pandas and Dask for efficient large-scale data analysis. The course explains the fundamentals of High-Performance Computing (HPC) and High-Throughput Computing (HTC), showing how both support scalable data processing and performance optimization.

Level

Target audience

This training is intended for researchers, engineers, and data scientists who want to accelerate their analytics workflows.

Requirements

There are no recommended prerequisite courses.

No account required.

Working knowledge of Python and basic experience with Jupyter notebooks.

Venue

The workshop will be held online via Zoom. The meeting link will be sent to registered participants.

Language

Polish or English, depending on the registered participants.

Duration

2 x 4 hours

Registration

Registration and the waiting list close automatically after 12 June 2026. Registration may close earlier if the participant limit is reached, but the waiting list will remain open until the deadline. Please register at https://events.plgrid.pl/e/2026-06-15-python-HPDA

Monday 15 June
- 1
  
  Introduction to HPC & the Athena Supercomputer
  
  A brief overview of High-Performance Computing (HPC) concepts, logging into the Athena supercomputer at ACC Cyfronet AGH, and the basics of accessing computational resources.
  
  Speaker: Leszek Grzanka (ACC Cyfronet AGH)
- 2
  
  Environment Setup & Python Memory Model
  
  Initializing the JupyterHub environment, verifying dependencies, and exploring the performance differences between standard Python lists and NumPy arrays.
  
  Speaker: Leszek Grzanka (ACC Cyfronet AGH)
- 12:15
  
  Coffee break
- 3
  
  Introduction to Data Analysis with Pandas
  
  Hands-on introduction to Pandas using real-world weather datasets. Covers basic data manipulation, filtering, and preparation for parallel processing.
  
  Speaker: Leszek Grzanka (ACC Cyfronet AGH)
- 13:45
  
  Coffee break
- 4
  
  Scaling Up: First Steps with Dask
  
  Transitioning from Pandas to Dask DataFrames. An introduction to distributed data processing and executing our first operations on larger-than-memory datasets.
  
  Speaker: Leszek Grzanka (ACC Cyfronet AGH)
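
To give a flavour of the list-versus-array comparison mentioned in session 2, here is a minimal sketch. It is not taken from the course notebooks; the array size and the variable names are illustrative assumptions, and the list memory figure is only a rough estimate.

```python
import sys

import numpy as np

n = 100_000  # illustrative size, not a value from the course material

py_list = list(range(n))
np_array = np.arange(n, dtype=np.int64)

# A Python list holds references to separate int objects, so a rough
# estimate adds the size of the list itself to the size of each element.
list_bytes = sys.getsizeof(py_list) + sum(sys.getsizeof(x) for x in py_list)

# A NumPy array keeps raw 8-byte integers in one contiguous buffer.
array_bytes = np_array.nbytes

# The same reduction, written as a Python loop and as a vectorised call.
list_total = sum(x * x for x in py_list)
array_total = int((np_array * np_array).sum())

print(f"list:  ~{list_bytes} bytes")
print(f"array: {array_bytes} bytes")
print("same result:", list_total == array_total)
```

Wrapping each of the two reductions in `time.perf_counter()` calls is one simple way to compare their run times on your own machine.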
Tuesday 16 June
- 5
  
  Dask Performance & Lazy Evaluation
  
  A condensed overview of how Dask works under the hood. Understanding task graphs, lazy evaluation mechanics, and benchmarking Pandas versus Dask performance.
  
  Speaker: Leszek Grzanka (ACC Cyfronet AGH)
- 11:45
  
  Coffee break
- 6
  
  Processing Scientific Data at Scale
  
  Analyzing real high-frequency physics datasets (Parquet format) from proton therapy beam measurements using single-node Dask clusters.
  
  Speaker: Leszek Grzanka (ACC Cyfronet AGH)
- 13:15
  
  Coffee break
- 7
  
  Multi-node Computing & Final Challenge
  
  Utilizing dask-jobqueue and SLURMCluster to scale out computations across multiple Athena nodes. Plenty of hands-on time to tackle the open-ended multi-dataset challenge.
  
  Speaker: Leszek Grzanka (ACC Cyfronet AGH)
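
The lazy-evaluation idea from session 5 can be illustrated with a toy task graph in plain Python. This is a conceptual sketch only, not Dask's implementation: the `Lazy` class and the helper functions are hypothetical names invented for this example, and the sketch does none of the scheduling, parallelism, or result sharing that a real scheduler provides.

```python
class Lazy:
    """Hypothetical toy task-graph node: a function plus its inputs."""

    calls = 0  # counts how many functions have actually been executed

    def __init__(self, func, *deps):
        self.func = func
        self.deps = deps

    def compute(self):
        # Evaluate dependencies first (depth-first), then this node.
        args = [d.compute() if isinstance(d, Lazy) else d for d in self.deps]
        Lazy.calls += 1
        return self.func(*args)


def add(a, b):
    return a + b


def square(x):
    return x * x


# Building the graph only records what should be done.
graph = Lazy(add, Lazy(square, 3), Lazy(square, 4))
calls_before = Lazy.calls

# Work happens only when a result is explicitly requested.
result = graph.compute()

print(calls_before, result, Lazy.calls)
```

Nothing runs while the graph is being built; all three functions run only when `compute()` is called, which is the behaviour the session title refers to.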
