A Practical Guide to LLM Deployment and RAG Systems
Wednesday 10 September 2025
09:30 - 10:00
Hands-on Setup (Optional)
Marios Constantinou (CaSToRC CyI), Christodoulos Stylianou (CaSToRC CyI)
Room: Andreas Mouskos Seminar Room
Please use this session to ensure you can access the HPC system.
10:00 - 10:45
Deploying Large Language Models Locally
Christodoulos Stylianou (CaSToRC CyI)
Room: Andreas Mouskos Seminar Room
This presentation covers the process of deploying large language models on local machines and high-performance computing systems. It focuses on the tools and workflows needed to run models efficiently without relying on cloud infrastructure. The talk will include practical tips for setting up environments, managing resources, and avoiding common issues during deployment. It will also introduce retrieval-augmented generation (RAG) systems and explain how they can be used to improve model responses with local or custom data. The goal is to provide a clear, practical overview for anyone interested in working with LLMs in a self-hosted environment.
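As a taste of the workflow the talk describes, here is a minimal sketch of local text generation with the Hugging Face Transformers library; the checkpoint name ("Qwen/Qwen2.5-0.5B-Instruct") and the generation settings are illustrative assumptions, not the specific stack used in the session.

    # Minimal sketch of local text generation with Hugging Face Transformers.
    # The checkpoint name is an assumption; any locally available model works.
    from transformers import pipeline

    generator = pipeline(
        "text-generation",
        model="Qwen/Qwen2.5-0.5B-Instruct",  # assumed small instruct model
        device_map="auto",                   # use a local GPU if one is available (needs accelerate)
    )

    prompt = "Explain retrieval-augmented generation in one sentence."
    result = generator(prompt, max_new_tokens=64, do_sample=False)
    print(result[0]["generated_text"])

On an HPC system the same pattern typically runs inside a batch job, with the model weights downloaded to a shared cache beforehand, since compute nodes often have no internet access.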
10:45 - 12:00
A Practical Overview of Transformers, Embeddings and RAG Systems
Nikolaos Bakas (GRNET)
Room: Andreas Mouskos Seminar Room
In this session, we will show how to set up and use Large Language Models (LLMs) for a range of tasks with the Hugging Face Transformers library. We will cover inference and text generation, including streaming outputs, and use embeddings to explore and visualize semantic relationships between words and sentences via cosine similarity. A key focus of the seminar will be the basics of Retrieval-Augmented Generation (RAG): we will demonstrate how to build a system that retrieves relevant information from a text corpus to answer user questions. By the end of this session, you will have hands-on experience with powerful LLM tools and an understanding of how to build custom LLM applications that combine language generation with information retrieval.
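As a preview of the embedding material, the sketch below computes sentence embeddings by mean-pooling token embeddings and compares them with cosine similarity; the embedding checkpoint and the example sentences are assumptions chosen for illustration.

    # Minimal sketch: sentence embeddings and cosine similarity with Transformers.
    # The embedding checkpoint is an assumed choice for illustration.
    import torch
    from transformers import AutoModel, AutoTokenizer

    model_name = "sentence-transformers/all-MiniLM-L6-v2"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModel.from_pretrained(model_name)

    def embed(texts):
        # Tokenize, encode, and mean-pool token embeddings into one vector per text.
        batch = tokenizer(texts, padding=True, truncation=True, return_tensors="pt")
        with torch.no_grad():
            hidden = model(**batch).last_hidden_state         # (batch, tokens, dim)
        mask = batch["attention_mask"].unsqueeze(-1).float()  # mask out padding tokens
        return (hidden * mask).sum(dim=1) / mask.sum(dim=1)

    sentences = [
        "Supercomputers run large scientific simulations.",
        "HPC systems execute big numerical workloads.",
        "I had a coffee this morning.",
    ]
    emb = embed(sentences)
    scores = torch.nn.functional.cosine_similarity(emb[0], emb[1:], dim=-1)
    print(scores)  # the related pair should score higher than the unrelated one

In a RAG system, the same similarity score is what ranks passages from the corpus against a user question before the best matches are handed to the generator.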
12:00 - 12:30
Break
Room: Andreas Mouskos Seminar Room
12:30 - 14:30
Hands-On: Model Deployment through vLLM, Communication and Creation of RAG Pipelines
Marios Constantinou (CaSToRC CyI)
Room: Andreas Mouskos Seminar Room
In this hands-on session, participants will deploy large language models on Cyclone, the National High Performance Computing (HPC) infrastructure, using tools like vLLM for efficient inference and Haystack for building retrieval-augmented generation (RAG) pipelines. The session will guide attendees through the end-to-end process of setting up model environments, running local inference, and integrating retrieval components to create responsive, data-aware applications. By working directly on HPC resources, participants will gain practical experience in managing compute workloads, handling model-serving pipelines, and building systems that combine LLM outputs with relevant external knowledge.
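As a rough preview of the serving side, the sketch below sends a request to a model exposed through vLLM's OpenAI-compatible endpoint, which is a common way for a RAG pipeline to talk to a vLLM deployment; the host, port, model name, and hard-coded context are assumptions for illustration.

    # Minimal sketch: querying a vLLM OpenAI-compatible server.
    # Assumes a server was started elsewhere, e.g.:
    #   vllm serve meta-llama/Llama-3.1-8B-Instruct --port 8000
    # Host, port, and model name below are illustrative assumptions.
    from openai import OpenAI

    client = OpenAI(base_url="http://localhost:8000/v1", api_key="EMPTY")

    # Stand-in for passages a retriever would normally supply.
    context = "vLLM is an inference engine that serves LLMs efficiently."

    response = client.chat.completions.create(
        model="meta-llama/Llama-3.1-8B-Instruct",  # must match the served model
        messages=[
            {"role": "system", "content": "Answer using only the provided context."},
            {"role": "user", "content": f"Context:\n{context}\n\nQuestion: What is vLLM?"},
        ],
        max_tokens=128,
    )
    print(response.choices[0].message.content)

In the Haystack portion of the session, a retriever component would supply the context from an indexed document store instead of the hard-coded string used here.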