LLM Steerable Computer Vision Pipeline

10 Nov 2025, 14:10
20m
Online

Speaker

Prof. Janez Perš (University of Ljubljana)

Description

Foundation models plus ample compute make many “moderate” vision tasks solvable with minimal custom code. This talk introduces an LLM-steerable pipeline that compiles a brief YAML spec into end-to-end segmentation, zero-shot classification, and optional geometry checks, executed on GPU clusters.
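
As an illustration only, the spec-to-stages step might be sketched as below. The talk does not publish its schema, so every field name here (task, class_prompts, min_area_px, max_aspect_ratio, verify_with_vqa) and both helper functions are hypothetical. The mapping passed in stands for what a YAML loader such as PyYAML's yaml.safe_load would return.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PipelineSpec:
    """Hypothetical in-memory form of the YAML spec (field names are assumptions)."""
    task: str
    class_prompts: List[str]                  # text prompts for zero-shot labels
    min_area_px: Optional[int] = None         # optional geometry check
    max_aspect_ratio: Optional[float] = None  # optional geometry check
    verify_with_vqa: bool = False             # optional per-crop verification


def spec_from_mapping(raw: dict) -> PipelineSpec:
    """Validate a parsed YAML mapping and turn it into a PipelineSpec."""
    prompts = list(raw.get("class_prompts", []))
    if not prompts:
        raise ValueError("class_prompts must contain at least one prompt")
    return PipelineSpec(
        task=str(raw["task"]),
        class_prompts=prompts,
        min_area_px=raw.get("min_area_px"),
        max_aspect_ratio=raw.get("max_aspect_ratio"),
        verify_with_vqa=bool(raw.get("verify_with_vqa", False)),
    )


def compile_stages(spec: PipelineSpec) -> List[str]:
    """Return the ordered stage names implied by the spec."""
    stages = ["segment", "classify"]  # always present
    if spec.min_area_px is not None or spec.max_aspect_ratio is not None:
        stages.append("geometry")     # only when a geometry limit is set
    if spec.verify_with_vqa:
        stages.append("verify")
    return stages


spec = spec_from_mapping({
    "task": "count bottles on a conveyor",
    "class_prompts": ["a photo of a bottle", "a photo of background"],
    "min_area_px": 200,
})
stages = compile_stages(spec)
```

With this sample mapping, the geometry stage is included because min_area_px is set, and the verification stage is left out because verify_with_vqa defaults to off.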

A remote multimodal LLM (e.g., ChatGPT) generates the configuration from sample images and a human description of the task; a Python runner on HPC invokes SAM2 for mask proposals, CLIP for prompt-driven labels, and optional BLIP-3 VQA for per-crop verification. Crucially, this workflow may double as a data engine: it produces large, reasonably clean pseudo-labeled sets with little manual effort, enabling distillation into compact models that run without HPC.
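
A minimal sketch of the runner's control flow, under stated assumptions, could look like the following. It is not the speaker's implementation: the model calls are passed in as plain callables, the stand-in functions at the bottom replace SAM2, CLIP and the VQA verifier with fixed values, and the names run_pipeline, propose_masks, score_prompts, verify and min_confidence are all hypothetical. The confidence threshold is one assumed way to keep the pseudo-labels reasonably clean.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Box = Tuple[int, int, int, int]  # (x0, y0, x1, y1) of a proposed region


def run_pipeline(
    image: object,
    propose_masks: Callable[[object], List[Box]],
    score_prompts: Callable[[object, Box, Sequence[str]], List[float]],
    prompts: Sequence[str],
    min_confidence: float = 0.5,
    verify: Optional[Callable[[object, Box, str], bool]] = None,
) -> List[Dict]:
    """Propose regions, label each one, and keep only confident, verified labels."""
    records = []
    for box in propose_masks(image):                  # segmentation stage
        scores = score_prompts(image, box, prompts)   # zero-shot classification
        best = max(range(len(prompts)), key=lambda i: scores[i])
        if scores[best] < min_confidence:
            continue                                  # too uncertain to keep
        if verify is not None and not verify(image, box, prompts[best]):
            continue                                  # rejected by the verifier
        records.append(
            {"box": box, "label": prompts[best], "score": scores[best]}
        )
    return records


# Stand-ins for the real models, used only to exercise the control flow.
FAKE_SCORES = {(0, 0, 10, 10): [0.9, 0.1], (5, 5, 8, 8): [0.4, 0.6]}


def fake_propose(image: object) -> List[Box]:
    return list(FAKE_SCORES)


def fake_score(image: object, box: Box, prompts: Sequence[str]) -> List[float]:
    return FAKE_SCORES[box]


pseudo_labels = run_pipeline(
    image=None,
    propose_masks=fake_propose,
    score_prompts=fake_score,
    prompts=["a bottle", "background"],
    min_confidence=0.7,
)
```

Passing the models in as callables keeps the loop independent of any particular library API, so the same loop could in principle be reused with a distilled compact model in place of the large ones.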
