Description
Archaeological site detection is entering a new era thanks to advances in remote sensing and artificial intelligence. Archaeological sites such as hillforts often have irregular, complex shapes that make them difficult to identify with conventional computer vision methods. Multimodal approaches that combine LiDAR-derived Local Relief Model (LRM) images with aerial orthoimagery improve detection accuracy, but false positives remain a major problem.
In this talk, we explore how extending the principles of large language models (LLMs) to vision can address these challenges. By using cross-modal attention mechanisms, these models integrate multiple data sources, enabling precise boundary detection, reduced false positives, and scalable application across diverse landscapes and site types. A key element of this workflow is a human-in-the-loop refinement process, where archaeologists review and provide feedback on model predictions. This iterative collaboration enriches the training data, improves the system’s ability to distinguish true sites from background anomalies, and enhances overall detection reliability.
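To give a flavour of the cross-modal attention idea (this is an illustrative sketch, not the talk's actual implementation), one modality's feature tokens can act as queries that attend over another modality's tokens. Below, hypothetical LRM patch features query orthoimage patch features in a single-head attention step; all names and dimensions are invented for illustration:

```python
import numpy as np

def cross_modal_attention(lrm_feats, ortho_feats):
    """Single-head cross-attention: LRM tokens attend to orthoimage tokens.

    lrm_feats:   (n_lrm, d)   feature vectors from LiDAR-derived LRM patches
    ortho_feats: (n_ortho, d) feature vectors from orthoimagery patches
    Returns the fused LRM features and the attention weight matrix.
    """
    d = lrm_feats.shape[-1]
    # Scaled dot-product similarity between the two modalities
    scores = lrm_feats @ ortho_feats.T / np.sqrt(d)       # (n_lrm, n_ortho)
    # Row-wise softmax (numerically stabilised)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    # Each LRM token becomes a weighted mixture of orthoimage evidence
    fused = weights @ ortho_feats                         # (n_lrm, d)
    return fused, weights

# Toy usage with random features standing in for real patch embeddings
rng = np.random.default_rng(0)
lrm = rng.normal(size=(5, 16))     # 5 LRM patches, 16-dim features
ortho = rng.normal(size=(8, 16))   # 8 orthoimage patches, 16-dim features
fused, weights = cross_modal_attention(lrm, ortho)
```

In a real transformer the queries, keys, and values would come from learned projections and multiple heads, but the core operation, letting one sensor's evidence re-weight another's, is the same.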
Results from Northwest Iberia show a 99.3% reduction in false positives after a single refinement cycle, and nationwide deployment in England demonstrates robust performance across varied site morphologies. By combining multimodal fusion, transformer-based architectures, and expert-guided refinement, this approach delivers both accuracy and interpretability. The talk will also discuss future directions, including predictive modelling to focus searches on high-potential areas, making large-scale archaeological surveys faster and more efficient.