IbPRIA 2025: 12th Iberian Conference on Pattern Recognition and Image Analysis
Coimbra, Portugal. June 30 - July 3, 2025
IbPRIA 2025 Accepted Papers
Poster Session 1

A continuous, differentiable, probability-expressed harm risk estimator for robot actions in dynamic human-centric environments
Andrey Solovov, Paulo Menezes
Abstract:
In environments with unpredictable agent traversal (e.g., Industry 5.0 (I5) workfloors, hospitals, malls), a robot benefits from a harm risk-based criterion that judges whether evasive behaviors are actually necessary. Such an approach ensures safety when plan adjustment is needed, while preserving efficiency and agility when not, rendering the I5 paradigm more practicable and reducing congestion. In this paper, we propose a risk estimator based on mean free path that maps robot actions to probabilities of a harmful outcome. The mapping has a simple underlying construction, and the resulting risk space is continuous and differentiable across actions (if the underlying hazard map is such), making our model easily compatible with both optimization algorithms and Reinforcement Learning (RL) policies. It is expandable beyond accounting merely for visible agents, allowing for the easy inclusion of contributions due to latent hazards, such as blind spots, or even hazard map adjustments due to agent Theory of Mind (ToM) (e.g., awareness of the robot). We show via Monte Carlo simulation that, for harm probabilities above 0.1, our method's error relative to the Ground Truth (GT) value does not exceed 5%.

An Optimized Multi-class Classification for Industrial Control Systems
Ágata Palma, Mário Antunes, Ana Alves
Abstract:
Ensuring the security of Industrial Control Systems (ICS) is increasingly critical due to rising connectivity and cyber threats. Traditional security measures often fail to detect evolving attacks, necessitating more effective solutions. This paper evaluates machine learning (ML) methods for ICS cybersecurity, using the ICS-Flow dataset and Optuna for hyperparameter tuning. The selected models, namely Random Forest (RF), AdaBoost, XGBoost, Deep Neural Networks, Artificial Neural Networks, ExtraTrees (ET), and Logistic Regression, are assessed using macro-averaged F1-score to handle class imbalance. Experimental results demonstrate that ensemble-based methods (RF, XGBoost, and ET) offer the highest overall detection performance, particularly in identifying commonly occurring attack types. However, minority classes, such as IP-Scan, remain difficult to detect accurately, indicating that hyperparameter tuning alone is insufficient to fully address imbalanced ICS data. These findings highlight the importance of complementary measures, such as focused feature selection, to enhance classification capabilities and safeguard industrial networks against a broader array of threats.

Assessing Dimensionality Reduction on Driving Range Estimation
João Valido, David Albuquerque, Artur Ferreira, David Coutinho
Abstract:
The use of Electric Vehicles (EV) has increased in recent years. The autonomy of the EV, expressed as its Driving Range (DR), is a key factor. This autonomy depends on several variables related to the vehicle itself as well as to external conditions. An accurate estimation of the DR value at each moment is a challenging task. In this paper, we build a dataset with 11 features for DR estimation, using publicly available EV data. Then, we discuss the use of Machine Learning (ML) regression techniques to estimate DR, with Linear Regression (LR), Multilayer Perceptron (MLP), and Radial Basis Function (RBF) neural networks. Moreover, we assess the effect of unsupervised dimensionality reduction techniques using feature selection and feature reduction approaches. The experimental results show that both feature selection and feature reduction are useful for reducing the dimensionality of the data, while maintaining or improving the performance for DR estimation. This study also identifies a few top features for DR estimation.
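The unsupervised feature reduction the abstract mentions can be illustrated with a plain PCA projection. The sketch below is a generic example, not the authors' setup; the function name and the 11-feature random matrix are hypothetical stand-ins for the paper's dataset.

```python
import numpy as np

def pca_reduce(X, n_components):
    """Unsupervised feature reduction via PCA: project centered data
    onto the top principal directions (generic sketch)."""
    Xc = X - X.mean(axis=0)
    # SVD of the centered data; rows of Vt are the principal directions
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 11))      # stand-in for 11 EV features
Z = pca_reduce(X, n_components=5)   # reduced design matrix for a DR regressor
print(Z.shape)                      # (200, 5)
```

The reduced matrix `Z` could then feed any of the regressors listed in the abstract (LR, MLP, RBF) in place of the original features.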

Deformation-Aware Butterfly Tracking in Raw Spatio-Spectral Images
Erick Adjé, Arnaud Ahouandjinou, Gilles Delmaire, Probus Kiki, Gilles Roussel
Abstract:
This paper presents a novel approach to butterfly tracking using Kalman Filters that integrates motion with its orientation and deformation dynamics to improve accuracy. By incorporating the butterfly's deformation model and orientation, we improve the prediction of its position and its oriented bounding box dimensions, a challenge not addressed by traditional tracking methods based on Kalman filters. The performance of different Kalman filters is evaluated on spatio-spectral image sequences, showing that the Extended Kalman Filter, which accounts for non-linear deformation, gives the best results in terms of accuracy. A cost function for associating Kalman filter trajectories with new target detections is also introduced, considering position, orientation, and oriented bounding box dimensions to enhance identity preservation. This approach improves tracking performance and lays the foundation for more accurate butterfly pest recognition using raw spatio-spectral images in agricultural fields.

ECG-Based Biometric Identification: An Exploratory Study Using Fingertip Signals Acquired With Solid-State Electrodes
Teresa M.C. Pereira, Raquel C. Conceição, Vitor Sencadas, Raquel Sebastião
Abstract:
This study presents an exploratory framework for electrocardiogram (ECG)-based biometric identification using ECG signals acquired with biocompatible, solid-state conductive electrodes. The system addresses key challenges in ECG biometrics, including user convenience, temporal stability, and minimal data requirements. ECG signals were collected from 81 participants in four sessions, with intervals of one, two, and three weeks between sessions, to evaluate the system's performance across varying temporal conditions. To validate the feasibility of the proposed electrodes, simultaneous recordings were made using traditional Ag/AgCl electrodes, with results showing strong agreement (Pearson correlation = 0.9923, RMSE = 0.0174). Gramian Angular Field (GAF) representations were used to transform ECG signals into spatial images, which were then fed into a convolutional neural network (CNN) for subject identification. Results demonstrate that a short ECG segment of 10 heartbeats (approximately 10 seconds) can achieve competitive identification accuracy, with same-session performance reaching 98-99%. The system maintains moderate accuracy over short- to medium-term intervals (e.g., 55-65% for one to two weeks), with performance declining over extended time intervals (e.g., 47-50% for three weeks). These findings suggest the potential of ECG biometrics for real-world applications requiring rapid and non-invasive identification, such as security systems, wearable devices, and healthcare monitoring. Future work will focus on optimizing feature extraction and selection, as well as classification methods, expanding datasets, and addressing long-term signal variability to enhance system robustness. This work advances ECG biometrics by demonstrating the feasibility of short, fingertip-acquired signals for personal identification, paving the way for practical and user-friendly biometric systems.
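The Gramian Angular Field step has a standard construction: rescale the signal to [-1, 1], map each sample to an angle via arccos, and form the summation-field image cos(phi_i + phi_j). A minimal sketch (the function name and the synthetic sine "heartbeat" are illustrative, not from the paper):

```python
import numpy as np

def gramian_angular_field(x):
    """Gramian Angular Summation Field of a 1-D signal.

    Rescales x to [-1, 1], maps samples to angles via arccos,
    and returns the square image cos(phi_i + phi_j)."""
    x = np.asarray(x, dtype=float)
    x = 2 * (x - x.min()) / (x.max() - x.min()) - 1  # min-max rescale
    phi = np.arccos(np.clip(x, -1.0, 1.0))
    # Outer sum of angles -> one image pixel per sample pair
    return np.cos(phi[:, None] + phi[None, :])

heartbeat = np.sin(np.linspace(0, 2 * np.pi, 128))  # stand-in for an ECG segment
image = gramian_angular_field(heartbeat)
print(image.shape)  # (128, 128)
```

The resulting square image can be fed to an ordinary 2-D CNN, which is what makes GAF a convenient bridge between 1-D biosignals and image classifiers.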

Enhancing Medical Image Analysis: A Pipeline Combining Synthetic Image Generation and Super-Resolution
Pedro Sousa, Diogo Campas, João Andrade, Pedro Pereira, Tiago Gonçalves, Luís F. Teixeira, Tania Pereira, Hélder P. Oliveira
Abstract:
Cancer is a leading cause of mortality worldwide, with breast and lung cancer being the most prevalent globally. Early and accurate diagnosis is crucial for successful treatment, and medical imaging techniques play a pivotal role in achieving this. This paper proposes a novel pipeline that leverages generative artificial intelligence to enhance medical images by combining synthetic image generation and super-resolution techniques. The framework is validated in two medical use cases (breast and lung cancers), demonstrating its potential to improve the quality and quantity of medical imaging data, ultimately contributing to more precise and effective cancer diagnosis and treatment. Overall, although some limitations do exist, this paper achieved satisfactory results at an image size conducive to specialist analysis, and further expands upon this field's capabilities.

Fine-Grained Visual Classification of Antelope Species
Philipp Gruner, Maya Beukes, Vanessa Suessle, Matthias Biber, Martin Jansen, Elke Hergenröther, Andreas Weinmann
Abstract:
Fine-grained species classification is essential for biodiversity monitoring, enabling the detection of population changes and ecological shifts. However, distinguishing between morphologically similar species remains a significant challenge. Given the large volume of camera trap data, there is a growing need for AI-based classification models to support automated species identification. This study investigates fine-grained visual classification (FGVC) of seven visually similar antelope species by evaluating the adaptation of state-of-the-art FGVC models—HERBS, PIM, and MPSA—to this novel domain. To facilitate training and evaluation, three camera trap datasets are introduced, incorporating data from expert annotations, citizen science contributions, and iNaturalist data. Five experiments systematically analyze the impact of fine-tuning strategies and dataset quality on classification performance. Results demonstrate the robustness and generalization capabilities of these models across both clean and unclean datasets, with HERBS achieving an accuracy of 99.41% on the introduced Baviaanskloof dataset. These findings highlight the potential of AI-assisted classification for wildlife monitoring and biodiversity conservation.

Image Transformation Sequence Retrieval with General Reinforcement Learning
Enrique Mas-Candela, Antonio Ríos-Vila, Jorge Calvo-Zaragoza
Abstract:
We introduce Image Transformation Sequence Retrieval (ITSR), a novel computer vision challenge that sits at the intersection of visual reasoning and program synthesis. Given a source image and a target image, ITSR aims to recover the specific sequence of transformations (from a predefined set) that converts the source into the target. This challenge bridges the gap between human-interpretable visual reasoning tasks and machine-addressable structured problems. We approach this problem through model-based Reinforcement Learning, combining Monte Carlo Tree Search (MCTS) with deep learning, inspired by the AlphaZero methodology. Our experiments on real-world image data demonstrate that this approach significantly outperforms supervised learning methods across various neural architectures: in particular, a model trained with MCTS outperforms its supervised counterpart. Our work establishes ITSR as a well-defined benchmark that balances human interpretability with machine-addressable structure, providing a comprehensive evaluation framework to facilitate future research in visual reasoning and transformation sequence retrieval.

Mitigating Distribution Bias in Multimodal Datasets via Clustering-Based Curation
Mustapha El Aichouni, Lluis Gomez, Lei Kang
Abstract:
The effectiveness of self-supervised learning, particularly in vision–language models like CLIP, is largely influenced by the quality and balance of the data rather than the model architecture. Internet-crawled datasets often contain noise and exhibit long-tail distributions, highlighting the need for more robust curation strategies. Existing filtering methods that leverage specificity or quality metrics (e.g., HYPE, DFNs) can remove meaningless or low-quality image–text pairs, yet they often overlook the distribution bias present in clustered sample groups. In this work, we introduce a simple yet effective dataset curation framework that employs a k-means-based method to identify and eliminate noisy and redundant samples in CLIP-based datasets by sampling in a balanced way from refined clusters. Our approach is motivated by the initial observation that both image-only and text-only crawled datasets tend to manifest long-tail distributions, potentially hindering downstream performance. Comprehensive experiments on image, text, and multimodal data demonstrate that our method outperforms baseline CLIP score filtering (which retains pairs with cosine similarity scores above a defined threshold) and competes favorably against alternative strategies on the DataComp benchmark when the text modality is balanced.
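The core idea, cluster then sample equally per cluster to flatten a long-tail distribution, can be sketched in a few lines. This is a toy illustration, not the authors' pipeline: the function name, the tiny Lloyd's-iteration k-means, and the random embeddings are all hypothetical stand-ins for CLIP embeddings of a crawled dataset.

```python
import numpy as np

def balanced_cluster_sample(embeddings, n_clusters, per_cluster, seed=0):
    """Cluster embeddings with a minimal k-means, then draw at most
    `per_cluster` indices from each cluster (balanced sampling sketch)."""
    rng = np.random.default_rng(seed)
    # Minimal k-means (Lloyd's algorithm), initialized from data points
    centers = embeddings[rng.choice(len(embeddings), n_clusters, replace=False)]
    for _ in range(20):
        dists = np.linalg.norm(embeddings[:, None] - centers[None], axis=-1)
        labels = dists.argmin(axis=1)
        for k in range(n_clusters):
            if (labels == k).any():
                centers[k] = embeddings[labels == k].mean(axis=0)
    # Balanced draw: equal budget per cluster flattens head clusters
    keep = []
    for k in range(n_clusters):
        idx = np.flatnonzero(labels == k)
        if len(idx):
            keep.extend(rng.choice(idx, min(per_cluster, len(idx)), replace=False))
    return sorted(keep)

X = np.random.default_rng(1).normal(size=(500, 8))   # stand-in embeddings
subset = balanced_cluster_sample(X, n_clusters=5, per_cluster=20)
```

Head clusters contribute only `per_cluster` samples while tail clusters are kept nearly whole, which is the balancing effect the abstract describes.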

Optimizing Medical Image Captioning with Conditional Prompt Encoding
Rendson F. Fernandes, Hugo S. Oliveira, Pedro P. Ribeiro, Helder P. Oliveira
Abstract:
Medical image captioning is an essential tool to produce descriptive text reports of medical images. One of the central problems of medical image captioning is poor domain-specific description generation, because large pre-trained language models are primarily trained on non-medical text whose semantics differ from those of medical text. To overcome this limitation, we explore improvements in contrastive learning for X-ray images, complemented with soft prompt engineering for medical image captioning and conditional text decoding for caption generation. The main objective is to develop a model that improves the accuracy and clinical relevance of the automatically generated captions while preserving their linguistic accuracy and the models' performance. Experiments on the MIMIC-CXR and ROCO datasets showed improvements in accuracy and efficiency, ensuring a more cohesive medical context for captions, aiding medical diagnosis and encouraging more accurate reporting.

PosePilot: An Edge-AI Solution for Posture Correction in Physical Exercises
Rushiraj Gadhvi, Priyansh Desai, Siddharth
Abstract:
Automated pose correction remains a significant challenge in AI-driven fitness systems, despite extensive research in activity recognition. This work presents PosePilot, a novel system that integrates pose recognition with real-time personalized corrective feedback, overcoming the limitations of traditional fitness solutions. Using Yoga, a discipline requiring precise spatio-temporal alignment, as a case study, we demonstrate PosePilot’s ability to analyze complex physical movements. Designed for deployment on edge devices, PosePilot can be extended to various at-home and outdoor exercises. We employ a Vanilla LSTM, allowing the system to capture temporal dependencies in pose recognition. Additionally, a BiLSTM with Multihead Attention enhances the model’s ability to process motion contexts, selectively focusing on key limb angles for accurate error detection while maintaining computational efficiency. Most importantly, PosePilot provides instant corrective feedback at every stage of a movement, ensuring precise posture adjustments throughout the exercise routine. The proposed approach 1) performs automatic human posture recognition, 2) provides personalized posture correction feedback at each instant, which is crucial in Yoga, and 3) offers a lightweight and robust posture correction model feasible for deployment on edge devices in real-world environments.

Quantum Computing in Prenatal Care: Enhancing Fetal Ultrasound Image Classification with Quantum Neural Networks
Asmae Zanbouaa, Nabila Zrira, Saad Slimani, Ibtissam Benmiloud, El Houssine Bouyakhf
Abstract:
The classification of fetal ultrasound images poses a significant challenge in prenatal care, which is crucial for early anomaly detection and health assessment. Traditional deep learning methods, while effective, often struggle with the complexity and high-dimensional nature of ultrasound data. This study introduces a Quantum Neural Network (QNN)-based framework for classifying fetal ultrasound images, aiming to leverage the potential of quantum computing to overcome the limitations of conventional Convolutional Neural Networks (CNNs). Our QNN model utilizes principles of quantum mechanics to process and analyze complex datasets efficiently, resulting in improved accuracy and computational efficiency. Through a comprehensive evaluation involving annotated ultrasound video clips from 4 healthcare centers across Morocco, the QNN model demonstrates superior performance, with an accuracy of up to 0.99 and a processing speed of up to 4 s per epoch, which is efficient compared to the classical CNN. This research not only underscores the feasibility of QNNs in medical imaging but also lays the groundwork for future quantum-enhanced diagnostic tools, which have the potential to revolutionize prenatal care practices.

Saliency-guided Emotion Modeling: Predicting Viewer Reactions from Video Stimuli
Akhila Yaragoppa, Siddharth
Abstract:
Understanding the emotional impact of videos is crucial for applications in content creation, advertising, and Human-Computer Interaction (HCI). Traditional affective computing methods rely on self-reported emotions, facial expression analysis, and biosensing data, yet they often overlook the role of visual saliency—the naturally attention-grabbing regions within a video. In this study, we utilize deep learning to introduce a novel saliency-based approach to emotion prediction by extracting two key features: saliency area and number of salient regions. Using the HD2S saliency model and OpenFace facial action unit analysis, we examine the relationship between video saliency and viewer emotions. Our findings reveal three key insights: (1) Videos with multiple salient regions tend to elicit high-valence, low-arousal emotions, (2) Videos with a single dominant salient region are more likely to induce low-valence, high-arousal responses, and (3) Self-reported emotions often misalign with facial expression-based emotion detection, suggesting limitations in subjective reporting. By leveraging saliency-driven insights, this work provides a computationally efficient and interpretable alternative for emotion modeling, with implications for content creation, personalized media experiences, and affective computing research.

Subfield-Based 1-Attempt Parallel Thinning Algorithms on the Hexagonal Grid
Kálmán Palágyi
Abstract:
Parallel reductions can delete a set of black points from a binary image at a time. Parallel 2D thinning algorithms are composed of parallel reductions to produce centerlines of objects. In subfield-based parallel thinning, the digital space is partitioned into k >= 2 subfields which are alternately activated; at a given iteration step, k successive parallel reductions assigned to these subfields are performed, and some black points in the active subfield are designated for deletion. A thinning algorithm is 1-attempt if, whenever a border point is not deleted in the current thinning phase, it belongs to the resulting centerline. This paper presents two subfield-based parallel thinning algorithms acting on the nonconventional hexagonal grid. It is shown that both proposed algorithms are topology-preserving and 1-attempt.

Towards a New Categorization of Models for Multivariate Time Series Anomaly Detection
Bruna Alves, Armando Pinho, Sónia Gouveia
Abstract:
Methods for Multivariate Time Series Anomaly Detection (MTSAD) have attracted much interest among the research community, with the number of proposed methodologies exploding in the last six years. Existing reviews and surveys have categorized MTSAD methods into conventional, machine learning, and deep neural network (DNN)-based approaches, further distinguishing DNN-based methods by architecture and their focus on Temporal (T), Spatial (S), or Spatio-Temporal (ST) modeling. However, there are still aspects that require further exploration, for example, how the spatial and temporal dependencies are organized in ST models. This article proposes a novel characterization of the Temporal/Spatial (T/S) dimension in six categories, defined based on the analysis of 177 Scopus-indexed documents on MTSAD obtained through a search on this database covering 2019 to 2024. As the first outcome of a large-scale review of methodologies, this study identifies emerging trends and opens new directions for future research in MTSAD methodologies.

Towards Event-Driven Evaluation of Surveillance Video Understanding using Natural Language
João Pereira, Vasco Lopes, João Neves, David Semedo
Abstract:
Traditional anomaly detection algorithms have become important for assisting security personnel in monitoring surveillance footage. However, these methods lack semantic understanding and robustness to scenarios outside of the training domain. In this paper, we redefine Surveillance Video Understanding (SVU), focusing on identifying and describing key events in surveillance footage that represent anomalous situations, such as violence, crime, and accidents. We leverage the zero-shot and descriptive capabilities of Large Vision-Language Models (LVLMs) and assess their performance in SVU. To evaluate LVLMs in SVU, we propose and validate an LLM judge that rewards the detection of key events while not penalizing the omission of background details. From the judge's assessments, we derive two quantitative metrics and report the performance of four state-of-the-art LVLMs. Our analysis shows that LVLMs can capture contextual cues reasonably well but struggle to recognize the main abnormal events of the video, especially if only subtle visual cues that disclose individuals' intent are available. This work offers a modern perspective on SVU and enables a more interpretable and generalizable evaluation approach.


Publisher

Endorsed by
IAPR

Technical Sponsors
AERFAI
APRP