IbPRIA 2025: 12th Iberian Conference on Pattern Recognition and Image Analysis

Coimbra, Portugal. June 30 - July 3, 2025

IbPRIA 2025 Accepted Papers

Poster Session 2

A Semi-automatic Annotation Framework for Neutrophil Ultrastructure from TEM images

Zahoor Ahmad, Mahmood Alzubaidi, Wared Nour-Eldine, Samia M. Ltaief, Jens Schneider, Abeer R. Al-Shammari, Marco Agus

Abstract:
TEM images of immune cell ultrastructure are critical for understanding immune responses, but manual annotation is time-consuming and subjective. To address this, we propose a semi-automatic pipeline integrating the YOLOv9 model with CVAT within a human-in-the-loop framework. Our approach leverages YOLOv9 to generate initial bounding box predictions, converts them to ellipses, and enables iterative refinement in CVAT, improving annotation accuracy over multiple cycles. On a test set of 26 high-resolution TEM images containing 6028 neutrophil ultrastructures, the pipeline increased the mean Average Precision at an Intersection over Union threshold of 0.5 (mAP@0.5) from 0.53 to 0.688 after three iterations, detecting ≈95% of the objects and reducing annotator workload by ≈80%. The pipeline also demonstrated extensibility, detecting ≈80% of ultrastructures in eosinophil TEM images, as validated by biomedical experts. To our knowledge, no existing dataset has all immune cell ultrastructures annotated, making our pipeline a valuable tool for generating high-quality datasets. This work accelerates TEM annotation, enhances dataset consistency, and supports broader immune cell research, with future extensions planned for other cell types such as lymphocytes and monocytes.
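The abstract says YOLOv9 boxes are converted to ellipses and that detection is scored with mAP at an IoU threshold of 0.5, but it does not state the conversion rule. The minimal Python sketch below assumes each ellipse is the axis-aligned ellipse inscribed in its box, and shows the box IoU test that decides whether a prediction counts as a match at the 0.5 threshold. `Box`, `Ellipse`, `box_to_ellipse`, `iou` and `is_match` are illustrative names, not part of YOLOv9 or CVAT.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Box:
    """Axis-aligned bounding box in pixel coordinates (x1 < x2, y1 < y2)."""
    x1: float
    y1: float
    x2: float
    y2: float

    def area(self) -> float:
        return max(0.0, self.x2 - self.x1) * max(0.0, self.y2 - self.y1)


@dataclass(frozen=True)
class Ellipse:
    """Axis-aligned ellipse given by its centre and semi-axes (radii)."""
    cx: float
    cy: float
    rx: float
    ry: float


def box_to_ellipse(box: Box) -> Ellipse:
    # Assumption: the ellipse is inscribed in the box, so it shares the
    # box centre and its radii are half the box width and height.
    return Ellipse(
        cx=(box.x1 + box.x2) / 2,
        cy=(box.y1 + box.y2) / 2,
        rx=(box.x2 - box.x1) / 2,
        ry=(box.y2 - box.y1) / 2,
    )


def iou(a: Box, b: Box) -> float:
    # Width and height of the overlap region (zero if the boxes are disjoint).
    inter_w = max(0.0, min(a.x2, b.x2) - max(a.x1, b.x1))
    inter_h = max(0.0, min(a.y2, b.y2) - max(a.y1, b.y1))
    inter = inter_w * inter_h
    union = a.area() + b.area() - inter
    return inter / union if union > 0 else 0.0


def is_match(pred: Box, gt: Box, threshold: float = 0.5) -> bool:
    # A prediction can count as a true positive for mAP@0.5 when IoU >= 0.5.
    return iou(pred, gt) >= threshold
```

For example, `box_to_ellipse(Box(10, 20, 50, 40))` gives an ellipse centred at (30, 30) with radii 20 and 10. A full mAP computation additionally ranks predictions by confidence and matches each ground-truth object at most once; that bookkeeping is omitted here.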

AGE-US: automated gestational age estimation based on fetal ultrasound images

César Díaz-Parga, Marta Nuñez-Garcia, Maria J. Carreira, Gabriel Bernardino, Nicolás Vila-Blanco

Abstract:
Being born small carries significant health risks, including increased neonatal mortality and a higher likelihood of future cardiac disease. Accurate estimation of gestational age is critical for monitoring fetal growth, but traditional methods, such as estimation based on the last menstrual period, rely on information that is difficult to obtain in some situations. While ultrasound-based approaches offer greater reliability, they rely on manual measurements that introduce variability. This study presents an interpretable deep learning-based method for automated gestational age calculation, leveraging a novel segmentation architecture and distance maps to overcome dataset limitations and the scarcity of segmentation masks. Our approach achieves performance comparable to state-of-the-art models while reducing complexity, making it particularly suitable for resource-constrained settings with limited annotated data. Furthermore, our results demonstrate that distance maps are particularly suitable for estimating femur endpoints.
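The abstract credits distance maps with improving femur endpoint estimation but does not define them. One common formulation, assumed here, encodes each landmark as a map whose value at every pixel is the Euclidean distance to that landmark. A network regresses the map, and the landmark is decoded as the location of the minimum. The sketch also converts two decoded endpoints into a length using a hypothetical pixel spacing. The mapping from femur length to gestational age is not given in the abstract and is not shown.

```python
import math
from typing import List, Tuple

Point = Tuple[int, int]  # (row, col) in pixels
Grid = List[List[float]]


def distance_map(height: int, width: int, landmark: Point) -> Grid:
    """Euclidean distance from every pixel to the landmark (an assumed training target)."""
    ly, lx = landmark
    return [[math.hypot(y - ly, x - lx) for x in range(width)] for y in range(height)]


def decode_landmark(dmap: Grid) -> Point:
    """Recover the landmark as the pixel with the smallest (predicted) distance."""
    _, y, x = min(
        (value, y, x) for y, row in enumerate(dmap) for x, value in enumerate(row)
    )
    return y, x


def length_mm(p: Point, q: Point, pixel_spacing_mm: float) -> float:
    """Distance between two endpoints, assuming isotropic pixel spacing in mm."""
    return math.hypot(p[0] - q[0], p[1] - q[1]) * pixel_spacing_mm
```

Unlike a single-pixel target, a dense distance map gives every pixel a supervisory signal. That is one plausible reason, not one stated by the authors, why such targets might help when annotated masks are scarce.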

Analysis of Behavioral Trends in Road Traffic Accidents: A Comparative Study Among Latin America, Australia and the UK

Diana Zepeda-Martínez, Angélica Guzmán-Ponce, R. María Valdovinos-Rosas, Carlos Robert Fonseca-Ortiz

Abstract:
Road transport is fundamental to encouraging economic growth, creating employment, and providing individuals with access to essential services. However, road accidents are one of the primary causes of death and injury globally, with a significant impact on the economic and health sectors, thereby deteriorating the quality of life in society. This paper uses machine learning techniques to analyze road accident behavior in Latin American countries, contrasted with Australia and the United Kingdom. Results suggest that the causes and characteristics of accidents in different regions vary according to the type, age and sex of those involved, as well as temporal aspects that indicate moments of greater risk by day, month and year.

Assessing Cross-Device Generalization in Remote Sensing Image Super-Resolution

Afonso Martins, Ana Dias, Francisco Silva, André Sá, Machiel Bos, João Neves

Abstract:
The training of super-resolution (SR) methods for remote sensing applications typically relies on low/high-resolution image pairs derived from a single device by down-sampling the original image. However, this approach does not reflect real-world deployment scenarios, where SR models are applied to images captured by different devices, and it therefore leads to an unrealistic assessment of the performance of state-of-the-art SR remote sensing methods. To advance knowledge in this field, we introduce a cross-device dataset and evaluation protocol for quantifying the impact of training on a single device in cross-device inference scenarios. Additionally, we propose a training-free post-processing strategy for domain shift compensation. Our findings show that models trained on a single device suffer a moderate performance decay when applied to unseen devices (≈7.3% in PSNR). This decay is attenuated (≈5.4% in PSNR) when our proposed correction strategy is applied. We also observed that training with both devices yields a 4.1% PSNR improvement in cross-device scenarios over the alternative of training on a single device and applying post-correction during inference. These insights highlight the importance of considering domain shift in remote sensing SR applications.
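The abstract reports cross-device degradation as a percentage of PSNR. Assuming that percentage is the relative drop with respect to the same-device PSNR (the abstract does not spell this out), both quantities can be computed as below. Images are treated as flattened lists of pixel intensities with an assumed 8-bit peak value of 255; `psnr` and `relative_decay_percent` are illustrative names.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], estimate: Sequence[float], max_value: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB between two equally sized (flattened) images."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("images must be non-empty and the same size")
    mse = sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)


def relative_decay_percent(same_device_psnr: float, cross_device_psnr: float) -> float:
    """PSNR loss on an unseen device, as a percentage of the same-device PSNR (assumed definition)."""
    return 100.0 * (same_device_psnr - cross_device_psnr) / same_device_psnr
```

Because PSNR is logarithmic in the mean squared error, a relative drop of a few percent in dB corresponds to a noticeably larger relative increase in pixel-wise error.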

Capturing the Narrative: a Deep Learning Pipeline for Comics Sequences

Goncalo Marouvo, Francisco Pereira

Abstract:
Comics illustrate the complex ways in which humans communicate and express ideas. We investigate how a deep learning pipeline performs in creating a narrative that describes the story unfolding in comics sequences. The framework relies on the multimodal BLIP-2 architecture to bridge the gap between the sequence of images and the text description. Our analysis reveals the relevance of prompt engineering, both in defining the most effective prompt and in the strategy adopted for ingesting visual information. In this case, it appears preferable to present images sequentially, possibly enriched with some context from previous steps, and to complete the process with a final summarization step. The results are promising, revealing that in many situations the pipeline is able to generate text narratives that are close to the semantic space of the real descriptions.

ECOmpress: a web tool for boosting energy efficiency through data compression

Dinis Lei, Denis Yamunaque, Armando J. Pinho, Diogo Pratas

Abstract:
The growing energy consumption of the ICT sector, driven by increasing data volumes, presents significant environmental and economic challenges. With datacenters accounting for a large portion of global electricity use, finding efficient data management solutions is essential. In bioinformatics, a major contributor to big data, optimizing data compression algorithms offers a powerful strategy to reduce both energy consumption and operational costs. Aligned with the United Nations’ Sustainable Development Goals and European Union objectives, this study introduces a model for estimating the energy consumption of various data compression algorithms within the ICT sector, focusing on genomic data. The model facilitates the simulation of optimal configurations and compression techniques to minimize energy consumption. This model has been implemented in ECOmpress, an online portal that allows users to compare the energy impact of different algorithms. When applied to genomic datasets, specialized algorithms reduced energy consumption by 26% to 38%, highlighting the benefits of compression optimization in bioinformatics. ECOmpress enables users to make informed decisions and achieve energy efficiency across a range of scenarios.

Enhancing IoMT security by using Benford’s law and distance functions

Pedro Fernandes, Séamus Ó Ciardhuáin, Mário Antunes

Abstract:
The increasing connectivity of Internet of Medical Things (IoMT) devices has accentuated their susceptibility to cyberattacks. The sensitive data they handle makes them prime targets for information theft and extortion, while outdated and insecure communication protocols further elevate security risks. This paper presents an innovative approach that combines Benford's law with distance functions to detect attacks on IoMT devices. The methodology uses Benford's law to analyze digit frequencies and classify IoMT device traffic as benign or malicious, regardless of attack type. It employs statistical distance functions such as the Jensen-Shannon divergence, the Kullback-Leibler divergence, Pearson correlation, and the Kolmogorov test to detect anomalies. The Kolmogorov test with α = 0.01 achieved the best performance, particularly for DoS ICMP attacks, with a precision of 99.24%, a recall of 98.73%, an F1 score of 98.97%, and an accuracy of 97.81%. The Jensen-Shannon divergence attained balanced performance in detecting SYN-based attacks, demonstrating high detection rates with minimal computational overhead. These findings highlight Benford's law as a promising statistical tool for detecting anomalies in network traffic flows, such as IoMT traffic, offering a robust and efficient approach to improving security.
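The abstract does not say which traffic fields are tested against Benford's law or how detection thresholds are set. The Python sketch below assumes that some numeric per-packet or per-flow feature (for example, packet sizes) is collected in windows. It compares each window's first-digit distribution with the Benford distribution P(d) = log10(1 + 1/d) using the Jensen-Shannon divergence, and flags the window when the divergence exceeds a hypothetical threshold. Only one of the four statistics named in the abstract is shown, and `is_suspicious` with its default threshold is purely illustrative.

```python
import math
from collections import Counter
from typing import Iterable, List, Optional

# Benford's law: P(first digit = d) = log10(1 + 1/d), for d = 1..9.
BENFORD: List[float] = [math.log10(1 + 1 / d) for d in range(1, 10)]


def first_digit(value: float) -> Optional[int]:
    """Leading significant digit of a non-zero finite number, else None."""
    value = abs(float(value))
    if value == 0 or not math.isfinite(value):
        return None
    # Scientific notation puts the leading significant digit first: 0.0042 -> '4.2...e-03'.
    return int(f"{value:.15e}"[0])


def digit_distribution(values: Iterable[float]) -> List[float]:
    """Empirical first-digit frequencies for digits 1..9."""
    counts = Counter(d for d in map(first_digit, values) if d is not None)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no non-zero values to analyse")
    return [counts[d] / total for d in range(1, 10)]


def kl_divergence(p: List[float], q: List[float]) -> float:
    """KL(p || q) in bits; terms with p_i = 0 contribute nothing."""
    return sum(pi * math.log2(pi / qi) for pi, qi in zip(p, q) if pi > 0)


def js_divergence(p: List[float], q: List[float]) -> float:
    """Jensen-Shannon divergence in bits, bounded in [0, 1]."""
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def is_suspicious(window: Iterable[float], threshold: float = 0.05) -> bool:
    """Flag a traffic window whose digits deviate from Benford (hypothetical threshold)."""
    return js_divergence(digit_distribution(window), BENFORD) > threshold
```

As a sanity check, the powers of two are known to follow Benford's law closely, so their first-digit distribution sits much nearer to `BENFORD` than a window in which each leading digit 1 to 9 appears equally often.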

Gender Classification in Play Works Using BERT-based Models

Jaime Yefi-Verdugo, Raúl Peña-Ortiz, Verónica Romero

Abstract:
Equality between men and women has been a long, complex, and silent struggle. As with history in general, societies tend to forget past events easily. Therefore, analysing spoken words and written texts is essential to understanding the context in which works were created and, more broadly, the realities of different historical periods. In this regard, the theatre of the Spanish Golden Age provides a vast, rich, and diverse collection of works that allows for the study of male and female participation, serving as a starting point for analysing gender equality in that era. This paper focuses on a selection of plays written by Lope de Vega, one of the most prolific and influential writers of the period, with the aim of classifying the characters portrayed in his works by gender. To achieve this, pre-trained BERT models were used, yielding accurate and meaningful results. This approach provides a novel perspective on Lope de Vega’s plays, achieving an F1-score of 0.98.

How LLMs See People

Carlos Roxo, João Marcos, Nuno Gonçalves

Abstract:
This study investigates the linguistic patterns in descriptions of human subjects given by large language models with vision capabilities. We examine descriptions generated by two vision-capable LLMs, Qwen2.5-VL-72B-Instruct and Llama3.2-vision:11b, for a subset of the CelebA dataset. Using word frequency and clustering analyses, we identify distinct common topics in descriptions of people, including facial features, hair characteristics, clothing, body structure, posture, and contextual environment. Our findings reveal differences in how these models organize descriptive concepts, with Llama3.2 demonstrating more gender-centric descriptions compared to Qwen2.5's focus on objective physical attributes. These patterns may reveal underlying conceptual frameworks that shape how LLMs represent human subjects. Our analysis contributes to understanding representation in multimodal AI systems and has implications for reducing bias in descriptions of people.

Lightweight SwinUNETR for Hepatic Segmentation

Marcos Fdez-González, Lois Nodar-Corral, Xose R. Fdez Vidal, Enrique Comesaña

Abstract:
Transformer-based architectures have significantly improved 3D medical image segmentation, yet their high computational demands hinder real-time clinical applications. This study addresses this issue by optimising 3D SwinUNETR, a state-of-the-art transformer model, reducing the number of layers while preserving segmentation accuracy. The goal is to develop a lightweight model capable of real-time liver segmentation from computed tomography (CT) scans, addressing the constraints of resource-limited clinical environments. We propose SwinUNETR-36, a reduced-depth variant of SwinUNETR, and compare it against the original SwinUNETR-48 and a more compact SwinUNETR-24. The models are evaluated on the BTCV dataset, using the Dice Similarity Coefficient (DSC), Normalised Surface Distance (NSD), Mean Average Surface Distance (MASD), Hausdorff Distance (HD), and Relative Volume Difference (RVD) as performance metrics. Results demonstrate that SwinUNETR-36 achieves a 50% reduction in computational complexity compared to SwinUNETR-48 while maintaining comparable segmentation accuracy. Additionally, qualitative assessments reveal that SwinUNETR-36 provides a closer volumetric match to the ground truth, avoiding both the over-segmentation errors observed in SwinUNETR-48 and the errors present in SwinUNETR-24. These findings confirm that reducing transformer depth effectively balances segmentation accuracy and computational efficiency, making deep learning models more viable for real-time clinical deployment. Future research will explore additional model compression techniques, such as pruning and quantisation, to enhance performance on resource-limited systems.
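Two of the metrics listed above, DSC and RVD, have short closed forms that clarify what "a closer volumetric match" and "over-segmentation" mean. The sketch below represents a 3D mask as a set of voxel coordinates. It assumes the signed RVD convention (|P| − |G|) / |G|, under which positive values indicate over-segmentation; some works report the absolute value instead. Voxel size cancels in both ratios, so raw voxel counts suffice.

```python
from typing import AbstractSet, Tuple

Voxel = Tuple[int, int, int]


def dice(pred: AbstractSet[Voxel], truth: AbstractSet[Voxel]) -> float:
    """Dice similarity coefficient 2|P ∩ G| / (|P| + |G|)."""
    if not pred and not truth:
        return 1.0  # both masks empty: treated as perfect agreement (a convention)
    return 2 * len(pred & truth) / (len(pred) + len(truth))


def relative_volume_difference(pred: AbstractSet[Voxel], truth: AbstractSet[Voxel]) -> float:
    """Signed RVD (|P| - |G|) / |G|; positive values indicate over-segmentation."""
    if not truth:
        raise ValueError("ground-truth volume must be non-empty")
    return (len(pred) - len(truth)) / len(truth)
```

For example, a prediction that contains all 10 ground-truth voxels plus 2 extra ones scores a Dice of 20/22 ≈ 0.91 and an RVD of +0.2. This shows how a model can over-segment while still achieving a high Dice score.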

Mitigating Overfitting in Fully Transformer Architectures for Handwritten Text Recognition

Carlos Penarrubia, Jose J. Valero-Mas, Jorge Calvo-Zaragoza

Abstract:
Fully Transformer-based architectures have gained significant attention in Handwritten Text Recognition (HTR) due to their ability to model long-range dependencies. However, these models are highly susceptible to overfitting, limiting their generalization capabilities, particularly in data-scarce scenarios. In this work, we systematically analyze overfitting in a fully Transformer-based HTR model and explore various mitigation techniques. By evaluating different strategies, we gain insights into how architectural components contribute to overfitting and how specific techniques enhance model robustness. Our findings align with recent trends in HTR and provide a deeper understanding of overfitting in fully Transformer-based models, thereby offering valuable insights for future advancements in the field.

Mutual-Training Pseudo-Labeling Framework for Fire Segmentation

Antonio Antunovic, Davor Damjanovic, Matej Arlovic, Emmanuel Karlo Nyarko, Franko Hrzic, Josip Balen

Abstract:
Fires can pose an enormous risk to human safety, property, and the environment, and they tend to have devastating consequences. With the development of computer vision and deep learning technologies, fire detection and monitoring systems have increasingly begun to use these technologies to increase accuracy and reliability. A major bottleneck in the development of deep learning models is the need for high-quality, large-scale datasets. Since manually annotating images at the pixel level is often monotonous and time-consuming, semi-supervised learning (SSL) offers a viable solution by leveraging a small amount of labeled data together with a large amount of unlabeled data. One common approach in SSL is pseudo-labeling, where a model trained on an initial set of labeled data is used to generate pseudo-labels for unlabeled data, which are subsequently incorporated into the training set and used to re-train the model. This paper proposes a novel iterative mutual-training pseudo-labeling framework, along with a pseudo-label refinement technique. Experimental results indicate the effectiveness of the proposed method, which achieves results comparable to a fully supervised setting, highlighting the capability of the models to generate accurate and reliable pseudo-labels. Furthermore, these findings suggest that the proposed method can minimize reliance on manual annotations while maintaining substantial performance.
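The abstract names a mutual-training pseudo-labeling loop and a refinement technique without detailing either. The sketch below illustrates one common pattern and should not be read as the authors' method. Two segmentation models each predict a per-pixel fire probability on an unlabeled image. A pixel receives a pseudo-label only when both models agree on its class and both are confident. All other pixels get a hypothetical ignore index so they do not contribute to the loss when the models are retrained. The agreement rule, the 0.9 confidence value and `IGNORE_INDEX = 255` are assumptions.

```python
from typing import List, Sequence

IGNORE_INDEX = 255  # hypothetical label meaning "exclude this pixel from the loss"


def refine_pseudo_labels(
    probs_a: Sequence[float],
    probs_b: Sequence[float],
    confidence: float = 0.9,
) -> List[int]:
    """Combine per-pixel fire probabilities from two models into pseudo-labels.

    A pixel is labelled 1 (fire) or 0 (background) only when both models agree
    on the class and each is at least `confidence` sure; otherwise it is ignored.
    """
    if len(probs_a) != len(probs_b):
        raise ValueError("probability maps must have the same size")
    labels = []
    for pa, pb in zip(probs_a, probs_b):
        class_a, class_b = pa >= 0.5, pb >= 0.5
        sure_a = max(pa, 1 - pa) >= confidence
        sure_b = max(pb, 1 - pb) >= confidence
        if class_a == class_b and sure_a and sure_b:
            labels.append(int(class_a))
        else:
            labels.append(IGNORE_INDEX)
    return labels
```

In a mutual-training loop, each model would then be retrained on the labeled set plus these agreed pixels, and the cycle repeated. Requiring agreement trades pseudo-label coverage for precision.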

On the Use of Implicit Representations for Deepfake Detection

Miguel Leão, Nuno Gonçalves

Abstract:
Advances in home computing, combined with the vast number of images and videos of individuals available on the Internet, have enabled the proliferation of deepfake media, affecting the lives of private individuals and contributing to the dangerous spread of misinformation. Current state-of-the-art detection methods show impressive results. However, they struggle to generalize, and improved generation methods keep overcoming them. This paper explores the viability of using implicit representations of facial videos for deepfake detection. Implicit representations offer computer vision a new research paradigm, possibly providing alternatives to current methods based on the color space or the frequency domain. This work uses Sinusoidal Representation Networks (SIRENs) to show a significant difference between the Fréchet Video Distance (FVD) scores obtained for bona fide videos and their SIREN reconstructions, and those obtained for deepfake videos and their SIREN reconstructions. This result leads to the conclusion that the SIREN representation of a video can be used as input to a deepfake detection method, opening a new avenue of research.
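The abstract relies on SIRENs to reconstruct videos before comparing them with FVD. The pure-Python sketch below shows only the forward pass of a SIREN that maps a normalised (t, y, x) coordinate to an RGB value. It uses the sine activation sin(omega_0 · (Wx + b)) and the weight initialisation from the original SIREN paper (omega_0 = 30). Fitting the network to a video by gradient descent and computing FVD are omitted. The layer sizes, the zero biases and the class names are arbitrary choices for illustration.

```python
import math
import random
from typing import List, Optional, Sequence


class SineLayer:
    """Linear map followed by sin(omega_0 * x), as in SIREN."""

    def __init__(self, in_features: int, out_features: int, omega_0: float = 30.0,
                 is_first: bool = False, rng: Optional[random.Random] = None) -> None:
        rng = rng or random.Random(0)
        self.omega_0 = omega_0
        # SIREN initialisation: U(-1/n, 1/n) for the first layer,
        # U(-sqrt(6/n)/omega_0, sqrt(6/n)/omega_0) for later layers.
        bound = 1.0 / in_features if is_first else math.sqrt(6.0 / in_features) / omega_0
        self.weight = [[rng.uniform(-bound, bound) for _ in range(in_features)]
                       for _ in range(out_features)]
        self.bias = [0.0] * out_features  # zero bias: a simplification for this sketch

    def __call__(self, x: Sequence[float]) -> List[float]:
        return [math.sin(self.omega_0 * (sum(w * xi for w, xi in zip(row, x)) + b))
                for row, b in zip(self.weight, self.bias)]


class Siren:
    """Maps a normalised (t, y, x) video coordinate to an RGB value."""

    def __init__(self, hidden: int = 32, depth: int = 3, seed: int = 0) -> None:
        rng = random.Random(seed)
        self.layers = [SineLayer(3, hidden, is_first=True, rng=rng)]
        self.layers += [SineLayer(hidden, hidden, rng=rng) for _ in range(depth - 1)]
        bound = math.sqrt(6.0 / hidden) / 30.0
        # Final linear layer (no sine) producing the three colour channels.
        self.head = [[rng.uniform(-bound, bound) for _ in range(hidden)] for _ in range(3)]

    def __call__(self, coord: Sequence[float]) -> List[float]:
        h = list(coord)
        for layer in self.layers:
            h = layer(h)
        return [sum(w * hi for w, hi in zip(row, h)) for row in self.head]
```

Because the whole video is stored in the network weights, a reconstruction is obtained by querying every (t, y, x) coordinate. The abstract's FVD comparison operates on such reconstructions.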

Spiking Alternatives for the Leaky Integrate-and-Fire Neuron: Applications in Cybersecurity and Financial Threats

Dylan Perdigão, Francisco Antunes, Catarina Silva, Bernardete Ribeiro

Abstract:
Spiking neural networks, the third generation of neural networks, offer a biologically plausible alternative to traditional deep learning models. They promise low-power artificial intelligence applications, reducing the footprint of algorithms that have become increasingly energy-intensive with the advent of large language models. The leaky integrate-and-fire (LIF) neuron is the most common neuron model used in spiking neural networks due to its simplicity and low power consumption while remaining biologically plausible. This paper compares the performance of optimized LIF variants, from recurrent to synaptic conductance-based models, applied to four well-known image classification benchmarks and two real-world cybersecurity and financial fraud datasets. The experiments assess these neurons with spiking window sizes of 10 and 50 timesteps, exploring the trade-off between performance and energy consumption of each LIF variant. Our results show that the standard LIF neuron offers the best trade-off between performance and energy consumption among the tested LIF variants. However, in some cases, other variants can outperform the standard LIF neuron at the cost of a higher footprint.
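The standard LIF neuron that the variants are compared against can be written in a common discrete-time form. At each step the membrane potential u decays by a factor beta, integrates the input current, and emits a spike when it reaches a threshold, after which it is reset. The sketch below assumes this formulation, with a choice of reset by subtraction or to zero. The parameter values are illustrative, and counting spikes as a rough energy proxy is an assumption, not the paper's energy model.

```python
from typing import List, Sequence, Tuple


def simulate_lif(
    currents: Sequence[float],
    beta: float = 0.9,
    threshold: float = 1.0,
    reset: str = "subtract",
) -> Tuple[List[int], List[float]]:
    """Discrete-time leaky integrate-and-fire neuron over one spiking window.

    Returns the spike train and the membrane potential after each timestep.
    """
    if reset not in ("subtract", "zero"):
        raise ValueError("reset must be 'subtract' or 'zero'")
    u = 0.0
    spikes, potentials = [], []
    for i in currents:
        u = beta * u + i             # leak, then integrate the input current
        spike = int(u >= threshold)  # fire when the threshold is reached
        if spike:
            u = u - threshold if reset == "subtract" else 0.0
        spikes.append(spike)
        potentials.append(u)
    return spikes, potentials
```

Over a 10-timestep window with a constant input of 0.5, a neuron without leak (beta = 1.0) fires on every other step. With a strong leak (beta = 0.5), the potential approaches but never reaches the threshold, so the neuron stays silent. This illustrates how the leak term shapes spiking activity and, with it, the energy cost.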

Uplift modeling for treatment effect estimation in the prostate cancer treatment landscape

Ana Rodrigues, Nuno Rodrigues, José Guilherme de Almeida, Ana Gaivão, Carlos Bilreiro, Inês Santiago, Joana Ip, Sara Belião, Manolis Tsiknakis, Kostas Marias, Daniele Regge, ProCAncer-I Consortium, Nickolas Papanikolaou, Inês Domingues

Abstract:
This study aimed to use causal machine learning techniques, namely uplift modeling, to develop a patient-specific recommendation system based on radiomic variables for treatment selection in prostate cancer patients. In this work, 1048 cases from ProstateNet that underwent one of three types of treatment (radical prostatectomy (RP), radiation therapy (RT), and active surveillance) were used together with uplift modeling techniques to build a treatment recommendation system for prostate cancer patients. To cover as many confounding variables as possible, the following were included: patient age, PSA blood levels, PI-RADS, index lesion location, Gleason scores, ISUP grade, and radiomic variables extracted from whole-gland or index lesion segmentations. Six different uplift algorithms were explored: S-learner, T-learner, X-learner, R-learner, uplift random forest, and transformed outcome (TO) learner. The final model was a TO-learner including only whole-gland T2W radiomic features, reaching an area under the uplift curve (AUUC) of 0.5481 (out of a theoretical maximum of 0.7488). This model was used to split the patients in the hold-out test set into three groups, corresponding to each course of treatment. Additionally, nomograms were constructed to guide treatment recommendation. Younger patients with low PI-RADS and a low second Gleason score, but higher ISUP grade and high PSA, were found to benefit from RT, whereas patients with the opposite characteristics would benefit from RP. Uplift modeling can be a valuable tool for identifying subgroups of patients who may benefit from specific treatments in healthcare.
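The transformed outcome (TO) learner selected above rests on a simple identity. For a binary treatment W with propensity e = P(W = 1 | X), the variable Z = Y(W − e) / (e(1 − e)) satisfies E[Z | X] = τ(X), the individual treatment effect (uplift), under the usual unconfoundedness assumption. Any regressor fitted to Z therefore estimates uplift. The sketch below is a toy illustration under stated simplifications. It handles one binary treatment contrast rather than the study's three treatments, assumes a single constant propensity (with observational data, each patient's propensity would have to be estimated), and replaces the regressor with per-subgroup means. The subgroup labels in the usage example are made up.

```python
from collections import defaultdict
from statistics import fmean
from typing import Dict, Hashable, Iterable, List, Tuple


def transformed_outcome(y: float, treated: int, propensity: float) -> float:
    """Z = Y * (W - e) / (e * (1 - e)); under unconfoundedness E[Z | X] is the uplift."""
    if treated not in (0, 1):
        raise ValueError("treatment indicator must be 0 or 1")
    if not 0.0 < propensity < 1.0:
        raise ValueError("propensity must lie strictly between 0 and 1")
    return y * (treated - propensity) / (propensity * (1.0 - propensity))


def uplift_by_subgroup(
    records: Iterable[Tuple[Hashable, float, int]], propensity: float
) -> Dict[Hashable, float]:
    """Average Z within each subgroup: a stand-in for a regressor fitted on features."""
    z_values: Dict[Hashable, List[float]] = defaultdict(list)
    for group, y, treated in records:
        z_values[group].append(transformed_outcome(y, treated, propensity))
    return {group: fmean(zs) for group, zs in z_values.items()}
```

With (subgroup, outcome, treated) records `("young", 1, 1), ("young", 1, 1), ("young", 0, 0), ("young", 1, 0), ("old", 0, 1), ("old", 1, 0)` and propensity 0.5, the estimated uplift is +0.5 for "young" and −1.0 for "old". A recommender would then suggest the treatment only for the subgroup with positive uplift.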
