IbPRIA 2025: 12th Iberian Conference on Pattern Recognition and Image Analysis
Coimbra, Portugal. June 30 - July 3, 2025
IbPRIA 2025 Accepted Papers
Oral Session 7 - Computer Vision

A Spatial Dense CRF Framework for Post-Processing in Multispectral Image Segmentation
Wilgo Nunes, Gil Gonçalves, Cristiano Premebida
Abstract:
Image segmentation plays a key role in many domains, particularly in precision agriculture, where accurate classification of crops and vegetation is essential for optimizing farming practices. Multispectral imaging enhances segmentation performance by providing spectral information beyond the visible range, enabling better performance in real-world conditions. Deep learning models have shown remarkable success in multispectral image segmentation; however, their predictions often lack spatial coherence, leading to noisy segmentation outputs that can negatively impact agricultural decision-making. To address this challenge, we propose a spatial dense Conditional Random Field (CRF) framework for post-processing deep learning-based segmentation. Instead of relying solely on sigmoid activation functions for thresholding logits, our approach refines segmentation outputs by leveraging spatial dependencies between pixels. This method involves estimating the CRF parameters from training data and applying the refined model to the test logits, consequently enhancing segmentation performance. By integrating spatial information, the framework described here improves the consistency and robustness of segmentation results, ultimately leading to better decision-making in precision agriculture.
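As a rough illustration of the idea behind CRF-style post-processing (not the authors' implementation; the function, its parameters, and the 4-neighbour pairwise term are all simplifications chosen here for clarity), a mean-field update pulls each pixel's class distribution toward agreement with its spatial neighbours, suppressing isolated noisy labels:

```python
import math

def softmax(vec):
    m = max(vec)
    e = [math.exp(v - m) for v in vec]
    s = sum(e)
    return [v / s for v in e]

def meanfield_smooth(logits, iters=5, w=2.0):
    """Toy mean-field refinement with a 4-neighbour pairwise term.

    logits: H x W grid of per-class score lists (the network's raw outputs).
    w: pairwise weight (a hypothetical value; a real dense CRF would
    estimate such parameters from training data, as the abstract describes).
    """
    h, wd, c = len(logits), len(logits[0]), len(logits[0][0])
    q = [[softmax(logits[i][j]) for j in range(wd)] for i in range(h)]
    for _ in range(iters):
        new_q = []
        for i in range(h):
            row = []
            for j in range(wd):
                # average message from the in-bounds 4-neighbourhood
                nbrs = [q[i + di][j + dj]
                        for di, dj in ((-1, 0), (1, 0), (0, -1), (0, 1))
                        if 0 <= i + di < h and 0 <= j + dj < wd]
                msg = [sum(n[k] for n in nbrs) / len(nbrs) for k in range(c)]
                # unary (network logits) plus weighted neighbour agreement
                row.append(softmax([logits[i][j][k] + w * msg[k]
                                    for k in range(c)]))
            new_q.append(row)
        q = new_q
    return q
```

On a patch whose pixels all favour one class except a single noisy pixel, the refinement flips that pixel back to agree with its surroundings, which is exactly the spatial-coherence effect the abstract targets.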

Enhancing Multi-Object Tracking with Segmentation Masks: A Solution for Lost Object Recovery
Manuel Bendaña, Lorenzo Vaquero, Victor M. Brea, Manuel Mucientes
Abstract:
Tracking by detection is an effective approach to addressing the multiple object tracking problem. Detections are extracted and matched across the different frames of a video. However, detection errors persist, leading to false negatives that degrade tracker performance. In this work, we propose an architecture to overcome detection failures. Instead of using bounding boxes, which lack precision in crowded situations, we propose obtaining and tracking segmentation masks for each object. Results on the MOT20 crowded dataset demonstrate our ability to improve the performance of state-of-the-art methods.
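To make the tracking-by-detection matching step concrete (a generic sketch, not the paper's architecture; the greedy strategy and the 0.5 threshold are illustrative assumptions), masks can be associated across frames by intersection-over-union instead of box overlap:

```python
def mask_iou(a, b):
    """IoU between two binary masks given as sets of (row, col) pixels."""
    union = len(a | b)
    return len(a & b) / union if union else 0.0

def greedy_match(tracks, detections, thresh=0.5):
    """Greedily assign each detection mask to the best-overlapping track.

    Returns a list of (track_index, detection_index) pairs; masks whose
    best IoU falls below `thresh` are left unmatched.
    """
    pairs = sorted(
        ((mask_iou(t, d), ti, di)
         for ti, t in enumerate(tracks)
         for di, d in enumerate(detections)),
        reverse=True,
    )
    used_t, used_d, matches = set(), set(), []
    for iou, ti, di in pairs:
        if iou >= thresh and ti not in used_t and di not in used_d:
            matches.append((ti, di))
            used_t.add(ti)
            used_d.add(di)
    return matches
```

Because pixel-level masks do not overlap each other the way boxes do in crowded scenes, this kind of association is less ambiguous on datasets such as MOT20.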

Enriching Unbounded Appearances for Neural Radiance Fields
Ahmad AlMughrabi, Umair Haroon, Ricardo Marques, Petia Radeva
Abstract:
Neural radiance fields (NeRF) have recently appeared as a powerful tool for generating realistic views of objects and confined areas. Still, they face serious challenges with open scenes, where the camera has unrestricted movement, and content can appear at any distance. In such scenarios, current NeRF-inspired models frequently yield hazy or pixelated outputs, suffer slow training times, and might display irregularities because of the challenging task of reconstructing an extensive scene from a limited number of images. We propose a new framework to boost the performance of NeRF-based architectures, yielding significantly superior outcomes compared to prior work. Our solution overcomes several obstacles that plagued earlier versions of NeRF, including handling multiple video inputs, selecting keyframes, and extracting poses from real-world frames that are ambiguous and symmetrical. Furthermore, we applied our framework, dubbed "Pre-NeRF 360", to enable the use of the Nutrition5k dataset in NeRF and introduce an updated version of this dataset, known as the N5k360 dataset. The source code, the dataset, and pre-trained weights for Pre-NeRF are publicly available at https://amughrabi.github.io/prenerf.
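One of the preprocessing steps the abstract names is keyframe selection from video input. A minimal sketch of one common criterion (purely hypothetical here; the paper's actual selection rule is not specified in the abstract) keeps a frame only when it differs enough from the last kept keyframe:

```python
def select_keyframes(frames, min_change=0.1):
    """Keep a frame only when it differs enough from the last keyframe.

    frames: list of equal-length grayscale pixel lists with values in [0, 1].
    min_change: mean-absolute-difference threshold (a hypothetical value).
    Returns the indices of the selected keyframes.
    """
    if not frames:
        return []
    keys = [0]  # always keep the first frame
    for i in range(1, len(frames)):
        ref = frames[keys[-1]]
        diff = sum(abs(a - b) for a, b in zip(frames[i], ref)) / len(ref)
        if diff >= min_change:
            keys.append(i)
    return keys
```

Pruning near-duplicate frames this way reduces redundancy before pose estimation and training, which matters when reconstructing an extensive scene from limited, partially ambiguous views.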

Estimating object physical properties from RGB-D vision and depth robot sensors using deep learning
Ricardo Pedreiras Cardoso, Plinio Moreno
Abstract:
Inertial mass plays a crucial role in robotic applications such as object grasping, manipulation, and simulation, providing a strong prior for planning and control. Accurately estimating an object's mass before interaction can significantly enhance the performance of various robotic tasks. However, mass estimation using only vision sensors is a relatively underexplored area. This thesis proposes a novel approach combining sparse point-cloud data from depth images with RGB images to estimate the mass of objects. We evaluate a range of point-cloud processing architectures alongside RGB-only methods. To overcome the limited availability of training data, we create a synthetic dataset using ShapeNetSem 3D models, simulating RGB-D images via a Kinect camera. This synthetic data is used to train an image generation model for estimating dense depth maps, which we then use to augment an existing dataset of images paired with mass values. Our approach significantly outperforms existing benchmarks across all evaluated metrics. The data generation code, as well as the training code for the depth estimator and the mass estimator, is available online.
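The point clouds the abstract mentions are obtained by back-projecting depth pixels through the standard pinhole camera model. A minimal sketch (generic computer-vision background, not the thesis's code; the intrinsics fx, fy, cx, cy are assumed known):

```python
def depth_to_pointcloud(depth, fx, fy, cx, cy):
    """Back-project a depth map into camera-frame 3D points (pinhole model).

    depth: 2D list of metric depths indexed as depth[v][u]; zeros mark
    missing measurements and are skipped, which is what makes the
    resulting cloud sparse.
    fx, fy: focal lengths in pixels; cx, cy: principal point.
    """
    points = []
    for v, row in enumerate(depth):
        for u, z in enumerate(row):
            if z > 0:
                x = (u - cx) * z / fx
                y = (v - cy) * z / fy
                points.append((x, y, z))
    return points
```

The resulting (x, y, z) points, paired with the corresponding RGB pixels, are the kind of input a point-cloud architecture would consume for mass regression.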

Masking of Gaussian noise in color images: A psychophysical study of just-noticeable differences using synthetic image patches of different luminance value
Luis Miguel Calvo, Pedro Latorre-Carmona, Samuel Morillas, Rafael Huertas, Rafal Mantiuk
Abstract:
Understanding how noise affects visual image quality and modeling its perception are crucial aspects of digital image processing, especially when image quality is of paramount importance. Taking into account that the most common sources of camera noise are modelled in the linear RGB space (lRGB) as white Gaussian noise, we focus on this particular noise type. In our work, we are particularly interested in studying how different noise intensities are perceived. We assume that a good indicator for this perception is the determination of just noticeable differences (JNDs) among noise intensities. We also assume that there is a dependency between the JNDs, the noise intensity, and the background luminance. Psychophysical experiments show that the computed JNDs depend on the reference image noise intensity in a manner consistent with Weber's law. The relationship between the JND values and the background luminance is, however, not so evident. These results highlight the fact that JNDs of Gaussian noise in images present interdependencies that are worth studying further through additional experiments in the near future.
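The Weber-law dependency the abstract reports can be stated in one line: the just-noticeable difference grows linearly with the reference intensity. A worked sketch (the Weber fraction k = 0.1 below is an arbitrary illustrative value, not a result from the study):

```python
def weber_jnd(intensity, k):
    """Weber's law: the just-noticeable difference is proportional to the
    reference intensity, JND = k * I, where k is the Weber fraction."""
    return k * intensity

# doubling the reference noise intensity doubles the predicted JND
low = weber_jnd(10.0, 0.1)   # ~1.0
high = weber_jnd(20.0, 0.1)  # ~2.0
```

In the study's terms, the reference image's noise intensity plays the role of I, and the measured JNDs between noise levels scale with it.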

MixUDA: From Synthetic to Real Object Detection
Pablo Gil-Pérez, Daniel Cores, Manuel Mucientes
Abstract:
Object detection has made remarkable progress in recent years, driven by advancements in deep learning and the availability of large-scale annotated datasets. However, these methods often require extensive labeled data, which may not be accessible for specific or emerging applications. This limitation has generated interest in Unsupervised Domain Adaptation (UDA), which facilitates knowledge transfer from a labeled source domain to an unlabeled and differently distributed target domain. This study addresses the challenge of Unsupervised Domain Adaptation between synthetic and real-world data. A methodology for generating synthetic datasets is proposed using AirSim and Unreal Engine, enabling the creation of highly customizable and diverse datasets. We also propose a Domain Adaptation technique, MixUDA, that maximizes the utility of the synthetic dataset to improve the performance of a model in a real domain. MixUDA is a UDA approach that uses a Mean Teacher architecture and employs pseudo-labels combined with two image-mixing operations, pseudo-mosaic and pseudo-mixup, to achieve a smooth and progressive transition from the synthetic to the real domain. The obtained results demonstrate encouraging progress, as MixUDA surpasses the state-of-the-art models D3T and MixPL by 1.18 and 4 AP points, respectively, approaching the performance of oracle models trained directly on the target domain. These findings suggest that synthetic datasets have significant potential in addressing data scarcity and improving model generalization, while also pointing to promising directions for further exploration in this area.
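The image-mixing idea underlying operations like pseudo-mixup can be sketched as a convex combination of a synthetic and a real image (a generic mixup-style sketch; the paper's exact pseudo-mixup and pseudo-mosaic formulations, which also involve pseudo-labels, may differ):

```python
def pseudo_mixup(synthetic, real, lam):
    """Per-pixel convex combination of a synthetic and a real image.

    synthetic, real: equal-length flat lists of pixel values.
    lam: mixing coefficient in [0, 1]; sweeping it from 1 toward 0
    gives the smooth synthetic-to-real transition the abstract describes.
    """
    return [lam * s + (1.0 - lam) * r for s, r in zip(synthetic, real)]
```

Scheduling `lam` over training lets the student model start on fully synthetic content and gradually face increasingly real imagery, which is the progressive-transition intuition behind the method.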


Publisher

Endorsed by
IAPR

Technical Sponsors
AERFAI
APRP