Specular highlights play a pivotal role in comprehending scenes within a developed visual environment. Nevertheless, their presence can degrade the performance of solutions to various computer vision tasks. Current methods typically use Convolutional Neural Network (CNN)-based U-Net architectures for specular highlight detection. However, although CNNs excel at local context analysis, they are limited in capturing global contextual information. To exploit global context, we propose a novel network architecture that leverages Vision Transformers (ViTs) to jointly detect and remove specular highlights in a given image. The developed model incorporates a multi-scale patch-based self-attention mechanism to capture global context, alongside a CNN-based feed-forward network for local contextual cues. Experimental results, with both quantitative and qualitative evaluations, demonstrate that the proposed approach achieves state-of-the-art performance.
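The combination described above, patch-based self-attention at several scales followed by a convolutional feed-forward step, can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the single attention head, the random projection weights, the patch sizes (2 and 4), the averaging across scales, the residual connections, and the shared 3x3 kernel are all assumptions chosen to keep the example short, and every function name is hypothetical.

```python
import numpy as np

def patchify(x, p):
    """Split an (H, W, C) map into non-overlapping p x p patches -> (N, p*p*C)."""
    H, W, C = x.shape
    assert H % p == 0 and W % p == 0, "H and W must be divisible by p"
    x = x.reshape(H // p, p, W // p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(-1, p * p * C)

def unpatchify(tokens, p, H, W, C):
    """Inverse of patchify: (N, p*p*C) tokens -> (H, W, C) map."""
    x = tokens.reshape(H // p, W // p, p, p, C).transpose(0, 2, 1, 3, 4)
    return x.reshape(H, W, C)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)  # subtract max for stability
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def patch_self_attention(x, p, rng):
    """Single-head self-attention over p x p patch tokens (random weights, assumed)."""
    H, W, C = x.shape
    tokens = patchify(x, p)                      # (N, D)
    D = tokens.shape[1]
    Wq, Wk, Wv = (rng.standard_normal((D, D)) / np.sqrt(D) for _ in range(3))
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    attn = softmax(q @ k.T / np.sqrt(D))         # (N, N): every patch attends to all patches
    return unpatchify(attn @ v, p, H, W, C), attn

def multi_scale_attention(x, patch_sizes, rng):
    """Average the attention outputs over several patch sizes, plus a residual (assumed fusion)."""
    outs = [patch_self_attention(x, p, rng)[0] for p in patch_sizes]
    return x + np.mean(outs, axis=0)

def conv_ffn(x, kernel):
    """Local feed-forward step: one 3x3 kernel shared by all channels, ReLU, residual."""
    H, W, _ = x.shape
    padded = np.pad(x, ((1, 1), (1, 1), (0, 0)))  # zero padding keeps H x W
    out = np.zeros_like(x)
    for dy in range(3):
        for dx in range(3):
            out += kernel[dy, dx] * padded[dy:dy + H, dx:dx + W, :]
    return x + np.maximum(out, 0.0)

def block(x, rng, patch_sizes=(2, 4)):
    """Global context via multi-scale attention, then local cues via the conv step."""
    kernel = rng.standard_normal((3, 3)) / 3.0
    return conv_ffn(multi_scale_attention(x, patch_sizes, rng), kernel)

rng = np.random.default_rng(0)
features = rng.standard_normal((8, 8, 4))  # toy feature map, H x W x C
out = block(features, rng)                  # same spatial size and channel count as the input
```

The attention matrix has one row per patch and spans the whole image, which is where the global context comes from; the 3x3 convolution only mixes each pixel with its immediate neighbours, which supplies the local cues.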
Keywords: Specular highlight detection, Specular highlight removal, Vision transformers, Convolutional neural networks
Primary Language | English |
---|---|
Subjects | Computer Software |
Journal Section | Research Article |
Authors | |
Early Pub Date | March 27, 2025 |
Publication Date | March 28, 2025 |
Submission Date | July 17, 2024 |
Acceptance Date | February 22, 2025 |
Published in Issue | Year: 2025, Volume: 8, Issue: 1 |
The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License