Specular highlights play a pivotal role in scene understanding within real-world visual environments. Nevertheless, their presence can adversely affect the performance of solutions to various computer vision tasks. Current methods typically rely on Convolutional Neural Network (CNN)-based U-Net architectures for specular highlight detection. However, CNNs are limited in capturing global contextual information, despite excelling at local context analysis. To exploit global context, a novel network architecture is proposed that leverages Vision Transformers (ViTs) to jointly detect and remove specular highlights in a given image. The proposed model incorporates a multi-scale patch-based self-attention mechanism to effectively capture global context, alongside a CNN-based feed-forward network for local contextual cues. Quantitative and qualitative experimental evaluations demonstrate that the proposed approach achieves state-of-the-art performance.
Keywords: Specular highlight detection, Specular highlight removal, Vision transformers, Convolutional neural networks
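Since the abstract describes the architecture only at a high level, the following is a minimal PyTorch sketch of how a multi-scale patch-based self-attention block paired with a CNN-based feed-forward network and two output heads (a highlight mask and a highlight-free image) could be organized. All class names (`MultiScalePatchAttention`, `ConvFeedForward`, `JointHighlightNet`), patch sizes, channel widths, and depths are illustrative assumptions, not the authors' actual implementation.

```python
# Hedged sketch: one possible realization of multi-scale patch self-attention
# plus a CNN feed-forward network for joint highlight detection and removal.
# All hyperparameters below are assumptions for illustration only.
import torch
import torch.nn as nn


class MultiScalePatchAttention(nn.Module):
    """Self-attention over patch embeddings extracted at several patch sizes."""

    def __init__(self, channels: int, patch_sizes=(2, 4, 8), heads: int = 4):
        super().__init__()
        self.patch_sizes = patch_sizes
        self.embeds = nn.ModuleList(
            nn.Conv2d(channels, channels, kernel_size=p, stride=p) for p in patch_sizes
        )
        self.attns = nn.ModuleList(
            nn.MultiheadAttention(channels, heads, batch_first=True) for _ in patch_sizes
        )
        self.fuse = nn.Conv2d(channels * len(patch_sizes), channels, kernel_size=1)

    def forward(self, x):
        b, c, h, w = x.shape
        outs = []
        for p, embed, attn in zip(self.patch_sizes, self.embeds, self.attns):
            tokens = embed(x)                        # (b, c, h/p, w/p) patch embedding
            hp, wp = tokens.shape[-2:]
            seq = tokens.flatten(2).transpose(1, 2)  # (b, hp*wp, c) token sequence
            seq, _ = attn(seq, seq, seq)             # global context at this scale
            tokens = seq.transpose(1, 2).reshape(b, c, hp, wp)
            outs.append(nn.functional.interpolate(
                tokens, size=(h, w), mode="bilinear", align_corners=False))
        return x + self.fuse(torch.cat(outs, dim=1))  # residual fusion of all scales


class ConvFeedForward(nn.Module):
    """CNN-based feed-forward network supplying local contextual cues."""

    def __init__(self, channels: int, expansion: int = 2):
        super().__init__()
        hidden = channels * expansion
        self.net = nn.Sequential(
            nn.Conv2d(channels, hidden, kernel_size=1),
            nn.Conv2d(hidden, hidden, kernel_size=3, padding=1, groups=hidden),  # depthwise
            nn.GELU(),
            nn.Conv2d(hidden, channels, kernel_size=1),
        )

    def forward(self, x):
        return x + self.net(x)


class JointHighlightNet(nn.Module):
    """Stacks the two blocks and predicts a highlight mask plus a clean image."""

    def __init__(self, channels: int = 32, depth: int = 4):
        super().__init__()
        self.stem = nn.Conv2d(3, channels, kernel_size=3, padding=1)
        self.body = nn.Sequential(*[
            nn.Sequential(MultiScalePatchAttention(channels), ConvFeedForward(channels))
            for _ in range(depth)
        ])
        self.detect_head = nn.Conv2d(channels, 1, kernel_size=3, padding=1)  # mask logits
        self.remove_head = nn.Conv2d(channels, 3, kernel_size=3, padding=1)  # clean RGB

    def forward(self, x):
        feat = self.body(self.stem(x))
        return torch.sigmoid(self.detect_head(feat)), self.remove_head(feat)


if __name__ == "__main__":
    mask, clean = JointHighlightNet()(torch.randn(1, 3, 64, 64))
    print(mask.shape, clean.shape)  # (1, 1, 64, 64) and (1, 3, 64, 64)
```

The residual fusion of attention outputs with a shared 1x1 convolution and the depthwise 3x3 convolution in the feed-forward path are common design choices for combining global and local context; the actual paper may fuse scales or predict the two outputs differently.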
Primary Language | English |
---|---|
Subjects | Computer Software |
Section | Research Article |
Authors | |
Early View Date | 27 March 2025 |
Publication Date | 28 March 2025 |
Submission Date | 17 July 2024 |
Acceptance Date | 22 February 2025 |
Published Issue | Year 2025 |