Image-to-text generation links visual and textual content by producing descriptions of images, and it contributes to domains such as entertainment, communication, commerce, security, and education. The process transforms image data into meaningful text, making content more accessible, comprehensible, and processable, so advances in this field are important. This study examines how combining a Sequence-to-Sequence (Seq2seq) model with an attention mechanism improves the generation of meaningful captions from images. Experiments on the Flickr8k dataset show that the Seq2seq model can produce captions that align with the reference sentences. Using the dynamic focus of the attention mechanism, the model captures detailed aspects of the images.
Attention mechanism, Seq2seq model, Image captioning, Deep learning, Image-to-text
Primary Language | English |
---|---|
Subjects | Software Engineering (Other) |
Section | Articles |
Authors | |
Early View Date | 27 April 2024 |
Publication Date | 30 April 2024 |
Submission Date | 9 August 2023 |
Acceptance Date | 26 March 2024 |
Published Issue | Year: 2024, Volume: 7, Issue: 1 |
The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.