Research Article

Deep Gated Recurrent Unit for Smartphone-Based Image Captioning

Volume: 4 Number: 2 August 31, 2021
Abstract

Expressing the visual content of an image in natural language has gained relevance due to technological and algorithmic advances together with improved computational processing capacity. Many smartphone applications for image captioning have been developed recently, as built-in cameras offer easy operation and portability, allowing an image to be captured whenever and wherever needed. Here, a new image captioning approach based on an encoder-decoder framework with a multi-layer gated recurrent unit is proposed. The Inception-v3 convolutional neural network is employed in the encoder due to its ability to extract more features from small regions. The proposed recurrent neural network-based decoder feeds these features into the multi-layer gated recurrent unit to produce a natural language expression word-by-word. Experimental evaluations on the MSCOCO dataset demonstrate that the proposed approach consistently outperforms existing approaches across different evaluation metrics. Integrated into our custom-designed Android application, named “VirtualEye+”, the approach has great potential to bring image captioning into daily use.
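The decoder described above stacks gated recurrent unit (GRU) layers and emits one word per time step, with the CNN image feature seeding the recurrent state. The following is a minimal NumPy sketch of that idea, not the authors' implementation: the layer sizes, random weights, greedy word selection, and the use of the image feature as the initial hidden state are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

class GRULayer:
    """Single GRU layer following the standard update/reset-gate equations."""
    def __init__(self, input_dim, hidden_dim):
        s = 1.0 / np.sqrt(hidden_dim)
        # Stacked weights for the update (z), reset (r), and candidate gates.
        self.W = rng.uniform(-s, s, (3, hidden_dim, input_dim))
        self.U = rng.uniform(-s, s, (3, hidden_dim, hidden_dim))
        self.b = np.zeros((3, hidden_dim))

    def step(self, x, h):
        z = sigmoid(self.W[0] @ x + self.U[0] @ h + self.b[0])        # update gate
        r = sigmoid(self.W[1] @ x + self.U[1] @ h + self.b[1])        # reset gate
        h_tilde = np.tanh(self.W[2] @ x + self.U[2] @ (r * h) + self.b[2])
        return (1.0 - z) * h + z * h_tilde  # interpolate old and candidate state

class MultiLayerGRUDecoder:
    """Stack of GRU layers that emits one word index per time step."""
    def __init__(self, embed_dim, hidden_dim, vocab_size, num_layers=2):
        dims = [embed_dim] + [hidden_dim] * num_layers
        self.layers = [GRULayer(dims[i], dims[i + 1]) for i in range(num_layers)]
        self.embed = rng.normal(0, 0.1, (vocab_size, embed_dim))  # word embeddings
        self.out = rng.normal(0, 0.1, (vocab_size, hidden_dim))   # output projection

    def decode(self, image_feature, max_len=5, start_token=0):
        # Image feature (e.g. projected Inception-v3 output) seeds every hidden state.
        states = [image_feature.copy() for _ in self.layers]
        word, caption = start_token, []
        for _ in range(max_len):
            x = self.embed[word]
            for i, layer in enumerate(self.layers):
                states[i] = layer.step(x, states[i])
                x = states[i]                        # layer i feeds layer i + 1
            word = int(np.argmax(self.out @ x))      # greedy word-by-word choice
            caption.append(word)
        return caption

decoder = MultiLayerGRUDecoder(embed_dim=16, hidden_dim=32, vocab_size=10)
feature = rng.normal(0, 1, 32)  # stand-in for a 32-d projected CNN feature
caption = decoder.decode(feature)
print(caption)  # five word indices from the toy 10-word vocabulary
```

A trained system would replace the random weights with learned parameters and typically use beam search rather than the greedy argmax shown here.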


Details

Primary Language

English

Subjects

Artificial Intelligence

Journal Section

Research Article

Publication Date

August 31, 2021

Submission Date

January 22, 2021

Acceptance Date

May 13, 2021

Published in Issue

Year 2021 Volume: 4 Number: 2

APA
Kılıç, V. (2021). Deep Gated Recurrent Unit for Smartphone-Based Image Captioning. Sakarya University Journal of Computer and Information Sciences, 4(2), 181-191. https://doi.org/10.35377/saucis.04.02.866409

The papers in this journal are licensed under a Creative Commons Attribution-NonCommercial 4.0 International License.