
A Comprehensive Survey of Deep Learning for Image Captioning

6 October 2018

Papers citing "A Comprehensive Survey of Deep Learning for Image Captioning"

50 / 231 papers shown

Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning Jingqiang Chen 59 4 0 04 Feb 2023
A data science and machine learning approach to continuous analysis of Shakespeare's plays Charles F. Swisher L. Shamir 58 3 0 15 Jan 2023
An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation Kevin Moran Ali Yachnes George Purnell Junayed Mahmud Michele Tufano Carlos Bernal-Cárdenas Denys Poshyvanyk Zach H’Doubler 85 11 0 03 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges R. Zakari Jim Wilson Owusu Hailin Wang Ke Qin Zaharaddeen Karami Lawal Yue-hong Dong LRM 73 16 0 26 Dec 2022
Do DALL-E and Flamingo Understand Each Other? Hang Li Jindong Gu Rajat Koner Sahand Sharifzadeh Volker Tresp MLLM 77 12 0 23 Dec 2022
Towards Generating Diverse Audio Captions via Adversarial Training Xinhao Mei Xubo Liu Jianyuan Sun Mark D. Plumbley Wenwu Wang DiffM 79 2 0 05 Dec 2022
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding Runyu Ding Jihan Yang Chuhui Xue Wenqing Zhang Song Bai Xiaojuan Qi VLM 80 154 0 29 Nov 2022
Deep representation learning: Fundamentals, Perspectives, Applications, and Open Challenges K. T. Baghaei Amirreza Payandeh Pooya Fayyazsanavi Shahram Rahimi Zhiqian Chen Somayeh Bakhtiari Ramezani FaML AI4TS 69 6 0 27 Nov 2022
Aesthetically Relevant Image Captioning Zhipeng Zhong Fei Zhou Guoping Qiu 62 9 0 25 Nov 2022
Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired Kazuya Ohata Shunsuke Kitada Hitoshi Iyatomi 63 0 0 17 Nov 2022
CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge Linli Yao Wei Chen Qin Jin VLM 121 11 0 17 Nov 2022
Vis2Mus: Exploring Multimodal Representation Mapping for Controllable Music Generation Runbang Zhang Yixiao Zhang Kai Shao Ying Shan Gus Xia 61 4 0 10 Nov 2022
CLSE: Corpus of Linguistically Significant Entities A. Chuklin Justin Zhao Mihir Kale 62 1 0 04 Nov 2022
Physical Adversarial Attack meets Computer Vision: A Decade Survey Hui Wei Hao Tang Xuemei Jia Zhixiang Wang Han-Bing Yu Zhubo Li Shin'ichi Satoh Luc Van Gool Zheng Wang AAML 138 56 0 30 Sep 2022
M^4I: Multi-modal Models Membership Inference Pingyi Hu Zihan Wang Ruoxi Sun Hu Wang Minhui Xue 97 27 0 15 Sep 2022
Cross Modal Compression: Towards Human-comprehensible Semantic Compression Jiguo Li Chuanmin Jia Xinfeng Zhang Siwei Ma Wen Gao 35 21 0 06 Sep 2022
Facial Expression Recognition and Image Description Generation in Vietnamese Khang Nhut Lam Kim Thi-Thanh Nguyen Loc Huu Nguy Jugal Kalita 3DH CVBM 57 1 0 12 Aug 2022
A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception Keenan I. Jones Enes Altuncu V. N. Franqueira Yi-Chia Wang Shujun Li DeLMO 75 3 0 11 Aug 2022
End-to-end deep learning for directly estimating grape yield from ground-based imagery A. Olenskyj B. Sams Zhenghao Fei Vishal Singh P. Raja G. Bornhorst J. M. Earles 59 28 0 04 Aug 2022
Visual Recognition by Request Chufeng Tang Lingxi Xie Xiaopeng Zhang Xiaolin Hu Qi Tian VLM 93 15 0 28 Jul 2022
Controllable Data Generation by Deep Learning: A Review Shiyu Wang Yuanqi Du Xiaojie Guo Bo Pan Zhaohui Qin Liang Zhao 97 28 0 19 Jul 2022
Relational Future Captioning Model for Explaining Likely Collisions in Daily Tasks Motonari Kambara K. Sugiura 53 6 0 19 Jul 2022
Exploring Adversarial Examples and Adversarial Robustness of Convolutional Neural Networks by Mutual Information Jiebao Zhang Wenhua Qian Ren-qi Nie Jinde Cao Dan Xu GAN AAML 61 0 0 12 Jul 2022
Vision-and-Language Pretraining Thong Nguyen Cong-Duy Nguyen Xiaobao Wu See-Kiong Ng Anh Tuan Luu VLM CLIP 55 2 0 05 Jul 2022
Gender Artifacts in Visual Datasets Nicole Meister Dora Zhao Angelina Wang V. V. Ramaswamy Ruth C. Fong Olga Russakovsky 70 29 0 18 Jun 2022
Image Captioning based on Feature Refinement and Reflective Decoding G. Alabduljabbar Hafida Benhidour Said Kerrache 3DV 24 3 0 16 Jun 2022
Video-based Human-Object Interaction Detection from Tubelet Tokens Danyang Tu Wei Sun Xiongkuo Min Guangtao Zhai Wei Shen ViT 95 17 0 04 Jun 2022
A Generative Adversarial Network-based Selective Ensemble Characteristic-to-Expression Synthesis (SE-CTES) Approach and Its Applications in Healthcare Yuxuan Li Ying-Jia Lin Chenang Liu 50 0 0 29 May 2022
Prompt-based Learning for Unpaired Image Captioning Peipei Zhu Tianlin Li Lin Zhu Zhenglong Sun Weishi Zheng Yaowei Wang Chen Chen VLM 97 33 0 26 May 2022
Beyond Greedy Search: Tracking by Multi-Agent Reinforcement Learning-based Beam Search Tianlin Li Zhe Chen Bo Jiang Jin Tang Bin Luo Dacheng Tao 97 19 0 19 May 2022
Efficient Gesture Recognition for the Assistance of Visually Impaired People using Multi-Head Neural Networks Samer Alashhab Antonio Javier Gallego Miguel Ángel Lozano 40 18 0 14 May 2022
Translation between Molecules and Natural Language Carl Edwards T. Lai Kevin Ros Garrett Honke Kyunghyun Cho Heng Ji 136 171 0 25 Apr 2022
Visual Attention Methods in Deep Learning: An In-Depth Survey Mohammed Hassanin Saeed Anwar Ibrahim Radwan Fahad Shahbaz Khan Ajmal Mian 134 166 0 16 Apr 2022
Guiding Attention using Partial-Order Relationships for Image Captioning Murad Popattia Muhammad Rafi Rizwan Qureshi Shah Nawaz 52 5 0 15 Apr 2022
Image Captioning In the Transformer Age Yangliu Xu Li Li Haiyang Xu Songfang Huang Fei Huang Jianfei Cai ViT 59 6 0 15 Apr 2022
Vision Transformers in Medical Computer Vision -- A Contemplative Retrospection Arshi Parvaiz Muhammad Anwaar Khalid Rukhsana Zafar Huma Ameer M. Ali M. Fraz MedIm 73 63 0 29 Mar 2022
Interactive Robotic Grasping with Attribute-Guided Disambiguation Yang Yang Xibai Lou Changhyun Choi 82 30 0 15 Mar 2022
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition Peipei Zhu Tianlin Li Yong Luo Zhenglong Sun Wei-Shi Zheng Yaowei Wang Chen Chen 102 12 0 07 Mar 2022
A Review of Emerging Research Directions in Abstract Visual Reasoning Mikołaj Małkiński Jacek Mańdziuk 96 41 0 21 Feb 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning J. Tan Y. Tan C. Chan Joon Huang Chuah VLM ViT 77 19 0 11 Feb 2022
Deep Learning Approaches on Image Captioning: A Review Taraneh Ghandi H. Pourreza H. Mahyar VLM 130 101 0 31 Jan 2022
A Frustratingly Simple Approach for End-to-End Image Captioning Ziyang Luo Yadong Xi Rongsheng Zhang Jing Ma VLM MLLM 70 16 0 30 Jan 2022
Automatic Audio Captioning using Attention weighted Event based Embeddings Swapnil Bhosale Rupayan Chakraborty Sunil Kumar Kopparapu 64 0 0 28 Jan 2022
Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning Peyman Bateni Jarred Barber Raghav Goyal Vaden Masrani Jan-Willem van de Meent Leonid Sigal Frank Wood BDL VLM 93 9 0 13 Jan 2022
Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry Karl Lowenmark C. Taal S. Schnabel Marcus Liwicki Fredrik Sandin 45 7 0 11 Dec 2021
Multimodal Fake News Detection Santiago Alonso-Bartolome Isabel Segura-Bedmar 68 67 0 09 Dec 2021
Neural Attention for Image Captioning: Review of Outstanding Methods Zanyar Zohourianshahzadi Jugal Kalita VLM 86 47 0 29 Nov 2021
Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention S. Tan Runpei Dong Kaisheng Ma 74 2 0 03 Nov 2021
Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on Advances Shibo Zhang Yaxuan Li Shen Zhang Farzad Shahabi S. Xia Yuanbei Deng N. Alshurafa BDL 81 317 0 31 Oct 2021
End-to-End Supermask Pruning: Learning to Prune Image Captioning Models J. Tan C. Chan Joon Huang Chuah VLM 124 16 0 07 Oct 2021