Deep Learning Approaches on Image Captioning: A Review

31 January 2022

Papers citing "Deep Learning Approaches on Image Captioning: A Review"

27 / 27 papers shown

Title | Authors | Tags | Counts | Date
Tri-FusionNet: Enhancing Image Description Generation with Transformer-based Fusion Network and Dual Attention Mechanism | Lakshita Agarwal Bindu Verma | ViT | 27 0 0 | 23 Apr 2025
GraphT5: Unified Molecular Graph-Language Modeling via Multi-Modal Cross-Token Attention | Sangyeup Kim Nayeon Kim Yinhua Piao Sun Kim | - | 39 0 0 | 07 Mar 2025
An Ensemble Model with Attention Based Mechanism for Image Captioning | Israa Al Badarneh Bassam Hammo Omar Al-Kadi | - | 45 3 0 | 28 Jan 2025
Perception of Visual Content: Differences Between Humans and Foundation Models | Nardiena A. Pratama Shaoyang Fan Gianluca Demartini | VLM | 97 0 0 | 28 Nov 2024
AutoBench-V: Can Large Vision-Language Models Benchmark Themselves? | Han Bao Yue Huang Yanbo Wang Jiayi Ye Xiangqi Wang Xiuying Chen Mohamed Elhoseiny Xiangliang Zhang | - | 47 7 0 | 28 Oct 2024
Effectively Enhancing Vision Language Large Models by Prompt Augmentation and Caption Utilization | Minyi Zhao Jie Wang Z. Li Jiyuan Zhang Zhenbang Sun Shuigeng Zhou | MLLM VLM | 27 0 0 | 22 Sep 2024
Evaluating authenticity and quality of image captions via sentiment and semantic analyses | Aleksei Krotov Alison Tebo Dylan K. Picart Aaron Dean Algave | - | 21 0 0 | 14 Sep 2024
Pixels to Prose: Understanding the art of Image Captioning | Hrishikesh Singh Aarti Sharma Millie Pant | 3DV VLM | 25 0 0 | 28 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis | Uri Berger Gabriel Stanovsky Omri Abend Lea Frermann | - | 29 0 0 | 09 Aug 2024
Vision-Language Models under Cultural and Inclusive Considerations | Antonia Karamolegkou Phillip Rust Yong Cao Ruixiang Cui Anders Søgaard Daniel Hershcovich | VLM | 51 7 0 | 08 Jul 2024
Brain Dialogue Interface (BDI): A User-Friendly fMRI Model for Interactive Brain Decoding | Heng-Chiao Huang Lin Zhao Zihao Wu Xiaowei Yu Jing Zhang Xintao Hu Dajiang Zhu Tianming Liu | - | 24 1 0 | 17 Jun 2024
Transforming Dental Diagnostics with Artificial Intelligence: Advanced Integration of ChatGPT and Large Language Models for Patient Care | Masoumeh Farhadi Nia Mohsen Ahmadi Elyas Irankhah | LM&MA AI4CE | 32 6 0 | 07 Jun 2024
Compressed Image Captioning using CNN-based Encoder-Decoder Framework | Md Alif Mahmudul Hasan Shovon Bhowmick | - | 48 1 0 | 28 Apr 2024
RankCLIP: Ranking-Consistent Language-Image Pretraining | Yiming Zhang Zhuokai Zhao Zhaorun Chen Zhili Feng Zenghui Ding Yining Sun | SSL VLM | 48 7 0 | 15 Apr 2024
A Review of Multi-Modal Large Language and Vision Models | Kilian Carolan Laura Fennelly A. Smeaton | VLM | 22 22 0 | 28 Mar 2024
Cognitive resilience: Unraveling the proficiency of image-captioning models to interpret masked visual content | Zhicheng Du Zhaotian Xie Huazhang Ying Likun Zhang Peiwu Qin | - | 16 0 0 | 23 Mar 2024
Inserting Faces inside Captions: Image Captioning with Attention Guided Merging | Yannis Tevissen Khalil Guetari Marine Tassel Erwan Kerleroux Frédéric Petitpont | - | 35 0 0 | 20 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes | Ting Yu Xiaojun Lin Shuhui Wang Weiguo Sheng Qingming Huang Jun-chen Yu | 3DV | 48 10 0 | 12 Mar 2024
Question-Instructed Visual Descriptions for Zero-Shot Video Question Answering | David Romero Thamar Solorio | - | 101 4 0 | 16 Feb 2024
OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization | Dongchen Han Xiaojun Jia Yang Bai Jindong Gu Yang Liu Xiaochun Cao | VLM | 30 22 0 | 07 Dec 2023
Beyond Images: An Integrative Multi-modal Approach to Chest X-Ray Report Generation | Nurbanu Aksoy Serge Sharoff Selçuk Başer Nishant Ravikumar Alejandro F Frangi | MedIm | 19 4 0 | 18 Nov 2023
A Survey on Image-text Multimodal Models | Ruifeng Guo Jingxuan Wei Linzhuang Sun Khai Le-Duc Guiyong Chang Dawei Liu Sibo Zhang Zhengbing Yao Mingjun Xu Liping Bu | VLM | 31 5 0 | 23 Sep 2023
Contextualized Keyword Representations for Multi-modal Retinal Image Captioning | Jia-Hong Huang Ting-Wei Wu M. Worring | MedIm | 55 26 0 | 26 Apr 2021
Comprehensive Image Captioning via Scene Graph Decomposition | Yiwu Zhong Liwei Wang Jianshu Chen Dong Yu Yin Li | - | 84 124 0 | 23 Jul 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA | Luowei Zhou Hamid Palangi Lei Zhang Houdong Hu Jason J. Corso Jianfeng Gao | MLLM VLM | 252 927 0 | 24 Sep 2019
Visual Translation Embedding Network for Visual Relation Detection | Hanwang Zhang Zawlin Kyaw Shih-Fu Chang Tat-Seng Chua | ViT | 145 560 0 | 27 Feb 2017
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding | Akira Fukui Dong Huk Park Daylen Yang Anna Rohrbach Trevor Darrell Marcus Rohrbach | - | 152 1,465 0 | 06 Jun 2016