ResearchTrend.AI
A Comprehensive Survey of Deep Learning for Image Captioning

6 October 2018
Md Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
VLM, 3DV

Papers citing "A Comprehensive Survey of Deep Learning for Image Captioning"

50 / 231 papers shown
Title
Attention-based transformer models for image captioning across languages: An in-depth survey and evaluation
Israa A. Albadarneh
Bassam Hammo
Omar Al-Kadi
VLM
27
0
0
03 Jun 2025
Enhancing Visual Reliance in Text Generation: A Bayesian Perspective on Mitigating Hallucination in Large Vision-Language Models
Nanxing Hu
Xiaoyue Duan
Jinchao Zhang
Guoliang Kang
MLLM
61
0
0
26 May 2025
Panoptic Captioning: Seeking An Equivalency Bridge for Image and Text
Kun-Yu Lin
Hongjun Wang
Weining Ren
Kai Han
291
0
0
22 May 2025
VoQA: Visual-only Question Answering
Luyang Jiang
Jianing An
Jie Luo
Wenjun Wu
Lei Huang
LRM
101
0
0
20 May 2025
Vision Mamba in Remote Sensing: A Comprehensive Survey of Techniques, Applications and Outlook
Muyi Bao
Shuchang Lyu
Zhaoyang Xu
Huiyu Zhou
Jinchang Ren
Shiming Xiang
Xuelong Li
Guangliang Cheng
Mamba
263
0
0
01 May 2025
DART: Disease-aware Image-Text Alignment and Self-correcting Re-alignment for Trustworthy Radiology Report Generation
Sang-Jun Park
Keun-Soo Heo
Dong-Hee Shin
Young-Han Son
Ji-Hye Oh
Tae-Eui Kam
MedIm
53
0
0
16 Apr 2025
Building Trustworthy Multimodal AI: A Review of Fairness, Transparency, and Ethics in Vision-Language Tasks
Mohammad Saleh
Azadeh Tabatabaei
148
0
0
14 Apr 2025
MicroNN: An On-device Disk-resident Updatable Vector Database
Jeffrey Pound
Floris Chabert
Arjun Bhushan
Ankur Goswami
Anil Pacaci
S. R. Chowdhury
48
1
0
08 Apr 2025
ImageSet2Text: Describing Sets of Images through Text
Piera Riccio
F. Galati
Kajetan Schweighofer
Noa Garcia
Nuria Oliver
VLM, CoGe
117
0
0
25 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL, LRM, AI4CE
118
1
0
22 Mar 2025
Integrating Frequency-Domain Representations with Low-Rank Adaptation in Vision-Language Models
Md Azim Khan
A. Gangopadhyay
Jianwu Wang
Robert F. Erbacher
VLM
76
0
0
08 Mar 2025
DoraCycle: Domain-Oriented Adaptation of Unified Generative Model in Multimodal Cycles
Rui Zhao
Weijia Mao
Mike Zheng Shou
107
1
0
05 Mar 2025
AC-Lite : A Lightweight Image Captioning Model for Low-Resource Assamese Language
Pankaj Choudhury
Yogesh Aggarwal
Prabhanjan Jadhav
Prithwijit Guha
Sukumar Nandi
199
0
0
03 Mar 2025
Octopus: Alleviating Hallucination via Dynamic Contrastive Decoding
Wei Suo
Lijun Zhang
Mengyang Sun
Lin Yuanbo Wu
Peng Wang
Yize Zhang
MLLM, VLM
106
3
0
01 Mar 2025
Beyond RNNs: Benchmarking Attention-Based Image Captioning Models
Hemanth Teja Yanambakkam
Rahul Chinthala
51
0
0
26 Feb 2025
A Comprehensive Survey on Image Signal Processing Approaches for Low-Illumination Image Enhancement
Muhammad Turab
113
0
0
09 Feb 2025
An Ensemble Model with Attention Based Mechanism for Image Captioning
Israa Al Badarneh
Bassam Hammo
Omar Al-Kadi
198
6
0
28 Jan 2025
Mathematical Language Models: A Survey
Wen Liu
Hanglei Hu
Jie Zhou
Yuyang Ding
Junsong Li
...
Mengliang He
Qin Chen
Bo Jiang
Aimin Zhou
Liang He
LRM
229
14
0
03 Jan 2025
MOSABench: Multi-Object Sentiment Analysis Benchmark for Evaluating Multimodal Large Language Models Understanding of Complex Image
Shangwen Wang
Chengxiang He
Huijun Liu
Shan Zhao
Chengyu Wang
...
Xiaopeng Li
Qian Wan
Jun Ma
Jie Yu
Xiaoguang Mao
VLM
153
2
0
25 Nov 2024
Exploring Large Language Models for Multimodal Sentiment Analysis: Challenges, Benchmarks, and Future Directions
Shezheng Song
84
0
0
23 Nov 2024
Incremental IVF Index Maintenance for Streaming Vector Search
J. Mohoney
Anil Pacaci
S. R. Chowdhury
U. F. Minhas
Jeffrey Pound
Cédric Renggli
Nima Reyhani
Ihab F. Ilyas
Theodoros Rekatsinas
Shivaram Venkataraman
113
2
0
01 Nov 2024
Self-Comparison for Dataset-Level Membership Inference in Large (Vision-)Language Models
J. Ren
Kangrui Chen
Chen Chen
Vikash Sehwag
Yue Xing
Jiliang Tang
Lingjuan Lyu
66
2
0
16 Oct 2024
NaVIP: An Image-Centric Indoor Navigation Solution for Visually Impaired People
Jun Yu
Yifan Zhang
Badrinadh Aila
V. Namboodiri
106
1
0
08 Oct 2024
EMMA: Empowering Multi-modal Mamba with Structural and Hierarchical Alignment
Yifei Xing
Xiangyuan Lan
Ruiping Wang
D. Jiang
Wenjun Huang
Qingfang Zheng
Yaowei Wang
Mamba
118
0
0
08 Oct 2024
DENEB: A Hallucination-Robust Automatic Evaluation Metric for Image Captioning
Kazuki Matsuda
Yuiga Wada
Komei Sugiura
61
1
0
28 Sep 2024
Text2Traj2Text: Learning-by-Synthesis Framework for Contextual Captioning of Human Movement Trajectories
Hikaru Asano
Ryo Yonetani
Taiki Sekii
Hiroki Ouchi
95
0
0
19 Sep 2024
TropNNC: Structured Neural Network Compression Using Tropical Geometry
Konstantinos Fotopoulos
Petros Maragos
Panagiotis Misiakos
52
0
0
05 Sep 2024
Coalitions of AI-based Methods Predict 15-Year Risks of Breast Cancer Metastasis Using Real-World Clinical Data with AUC up to 0.9
Xia Jiang
Yijun Zhou
Alan Wells
A. Brufsky
OOD, AI4CE
63
0
0
29 Aug 2024
Pixels to Prose: Understanding the art of Image Captioning
Hrishikesh Singh
Aarti Sharma
Millie Pant
3DV, VLM
86
1
0
28 Aug 2024
Bi-directional Contextual Attention for 3D Dense Captioning
Minjung Kim
Hyung Suk Lim
Soonyoung Lee
Bumsoo Kim
Gunhee Kim
72
3
0
13 Aug 2024
Surveying the Landscape of Image Captioning Evaluation: A Comprehensive Taxonomy, Trends and Metrics Analysis
Uri Berger
Gabriel Stanovsky
Omri Abend
Lea Frermann
71
0
0
09 Aug 2024
Dual-path Collaborative Generation Network for Emotional Video Captioning
Cheng Ye
Weidong Chen
Jingyu Li
Li Zhang
Zhendong Mao
126
1
0
06 Aug 2024
The Phantom Menace: Unmasking Privacy Leakages in Vision-Language Models
Simone Caldarella
Massimiliano Mancini
Elisa Ricci
Rahaf Aljundi
PILM
72
2
0
02 Aug 2024
Continual Panoptic Perception: Towards Multi-modal Incremental Interpretation of Remote Sensing Images
Bo Yuan
Danpei Zhao
Zhuoran Liu
Wentao Li
Tian Li
CLL, VLM
87
2
0
19 Jul 2024
BadRobot: Jailbreaking Embodied LLMs in the Physical World
Hangtao Zhang
Chenyu Zhu
Xianlong Wang
Ziqi Zhou
Yichen Wang
...
Shengshan Hu
Aishan Liu
Peijin Guo
Leo Yu Zhang
LM&Ro
91
11
0
16 Jul 2024
Graph Transformers: A Survey
Ahsan Shehzad
Xiwei Xu
Shagufta Abid
Ciyuan Peng
Shuo Yu
Dongyu Zhang
Karin Verspoor
AI4CE
132
14
0
13 Jul 2024
Unexplainability of Artificial Intelligence Judgments in Kant's Perspective
Jongwoo Seo
117
0
0
12 Jul 2024
Semantic GUI Scene Learning and Video Alignment for Detecting Duplicate Video-based Bug Reports
Yanfu Yan
Nathan Cooper
Oscar Chaparro
Kevin Moran
Denys Poshyvanyk
87
8
0
11 Jul 2024
Enhancing Multimodal Large Language Models with Multi-instance Visual Prompt Generator for Visual Representation Enrichment
Wenliang Zhong
Wenyi Wu
Qi Li
Rob Barton
Boxin Du
Shioulin Sam
Karim Bouyarmane
Ismail B. Tutar
Junzhou Huang
90
3
0
05 Jun 2024
Dreamguider: Improved Training free Diffusion-based Conditional Generation
Nithin Gopalakrishnan Nair
Vishal M. Patel
81
3
0
04 Jun 2024
TabPedia: Towards Comprehensive Visual Table Understanding with Concept Synergy
Weichao Zhao
Hao Feng
Qi Liu
Jingqun Tang
Shubo Wei
...
Lei Liao
Yongjie Ye
Hao Liu
Houqiang Li
Can Huang
LMTD
96
24
0
03 Jun 2024
Scene Graph Generation Strategy with Co-occurrence Knowledge and Learnable Term Frequency
H. Kim
Sangwon Kim
Dasom Ahn
Jong Taek Lee
ByoungChul Ko
106
4
0
21 May 2024
Referring Flexible Image Restoration
Runwei Guan
Rongsheng Hu
Zhuhao Zhou
Tianlang Xue
Ka Lok Man
Jeremy S. Smith
Eng Gee Lim
Weiping Ding
Yutao Yue
76
0
0
16 Apr 2024
AesExpert: Towards Multi-modality Foundation Model for Image Aesthetics Perception
Yipo Huang
Xiangfei Sheng
Zhichao Yang
Quan Yuan
Zhichao Duan
Pengfei Chen
Leida Li
Weisi Lin
Guangming Shi
112
25
0
15 Apr 2024
Semi-Supervised Image Captioning Considering Wasserstein Graph Matching
Yang Yang
87
0
0
26 Mar 2024
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han
Chao Gao
Jinyang Liu
Jeff Zhang
Sai Qian Zhang
295
403
0
21 Mar 2024
A Comprehensive Survey of 3D Dense Captioning: Localizing and Describing Objects in 3D Scenes
Ting Yu
Xiaojun Lin
Shuhui Wang
Weiguo Sheng
Qingming Huang
Jun-chen Yu
3DV
88
10
0
12 Mar 2024
VL-Trojan: Multimodal Instruction Backdoor Attacks against Autoregressive Visual Language Models
Jiawei Liang
Siyuan Liang
Man Luo
Aishan Liu
Dongchen Han
Ee-Chien Chang
Xiaochun Cao
95
47
0
21 Feb 2024
Transfer Learning in Human Activity Recognition: A Survey
Sourish Gunesh Dhekane
Thomas Ploetz
MU, AI4TS
98
42
0
18 Jan 2024
AesBench: An Expert Benchmark for Multimodal Large Language Models on Image Aesthetics Perception
Yipo Huang
Quan Yuan
Xiangfei Sheng
Zhichao Yang
Haoning Wu
Pengfei Chen
Yuzhe Yang
Leida Li
Weisi Lin
VLM
72
40
0
16 Jan 2024