ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1810.04020
  4. Cited By
A Comprehensive Survey of Deep Learning for Image Captioning
v1v2 (latest)

A Comprehensive Survey of Deep Learning for Image Captioning

6 October 2018
Md Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
    VLM3DV
ArXiv (abs)PDFHTML

Papers citing "A Comprehensive Survey of Deep Learning for Image Captioning"

50 / 231 papers shown
Title
Safe Reinforcement Learning with Free-form Natural Language Constraints
  and Pre-Trained Language Models
Safe Reinforcement Learning with Free-form Natural Language Constraints and Pre-Trained Language Models
Xingzhou Lou
Junge Zhang
Ziyan Wang
Kaiqi Huang
Yali Du
73
3
0
15 Jan 2024
Jack of All Tasks, Master of Many: Designing General-purpose
  Coarse-to-Fine Vision-Language Model
Jack of All Tasks, Master of Many: Designing General-purpose Coarse-to-Fine Vision-Language Model
Shraman Pramanick
Guangxing Han
Rui Hou
Sayan Nag
Ser-Nam Lim
Nicolas Ballas
Qifan Wang
Rama Chellappa
Amjad Almahairi
VLMMLLM
167
36
0
19 Dec 2023
Assessing GPT4-V on Structured Reasoning Tasks
Assessing GPT4-V on Structured Reasoning Tasks
Mukul Singh
J. Cambronero
Sumit Gulwani
Vu Le
Gust Verbruggen
LRM
73
13
0
13 Dec 2023
User-Aware Prefix-Tuning is a Good Learner for Personalized Image
  Captioning
User-Aware Prefix-Tuning is a Good Learner for Personalized Image Captioning
Xuan Wang
Guanhong Wang
Wenhao Chai
Jiayu Zhou
Gaoang Wang
153
6
0
08 Dec 2023
OT-Attack: Enhancing Adversarial Transferability of Vision-Language
  Models via Optimal Transport Optimization
OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization
Dongchen Han
Xiaojun Jia
Yang Bai
Jindong Gu
Yang Liu
Xiaochun Cao
VLM
86
26
0
07 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLMCLIP
124
90
0
06 Dec 2023
Segment and Caption Anything
Segment and Caption Anything
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLMVLM
90
21
0
01 Dec 2023
Zero-shot Referring Expression Comprehension via Structural Similarity
  Between Images and Captions
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Zeyu Han
Fangrui Zhu
Qianru Lao
Huaizu Jiang
ObjD
145
12
0
28 Nov 2023
EgoThink: Evaluating First-Person Perspective Thinking Capability of
  Vision-Language Models
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng
Zhicheng Guo
Jingwen Wu
Kechen Fang
Peng Li
Huaping Liu
Yang Liu
EgoVLRM
112
20
0
27 Nov 2023
DocPedia: Unleashing the Power of Large Multimodal Model in the
  Frequency Domain for Versatile Document Understanding
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
Hao Feng
Qi Liu
Hao Liu
Wen-gang Zhou
Houqiang Li
Can Huang
VLM
115
67
0
20 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering
  (VQA) Approaches, Challenges, and Opportunities
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
151
44
0
01 Nov 2023
HallusionBench: An Advanced Diagnostic Suite for Entangled Language
  Hallucination and Visual Illusion in Large Vision-Language Models
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan
Fuxiao Liu
Xiyang Wu
Ruiqi Xian
Zongxia Li
...
Lichang Chen
Furong Huang
Yaser Yacoob
Dinesh Manocha
Dinesh Manocha
VLMMLLM
161
196
0
23 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and
  Outlook
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TSSyDa
163
123
0
16 Oct 2023
IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
Che Liu
Sibo Cheng
Miaojing Shi
Anand Shah
Wenjia Bai
Rossella Arcucci
87
27
0
11 Oct 2023
SwitchGPT: Adapting Large Language Models for Non-Text Outputs
SwitchGPT: Adapting Large Language Models for Non-Text Outputs
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM
76
3
0
14 Sep 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
127
4
0
28 Aug 2023
Reinforcement Learning for Generative AI: A Survey
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan.Z Sheng
Julian McAuley
Lina Yao
SyDa
198
13
0
28 Aug 2023
Vision Relation Transformer for Unbiased Scene Graph Generation
Vision Relation Transformer for Unbiased Scene Graph Generation
Gopika Sudhakaran
Devendra Singh Dhami
Kristian Kersting
Stefan Roth
ViT
117
18
0
18 Aug 2023
Asynchronous Evolution of Deep Neural Network Architectures
Asynchronous Evolution of Deep Neural Network Architectures
J. Liang
Hormoz Shahrzad
Risto Miikkulainen
50
0
0
08 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene
  Identification
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni
Subrahmanyam Konakanchi
66
2
0
05 Aug 2023
Synaptic Plasticity Models and Bio-Inspired Unsupervised Deep Learning:
  A Survey
Synaptic Plasticity Models and Bio-Inspired Unsupervised Deep Learning: A Survey
Gabriele Lagani
Fabrizio Falchi
Claudio Gennaro
Giuseppe Amato
AAML
106
7
0
30 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming-Hsuan Yang
Fahad Shahbaz Khan
VLM
143
127
0
25 Jul 2023
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention
  and Text Attributes
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
Guoyun Tu
Ying Liu
Vladimir Vlassov
153
1
0
14 Jul 2023
MultiQG-TI: Towards Question Generation from Multi-modal Sources
MultiQG-TI: Towards Question Generation from Multi-modal Sources
Zichao Wang
Richard Baraniuk
41
5
0
07 Jul 2023
Multimodal Prompt Learning for Product Title Generation with Extremely
  Limited Labels
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels
Bang-ju Yang
Fenglin Liu
Zheng Li
Qingyu Yin
Chenyu You
Bing Yin
Yuexian Zou
VLM
104
5
0
05 Jul 2023
Towards Unseen Triples: Effective Text-Image-joint Learning for Scene
  Graph Generation
Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation
Qianji Di
Wenxing Ma
Zhongang Qi
Tianxiang Hou
Ying Shan
Hanzi Wang
52
0
0
23 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning
  Tasks
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViTMedIm
89
207
0
11 Jun 2023
Customizing General-Purpose Foundation Models for Medical Report
  Generation
Customizing General-Purpose Foundation Models for Medical Report Generation
Bang-ju Yang
Asif Raza
Yuexian Zou
Tong Zhang
MedIm
87
11
0
09 Jun 2023
Using Visual Cropping to Enhance Fine-Detail Question Answering of
  BLIP-Family Models
Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
56
1
0
31 May 2023
SciReviewGen: A Large-scale Dataset for Automatic Literature Review
  Generation
SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation
Tetsu Kasanishi
Masaru Isonuma
Junichiro Mori
Ichiro Sakata
66
9
0
24 May 2023
Image-to-Text Translation for Interactive Image Recognition: A
  Comparative User Study with Non-Expert Users
Image-to-Text Translation for Interactive Image Recognition: A Comparative User Study with Non-Expert Users
Wataru Kawabe
Yusuke Sugano
VLM
59
2
0
11 May 2023
ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health
  Management: A Survey and Roadmaps
ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Yanfang Li
Huan Wang
Muxia Sun
LM&MAAI4TSAI4CE
99
58
0
10 May 2023
UIT-OpenViIC: A Novel Benchmark for Evaluating Image Captioning in
  Vietnamese
UIT-OpenViIC: A Novel Benchmark for Evaluating Image Captioning in Vietnamese
Doanh C. Bui
Nghia Hieu Nguyen
Khang Phuoc-Quy Nguyen
VLM
68
3
0
07 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
78
10
0
03 May 2023
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Peng Gao
Jiaming Han
Renrui Zhang
Ziyi Lin
Shijie Geng
...
Pan Lu
Conghui He
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
118
588
0
28 Apr 2023
Interpreting Vision and Language Generative Models with Semantic Visual
  Priors
Interpreting Vision and Language Generative Models with Semantic Visual Priors
Michele Cafagna
L. Rojas-Barahona
Kees van Deemter
Albert Gatt
FAttVLM
59
3
0
28 Apr 2023
Focus on the Challenges: Analysis of a User-friendly Data Search
  Approach with CLIP in the Automotive Domain
Focus on the Challenges: Analysis of a User-friendly Data Search Approach with CLIP in the Automotive Domain
Philipp Rigoll
Patrick Petersen
Hanno Stage
Lennart Ries
Eric Sax
57
5
0
20 Apr 2023
High-Throughput Vector Similarity Search in Knowledge Graphs
High-Throughput Vector Similarity Search in Knowledge Graphs
J. Mohoney
Anil Pacaci
S. R. Chowdhury
Ali Mousavi
Ihab F. Ilyas
U. F. Minhas
Jeffrey Pound
Theodoros Rekatsinas
70
32
0
04 Apr 2023
One Small Step for Generative AI, One Giant Leap for AGI: A Complete
  Survey on ChatGPT in AIGC Era
One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era
Chaoning Zhang
Chenshuang Zhang
Chenghao Li
Yu Qiao
Sheng Zheng
...
Sung-Ho Bae
Lik-Hang Lee
Pan Hui
In So Kweon
Choong Seon Hong
LM&MAAI4MHLRMELM
106
137
0
04 Apr 2023
Multimodal Shannon Game with Images
Multimodal Shannon Game with Images
Vilém Zouhar
Sunit Bhattacharya
Ondrej Bojar
25
1
0
20 Mar 2023
The Multimodal And Modular Ai Chef: Complex Recipe Generation From
  Imagery
The Multimodal And Modular Ai Chef: Complex Recipe Generation From Imagery
David Noever
S. M. Noever
403
7
0
20 Mar 2023
Multi-modal reward for visual relationships-based image captioning
Multi-modal reward for visual relationships-based image captioning
Ali Abedi
Hossein Karshenas
Peyman Adibi
127
2
0
19 Mar 2023
Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report
  Generation
Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation
Mingjie Li
Bingqian Lin
Zicong Chen
Haokun Lin
Xiaodan Liang
Xiaojun Chang
MedIm
82
117
0
18 Mar 2023
Aerial Diffusion: Text Guided Ground-to-Aerial View Translation from a
  Single Image using Diffusion Models
Aerial Diffusion: Text Guided Ground-to-Aerial View Translation from a Single Image using Diffusion Models
D. Kothandaraman
Dinesh Manocha
Ming Lin
Dinesh Manocha
72
5
0
15 Mar 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey
Graph Neural Networks in Vision-Language Image Understanding: A Survey
Henry Senior
Greg Slabaugh
Shanxin Yuan
Luca Rossi
GNN
89
21
0
07 Mar 2023
Which One Are You Referring To? Multimodal Object Identification in
  Situated Dialogue
Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue
Holy Lovenia
Samuel Cahyawijaya
Pascale Fung
41
1
0
28 Feb 2023
Joint Task and Data Oriented Semantic Communications: A Deep Separate
  Source-channel Coding Scheme
Joint Task and Data Oriented Semantic Communications: A Deep Separate Source-channel Coding Scheme
Jianhao Huang
Dongxu Li
Chenyu Huang
Xiaoqi Qin
Wei Zhang
90
32
0
27 Feb 2023
Retrieval-augmented Image Captioning
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
80
29
0
16 Feb 2023
MuG: A Multimodal Classification Benchmark on Game Data with Tabular,
  Textual, and Visual Fields
MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields
Jiaying Lu
Yongchen Qian
Shifan Zhao
Yuanzhe Xi
Carl Yang
VLM
74
4
0
06 Feb 2023
Interaction Order Prediction for Temporal Graphs
Interaction Order Prediction for Temporal Graphs
N. Bannur
Mashrin Srivastava
Harsha Vardhan
58
0
0
04 Feb 2023
Previous
12345
Next