ResearchTrend.AI
A Comprehensive Survey of Deep Learning for Image Captioning

6 October 2018
Md. Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
    VLM
    3DV

Papers citing "A Comprehensive Survey of Deep Learning for Image Captioning"

50 / 228 papers shown
OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization
Dongchen Han
Xiaojun Jia
Yang Bai
Jindong Gu
Yang Liu
Xiaochun Cao
VLM
30
22
0
07 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
51
83
0
06 Dec 2023
Segment and Caption Anything
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM
VLM
28
18
0
01 Dec 2023
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Zeyu Han
Fangrui Zhu
Qianru Lao
Huaizu Jiang
ObjD
27
11
0
28 Nov 2023
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng
Zhicheng Guo
Jingwen Wu
Kechen Fang
Peng Li
Huaping Liu
Yang Liu
EgoV
LRM
31
16
0
27 Nov 2023
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
Hao Feng
Qi Liu
Hao Liu
Wen-gang Zhou
Houqiang Li
Can Huang
VLM
25
60
0
20 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
40
36
0
01 Nov 2023
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan
Fuxiao Liu
Xiyang Wu
Ruiqi Xian
Zongxia Li
...
Lichang Chen
Furong Huang
Yaser Yacoob
Dinesh Manocha
VLM
MLLM
36
155
0
23 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
35
117
0
16 Oct 2023
IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
Che Liu
Sibo Cheng
Miaojing Shi
Anand Shah
Wenjia Bai
Rossella Arcucci
24
26
0
11 Oct 2023
SwitchGPT: Adapting Large Language Models for Non-Text Outputs
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM
27
3
0
14 Sep 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
62
4
0
28 Aug 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan Z. Sheng
Julian McAuley
Lina Yao
SyDa
46
10
0
28 Aug 2023
Vision Relation Transformer for Unbiased Scene Graph Generation
Gopika Sudhakaran
Devendra Singh Dhami
Kristian Kersting
Stefan Roth
ViT
38
15
0
18 Aug 2023
Asynchronous Evolution of Deep Neural Network Architectures
J. Liang
H. Shahrzad
Risto Miikkulainen
28
0
0
08 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni
Subrahmanyam Konakanchi
30
2
0
05 Aug 2023
Synaptic Plasticity Models and Bio-Inspired Unsupervised Deep Learning: A Survey
Gabriele Lagani
Fabrizio Falchi
Claudio Gennaro
Giuseppe Amato
AAML
41
6
0
30 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
38
118
0
25 Jul 2023
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
Guoyun Tu
Ying Liu
Vladimir Vlassov
136
1
0
14 Jul 2023
MultiQG-TI: Towards Question Generation from Multi-modal Sources
Zichao Wang
Richard Baraniuk
20
5
0
07 Jul 2023
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels
Bang-ju Yang
Fenglin Liu
Zheng Li
Qingyu Yin
Chenyu You
Bing Yin
Yuexian Zou
VLM
33
5
0
05 Jul 2023
Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation
Qianji Di
Wenxing Ma
Zhongang Qi
Tianxiang Hou
Ying Shan
Hanzi Wang
14
0
0
23 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
24
171
0
11 Jun 2023
Customizing General-Purpose Foundation Models for Medical Report Generation
Bang-ju Yang
Asif Raza
Yuexian Zou
Tong Zhang
MedIm
25
11
0
09 Jun 2023
Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
27
1
0
31 May 2023
SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation
Tetsu Kasanishi
Masaru Isonuma
Junichiro Mori
Ichiro Sakata
18
8
0
24 May 2023
Image-to-Text Translation for Interactive Image Recognition: A Comparative User Study with Non-Expert Users
Wataru Kawabe
Yusuke Sugano
VLM
32
2
0
11 May 2023
ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Yanfang Li
Huan Wang
Muxia Sun
LM&MA
AI4TS
AI4CE
29
46
0
10 May 2023
UIT-OpenViIC: A Novel Benchmark for Evaluating Image Captioning in Vietnamese
Doanh C. Bui
Nghia Hieu Nguyen
Khang Phuoc-Quy Nguyen
VLM
19
3
0
07 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
24
10
0
03 May 2023
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Peng Gao
Jiaming Han
Renrui Zhang
Ziyi Lin
Shijie Geng
...
Pan Lu
Conghui He
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
38
550
0
28 Apr 2023
Interpreting Vision and Language Generative Models with Semantic Visual Priors
Michele Cafagna
L. Rojas-Barahona
Kees van Deemter
Albert Gatt
FAtt
VLM
17
1
0
28 Apr 2023
Focus on the Challenges: Analysis of a User-friendly Data Search Approach with CLIP in the Automotive Domain
Philipp Rigoll
Patrick Petersen
Hanno Stage
Lennart Ries
Eric Sax
21
5
0
20 Apr 2023
High-Throughput Vector Similarity Search in Knowledge Graphs
J. Mohoney
Anil Pacaci
S. R. Chowdhury
Ali Mousavi
Ihab F. Ilyas
U. F. Minhas
Jeffrey Pound
Theodoros Rekatsinas
25
25
0
04 Apr 2023
One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era
Chaoning Zhang
Chenshuang Zhang
Chenghao Li
Yu Qiao
Sheng Zheng
...
Sung-Ho Bae
Lik-Hang Lee
Pan Hui
In So Kweon
Choong Seon Hong
LM&MA
AI4MH
LRM
ELM
39
130
0
04 Apr 2023
Multimodal Shannon Game with Images
Vilém Zouhar
Sunit Bhattacharya
Ondrej Bojar
17
1
0
20 Mar 2023
The Multimodal And Modular Ai Chef: Complex Recipe Generation From Imagery
David A. Noever
S. M. Noever
130
6
0
20 Mar 2023
Multi-modal reward for visual relationships-based image captioning
Ali Abedi
Hossein Karshenas
Peyman Adibi
44
2
0
19 Mar 2023
Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation
Mingjie Li
Bingqian Lin
Zicong Chen
Haokun Lin
Xiaodan Liang
Xiaojun Chang
MedIm
20
106
0
18 Mar 2023
Aerial Diffusion: Text Guided Ground-to-Aerial View Translation from a Single Image using Diffusion Models
D. Kothandaraman
Dinesh Manocha
Ming Lin
Dinesh Manocha
24
5
0
15 Mar 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey
Henry Senior
Greg Slabaugh
Shanxin Yuan
Luca Rossi
GNN
33
13
0
07 Mar 2023
Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue
Holy Lovenia
Samuel Cahyawijaya
Pascale Fung
11
1
0
28 Feb 2023
Joint Task and Data Oriented Semantic Communications: A Deep Separate Source-channel Coding Scheme
Jianhao Huang
Dongxu Li
Chenyu Huang
Xiaoqi Qin
Wei Zhang
20
31
0
27 Feb 2023
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
29
29
0
16 Feb 2023
MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields
Jiaying Lu
Yongchen Qian
Shifan Zhao
Yuanzhe Xi
Carl Yang
VLM
27
3
0
06 Feb 2023
Interaction Order Prediction for Temporal Graphs
N. Bannur
Mashrin Srivastava
Harsha Vardhan
29
0
0
04 Feb 2023
Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning
Jingqiang Chen
25
3
0
04 Feb 2023
A data science and machine learning approach to continuous analysis of Shakespeare's plays
Charles F. Swisher
L. Shamir
35
3
0
15 Jan 2023
An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation
Kevin Moran
Ali Yachnes
George Purnell
Juanyed Mahmud
Michele Tufano
Carlos Bernal-Cárdenas
Denys Poshyvanyk
Zach H’Doubler
31
10
0
03 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
R. Zakari
Jim Wilson Owusu
Hailin Wang
Ke Qin
Zaharaddeen Karami Lawal
Yue-hong Dong
LRM
33
16
0
26 Dec 2022