1810.04020
A Comprehensive Survey of Deep Learning for Image Captioning
6 October 2018
Md. Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
VLM
3DV
Papers citing
"A Comprehensive Survey of Deep Learning for Image Captioning"
50 / 228 papers shown
Title
OT-Attack: Enhancing Adversarial Transferability of Vision-Language Models via Optimal Transport Optimization
Dongchen Han
Xiaojun Jia
Yang Bai
Jindong Gu
Yang Liu
Xiaochun Cao
VLM
30
22
0
07 Dec 2023
Alpha-CLIP: A CLIP Model Focusing on Wherever You Want
Zeyi Sun
Ye Fang
Tong Wu
Pan Zhang
Yuhang Zang
Shu Kong
Yuanjun Xiong
Dahua Lin
Jiaqi Wang
VLM
CLIP
51
83
0
06 Dec 2023
Segment and Caption Anything
Xiaoke Huang
Jianfeng Wang
Yansong Tang
Zheng Zhang
Han Hu
Jiwen Lu
Lijuan Wang
Zicheng Liu
MLLM
VLM
28
18
0
01 Dec 2023
Zero-shot Referring Expression Comprehension via Structural Similarity Between Images and Captions
Zeyu Han
Fangrui Zhu
Qianru Lao
Huaizu Jiang
ObjD
27
11
0
28 Nov 2023
EgoThink: Evaluating First-Person Perspective Thinking Capability of Vision-Language Models
Sijie Cheng
Zhicheng Guo
Jingwen Wu
Kechen Fang
Peng Li
Huaping Liu
Yang Liu
EgoV
LRM
31
16
0
27 Nov 2023
DocPedia: Unleashing the Power of Large Multimodal Model in the Frequency Domain for Versatile Document Understanding
Hao Feng
Qi Liu
Hao Liu
Wen-gang Zhou
Houqiang Li
Can Huang
VLM
25
60
0
20 Nov 2023
From Image to Language: A Critical Analysis of Visual Question Answering (VQA) Approaches, Challenges, and Opportunities
Md Farhan Ishmam
Md Sakib Hossain Shovon
M. F. Mridha
Nilanjan Dey
40
36
0
01 Nov 2023
HallusionBench: An Advanced Diagnostic Suite for Entangled Language Hallucination and Visual Illusion in Large Vision-Language Models
Tianrui Guan
Fuxiao Liu
Xiyang Wu
Ruiqi Xian
Zongxia Li
...
Lichang Chen
Furong Huang
Yaser Yacoob
Dinesh Manocha
VLM
MLLM
36
155
0
23 Oct 2023
Large Models for Time Series and Spatio-Temporal Data: A Survey and Outlook
Ming Jin
Qingsong Wen
Yuxuan Liang
Chaoli Zhang
Siqiao Xue
...
Shirui Pan
Vincent S. Tseng
Yu Zheng
Lei Chen
Hui Xiong
AI4TS
SyDa
35
117
0
16 Oct 2023
IMITATE: Clinical Prior Guided Hierarchical Vision-Language Pre-training
Che Liu
Sibo Cheng
Miaojing Shi
Anand Shah
Wenjia Bai
Rossella Arcucci
24
26
0
11 Oct 2023
SwitchGPT: Adapting Large Language Models for Non-Text Outputs
Xinyu Wang
Bohan Zhuang
Qi Wu
MLLM
27
3
0
14 Sep 2023
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
62
4
0
28 Aug 2023
Reinforcement Learning for Generative AI: A Survey
Yuanjiang Cao
Quan Z. Sheng
Julian McAuley
Lina Yao
SyDa
46
10
0
28 Aug 2023
Vision Relation Transformer for Unbiased Scene Graph Generation
Gopika Sudhakaran
Devendra Singh Dhami
Kristian Kersting
Stefan Roth
ViT
38
15
0
18 Aug 2023
Asynchronous Evolution of Deep Neural Network Architectures
J. Liang
H. Shahrzad
Risto Miikkulainen
28
0
0
08 Aug 2023
A Comprehensive Analysis of Real-World Image Captioning and Scene Identification
Sai Suprabhanu Nallapaneni
Subrahmanyam Konakanchi
30
2
0
05 Aug 2023
Synaptic Plasticity Models and Bio-Inspired Unsupervised Deep Learning: A Survey
Gabriele Lagani
Fabrizio Falchi
Claudio Gennaro
Giuseppe Amato
AAML
41
6
0
30 Jul 2023
Foundational Models Defining a New Era in Vision: A Survey and Outlook
Muhammad Awais
Muzammal Naseer
Salman Khan
Rao Muhammad Anwer
Hisham Cholakkal
M. Shah
Ming Yang
F. Khan
VLM
38
118
0
25 Jul 2023
AIC-AB NET: A Neural Network for Image Captioning with Spatial Attention and Text Attributes
Guoyun Tu
Ying Liu
Vladimir Vlassov
136
1
0
14 Jul 2023
MultiQG-TI: Towards Question Generation from Multi-modal Sources
Zichao Wang
Richard Baraniuk
20
5
0
07 Jul 2023
Multimodal Prompt Learning for Product Title Generation with Extremely Limited Labels
Bang-ju Yang
Fenglin Liu
Zheng Li
Qingyu Yin
Chenyu You
Bing Yin
Yuexian Zou
VLM
33
5
0
05 Jul 2023
Towards Unseen Triples: Effective Text-Image-joint Learning for Scene Graph Generation
Qianji Di
Wenxing Ma
Zhongang Qi
Tianxiang Hou
Ying Shan
Hanzi Wang
14
0
0
23 Jun 2023
A Comprehensive Survey on Applications of Transformers for Deep Learning Tasks
Saidul Islam
Hanae Elmekki
Ahmed Elsebai
Jamal Bentahar
Najat Drawel
Gaith Rjoub
Witold Pedrycz
ViT
MedIm
24
171
0
11 Jun 2023
Customizing General-Purpose Foundation Models for Medical Report Generation
Bang-ju Yang
Asif Raza
Yuexian Zou
Tong Zhang
MedIm
25
11
0
09 Jun 2023
Using Visual Cropping to Enhance Fine-Detail Question Answering of BLIP-Family Models
Jiarui Zhang
Mahyar Khayatkhoei
P. Chhikara
Filip Ilievski
27
1
0
31 May 2023
SciReviewGen: A Large-scale Dataset for Automatic Literature Review Generation
Tetsu Kasanishi
Masaru Isonuma
Junichiro Mori
Ichiro Sakata
18
8
0
24 May 2023
Image-to-Text Translation for Interactive Image Recognition: A Comparative User Study with Non-Expert Users
Wataru Kawabe
Yusuke Sugano
VLM
32
2
0
11 May 2023
ChatGPT-Like Large-Scale Foundation Models for Prognostics and Health Management: A Survey and Roadmaps
Yanfang Li
Huan Wang
Muxia Sun
LM&MA
AI4TS
AI4CE
29
46
0
10 May 2023
UIT-OpenViIC: A Novel Benchmark for Evaluating Image Captioning in Vietnamese
Doanh C. Bui
Nghia Hieu Nguyen
Khang Phuoc-Quy Nguyen
VLM
19
3
0
07 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
24
10
0
03 May 2023
LLaMA-Adapter V2: Parameter-Efficient Visual Instruction Model
Peng Gao
Jiaming Han
Renrui Zhang
Ziyi Lin
Shijie Geng
...
Pan Lu
Conghui He
Xiangyu Yue
Hongsheng Li
Yu Qiao
MLLM
38
550
0
28 Apr 2023
Interpreting Vision and Language Generative Models with Semantic Visual Priors
Michele Cafagna
L. Rojas-Barahona
Kees van Deemter
Albert Gatt
FAtt
VLM
17
1
0
28 Apr 2023
Focus on the Challenges: Analysis of a User-friendly Data Search Approach with CLIP in the Automotive Domain
Philipp Rigoll
Patrick Petersen
Hanno Stage
Lennart Ries
Eric Sax
21
5
0
20 Apr 2023
High-Throughput Vector Similarity Search in Knowledge Graphs
J. Mohoney
Anil Pacaci
S. R. Chowdhury
Ali Mousavi
Ihab F. Ilyas
U. F. Minhas
Jeffrey Pound
Theodoros Rekatsinas
25
25
0
04 Apr 2023
One Small Step for Generative AI, One Giant Leap for AGI: A Complete Survey on ChatGPT in AIGC Era
Chaoning Zhang
Chenshuang Zhang
Chenghao Li
Yu Qiao
Sheng Zheng
...
Sung-Ho Bae
Lik-Hang Lee
Pan Hui
In So Kweon
Choong Seon Hong
LM&MA
AI4MH
LRM
ELM
39
130
0
04 Apr 2023
Multimodal Shannon Game with Images
Vilém Zouhar
Sunit Bhattacharya
Ondrej Bojar
17
1
0
20 Mar 2023
The Multimodal And Modular Ai Chef: Complex Recipe Generation From Imagery
David A. Noever
S. M. Noever
130
6
0
20 Mar 2023
Multi-modal reward for visual relationships-based image captioning
Ali Abedi
Hossein Karshenas
Peyman Adibi
44
2
0
19 Mar 2023
Dynamic Graph Enhanced Contrastive Learning for Chest X-ray Report Generation
Mingjie Li
Bingqian Lin
Zicong Chen
Haokun Lin
Xiaodan Liang
Xiaojun Chang
MedIm
20
106
0
18 Mar 2023
Aerial Diffusion: Text Guided Ground-to-Aerial View Translation from a Single Image using Diffusion Models
D. Kothandaraman
Dinesh Manocha
Ming Lin
24
5
0
15 Mar 2023
Graph Neural Networks in Vision-Language Image Understanding: A Survey
Henry Senior
Greg Slabaugh
Shanxin Yuan
Luca Rossi
GNN
33
13
0
07 Mar 2023
Which One Are You Referring To? Multimodal Object Identification in Situated Dialogue
Holy Lovenia
Samuel Cahyawijaya
Pascale Fung
11
1
0
28 Feb 2023
Joint Task and Data Oriented Semantic Communications: A Deep Separate Source-channel Coding Scheme
Jianhao Huang
Dongxu Li
Chenyu Huang
Xiaoqi Qin
Wei Zhang
20
31
0
27 Feb 2023
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
29
29
0
16 Feb 2023
MuG: A Multimodal Classification Benchmark on Game Data with Tabular, Textual, and Visual Fields
Jiaying Lu
Yongchen Qian
Shifan Zhao
Yuanzhe Xi
Carl Yang
VLM
27
3
0
06 Feb 2023
Interaction Order Prediction for Temporal Graphs
N. Bannur
Mashrin Srivastava
Harsha Vardhan
29
0
0
04 Feb 2023
Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning
Jingqiang Chen
25
3
0
04 Feb 2023
A data science and machine learning approach to continuous analysis of Shakespeare's plays
Charles F. Swisher
L. Shamir
35
3
0
15 Jan 2023
An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation
Kevin Moran
Ali Yachnes
George Purnell
Juanyed Mahmud
Michele Tufano
Carlos Bernal-Cárdenas
Denys Poshyvanyk
Zach H’Doubler
31
10
0
03 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
R. Zakari
Jim Wilson Owusu
Hailin Wang
Ke Qin
Zaharaddeen Karami Lawal
Yue-hong Dong
LRM
33
16
0
26 Dec 2022