arXiv:1810.04020 (v1; v2 is the latest version)
A Comprehensive Survey of Deep Learning for Image Captioning
6 October 2018
Md Zakir Hossain
Ferdous Sohel
M. Shiratuddin
Hamid Laga
VLM
3DV
Papers citing "A Comprehensive Survey of Deep Learning for Image Captioning" (50 of 231 shown)
Transform, Contrast and Tell: Coherent Entity-Aware Multi-Image Captioning
Jingqiang Chen
59
4
0
04 Feb 2023
A data science and machine learning approach to continuous analysis of Shakespeare's plays
Charles F. Swisher
L. Shamir
58
3
0
15 Jan 2023
An Empirical Investigation into the Use of Image Captioning for Automated Software Documentation
Kevin Moran
Ali Yachnes
George Purnell
Juanyed Mahmud
Michele Tufano
Carlos Bernal-Cárdenas
Denys Poshyvanyk
Zach H’Doubler
85
11
0
03 Jan 2023
VQA and Visual Reasoning: An Overview of Recent Datasets, Methods and Challenges
R. Zakari
Jim Wilson Owusu
Hailin Wang
Ke Qin
Zaharaddeen Karami Lawal
Yue-hong Dong
LRM
73
16
0
26 Dec 2022
Do DALL-E and Flamingo Understand Each Other?
Hang Li
Jindong Gu
Rajat Koner
Sahand Sharifzadeh
Volker Tresp
MLLM
77
12
0
23 Dec 2022
Towards Generating Diverse Audio Captions via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
79
2
0
05 Dec 2022
PLA: Language-Driven Open-Vocabulary 3D Scene Understanding
Runyu Ding
Jihan Yang
Chuhui Xue
Wenqing Zhang
Song Bai
Xiaojuan Qi
VLM
80
154
0
29 Nov 2022
Deep representation learning: Fundamentals, Perspectives, Applications, and Open Challenges
K. T. Baghaei
Amirreza Payandeh
Pooya Fayyazsanavi
Shahram Rahimi
Zhiqian Chen
Somayeh Bakhtiari Ramezani
FaML
AI4TS
69
6
0
27 Nov 2022
Aesthetically Relevant Image Captioning
Zhipeng Zhong
Fei Zhou
Guoping Qiu
62
9
0
25 Nov 2022
Feedback is Needed for Retakes: An Explainable Poor Image Notification Framework for the Visually Impaired
Kazuya Ohata
Shunsuke Kitada
Hitoshi Iyatomi
63
0
0
17 Nov 2022
CapEnrich: Enriching Caption Semantics for Web Images via Cross-modal Pre-trained Knowledge
Linli Yao
Wei Chen
Qin Jin
VLM
121
11
0
17 Nov 2022
Vis2Mus: Exploring Multimodal Representation Mapping for Controllable Music Generation
Runbang Zhang
Yixiao Zhang
Kai Shao
Ying Shan
Gus Xia
61
4
0
10 Nov 2022
CLSE: Corpus of Linguistically Significant Entities
A. Chuklin
Justin Zhao
Mihir Kale
62
1
0
04 Nov 2022
Physical Adversarial Attack meets Computer Vision: A Decade Survey
Hui Wei
Hao Tang
Xuemei Jia
Zhixiang Wang
Han-Bing Yu
Zhubo Li
Shin’ichi Satoh
Luc Van Gool
Zheng Wang
AAML
138
56
0
30 Sep 2022
M^4I: Multi-modal Models Membership Inference
Pingyi Hu
Zihan Wang
Ruoxi Sun
Hu Wang
Minhui Xue
97
27
0
15 Sep 2022
Cross Modal Compression: Towards Human-comprehensible Semantic Compression
Jiguo Li
Chuanmin Jia
Xinfeng Zhang
Siwei Ma
Wen Gao
35
21
0
06 Sep 2022
Facial Expression Recognition and Image Description Generation in Vietnamese
Khang Nhut Lam
Kim Thi-Thanh Nguyen
Loc Huu Nguy
Jugal Kalita
3DH
CVBM
57
1
0
12 Aug 2022
A Comprehensive Survey of Natural Language Generation Advances from the Perspective of Digital Deception
Keenan I. Jones
Enes ALTUNCU
V. N. Franqueira
Yi-Chia Wang
Shujun Li
DeLMO
75
3
0
11 Aug 2022
End-to-end deep learning for directly estimating grape yield from ground-based imagery
A. Olenskyj
B. Sams
Zhenghao Fei
Vishal Singh
P. Raja
G. Bornhorst
J. M. Earles
59
28
0
04 Aug 2022
Visual Recognition by Request
Chufeng Tang
Lingxi Xie
Xiaopeng Zhang
Xiaolin Hu
Qi Tian
VLM
93
15
0
28 Jul 2022
Controllable Data Generation by Deep Learning: A Review
Shiyu Wang
Yuanqi Du
Xiaojie Guo
Bo Pan
Zhaohui Qin
Liang Zhao
97
28
0
19 Jul 2022
Relational Future Captioning Model for Explaining Likely Collisions in Daily Tasks
Motonari Kambara
K. Sugiura
53
6
0
19 Jul 2022
Exploring Adversarial Examples and Adversarial Robustness of Convolutional Neural Networks by Mutual Information
Jiebao Zhang
Wenhua Qian
Ren-qi Nie
Jinde Cao
Dan Xu
GAN
AAML
61
0
0
12 Jul 2022
Vision-and-Language Pretraining
Thong Nguyen
Cong-Duy Nguyen
Xiaobao Wu
See-Kiong Ng
Anh Tuan Luu
VLM
CLIP
55
2
0
05 Jul 2022
Gender Artifacts in Visual Datasets
Nicole Meister
Dora Zhao
Angelina Wang
V. V. Ramaswamy
Ruth C. Fong
Olga Russakovsky
70
29
0
18 Jun 2022
Image Captioning based on Feature Refinement and Reflective Decoding
G. Alabduljabbar
Hafida Benhidour
Said Kerrache
3DV
24
3
0
16 Jun 2022
Video-based Human-Object Interaction Detection from Tubelet Tokens
Danyang Tu
Wei Sun
Xiongkuo Min
Guangtao Zhai
Wei Shen
ViT
95
17
0
04 Jun 2022
A Generative Adversarial Network-based Selective Ensemble Characteristic-to-Expression Synthesis (SE-CTES) Approach and Its Applications in Healthcare
Yuxuan Li
Ying-Jia Lin
Chenang Liu
50
0
0
29 May 2022
Prompt-based Learning for Unpaired Image Captioning
Peipei Zhu
Tianlin Li
Lin Zhu
Zhenglong Sun
Weishi Zheng
Yaowei Wang
Chen Chen
VLM
97
33
0
26 May 2022
Beyond Greedy Search: Tracking by Multi-Agent Reinforcement Learning-based Beam Search
Tianlin Li
Zhe Chen
Bo Jiang
Jin Tang
Bin Luo
Dacheng Tao
97
19
0
19 May 2022
Efficient Gesture Recognition for the Assistance of Visually Impaired People using Multi-Head Neural Networks
Samer Alashhab
Antonio Javier Gallego
Miguel Ángel Lozano
40
18
0
14 May 2022
Translation between Molecules and Natural Language
Carl Edwards
T. Lai
Kevin Ros
Garrett Honke
Kyunghyun Cho
Heng Ji
136
171
0
25 Apr 2022
Visual Attention Methods in Deep Learning: An In-Depth Survey
Mohammed Hassanin
Saeed Anwar
Ibrahim Radwan
Fahad Shahbaz Khan
Ajmal Mian
134
166
0
16 Apr 2022
Guiding Attention using Partial-Order Relationships for Image Captioning
Murad Popattia
Muhammad Rafi
Rizwan Qureshi
Shah Nawaz
52
5
0
15 Apr 2022
Image Captioning In the Transformer Age
Yangliu Xu
Li Li
Haiyang Xu
Songfang Huang
Fei Huang
Jianfei Cai
ViT
59
6
0
15 Apr 2022
Vision Transformers in Medical Computer Vision -- A Contemplative Retrospection
Arshi Parvaiz
Muhammad Anwaar Khalid
Rukhsana Zafar
Huma Ameer
M. Ali
M. Fraz
MedIm
73
63
0
29 Mar 2022
Interactive Robotic Grasping with Attribute-Guided Disambiguation
Yang Yang
Xibai Lou
Changhyun Choi
82
30
0
15 Mar 2022
Unpaired Image Captioning by Image-level Weakly-Supervised Visual Concept Recognition
Peipei Zhu
Tianlin Li
Yong Luo
Zhenglong Sun
Wei-Shi Zheng
Yaowei Wang
Chen Chen
102
12
0
07 Mar 2022
A Review of Emerging Research Directions in Abstract Visual Reasoning
Mikolaj Malkiński
Jacek Mańdziuk
96
41
0
21 Feb 2022
ACORT: A Compact Object Relation Transformer for Parameter Efficient Image Captioning
J. Tan
Y. Tan
C. Chan
Joon Huang Chuah
VLM
ViT
77
19
0
11 Feb 2022
Deep Learning Approaches on Image Captioning: A Review
Taraneh Ghandi
H. Pourreza
H. Mahyar
VLM
130
101
0
31 Jan 2022
A Frustratingly Simple Approach for End-to-End Image Captioning
Ziyang Luo
Yadong Xi
Rongsheng Zhang
Jing Ma
VLM
MLLM
70
16
0
30 Jan 2022
Automatic Audio Captioning using Attention weighted Event based Embeddings
Swapnil Bhosale
Rupayan Chakraborty
Sunil Kumar Kopparapu
64
0
0
28 Jan 2022
Beyond Simple Meta-Learning: Multi-Purpose Models for Multi-Domain, Active and Continual Few-Shot Learning
Peyman Bateni
Jarred Barber
Raghav Goyal
Vaden Masrani
Jan-Willem van de Meent
Leonid Sigal
Frank Wood
BDL
VLM
93
9
0
13 Jan 2022
Technical Language Supervision for Intelligent Fault Diagnosis in Process Industry
Karl Lowenmark
C. Taal
S. Schnabel
Marcus Liwicki
Fredrik Sandin
45
7
0
11 Dec 2021
Multimodal Fake News Detection
Santiago Alonso-Bartolome
Isabel Segura-Bedmar
68
67
0
09 Dec 2021
Neural Attention for Image Captioning: Review of Outstanding Methods
Zanyar Zohourianshahzadi
Jugal Kalita
VLM
86
47
0
29 Nov 2021
Multi-Glimpse Network: A Robust and Efficient Classification Architecture based on Recurrent Downsampled Attention
S. Tan
Runpei Dong
Kaisheng Ma
74
2
0
03 Nov 2021
Deep Learning in Human Activity Recognition with Wearable Sensors: A Review on Advances
Shibo Zhang
Yaxuan Li
Shen Zhang
Farzad Shahabi
S. Xia
Yuanbei Deng
N. Alshurafa
BDL
81
317
0
31 Oct 2021
End-to-End Supermask Pruning: Learning to Prune Image Captioning Models
J. Tan
C. Chan
Joon Huang Chuah
VLM
124
16
0
07 Oct 2021