1502.03044
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Papers citing
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,520 papers shown
Title
HAAV: Hierarchical Aggregation of Augmented Views for Image Captioning
Chia-Wen Kuo
Z. Kira
85
23
0
25 May 2023
TOAST: Transfer Learning via Attention Steering
Baifeng Shi
Siyu Gai
Trevor Darrell
Xin Wang
75
13
0
24 May 2023
Evolutionary Algorithms in the Light of SGD: Limit Equivalence, Minima Flatness, and Transfer Learning
Andrei Kucharavy
R. Guerraoui
Ljiljana Dolamic
109
1
0
20 May 2023
DiffCap: Exploring Continuous Diffusion on Image Captioning
Yufeng He
Zefan Cai
Xu Gan
Baobao Chang
DiffM
82
7
0
20 May 2023
Explaining V1 Properties with a Biologically Constrained Deep Learning Architecture
Galen Pogoncheff
Jacob Granley
M. Beyeler
AAML
FAtt
62
10
0
18 May 2023
Emergent Communication with Attention
Ryokan Ri
Ryo Ueda
Jason Naradowsky
66
2
0
18 May 2023
A Video Is Worth 4096 Tokens: Verbalize Videos To Understand Them In Zero Shot
Aanisha Bhattacharya
Yaman Kumar Singla
Balaji Krishnamurthy
R. Shah
Changyou Chen
VGen
71
12
0
16 May 2023
PLIP: Language-Image Pre-training for Person Representation Learning
Jia-li Zuo
Jiahao Hong
Feng Zhang
Changqian Yu
Hanyu Zhou
Changxin Gao
Nong Sang
Jingdong Wang
VLM
MLLM
136
38
0
15 May 2023
Mask to reconstruct: Cooperative Semantics Completion for Video-text Retrieval
Han Fang
Zhifei Yang
Xianghao Zang
Chao Ban
Hao Sun
VGen
72
3
0
13 May 2023
Automatic Radiology Report Generation by Learning with Increasingly Hard Negatives
Bhanu Prakash Voutharoja
Lei Wang
Luping Zhou
MedIm
63
8
0
11 May 2023
Learning the Visualness of Text Using Large Vision-Language Models
Gaurav Verma
Ryan Rossi
Chris Tensmeyer
Jiuxiang Gu
A. Nenkova
VLM
71
0
0
11 May 2023
Clothes-Invariant Feature Learning by Causal Intervention for Clothes-Changing Person Re-identification
Xulin Li
Yan Lu
B. Liu
Yuenan Hou
Yating Liu
Qi Chu
Wanli Ouyang
Nenghai Yu
OOD
CML
77
5
0
10 May 2023
Vision-Language Models in Remote Sensing: Current Progress and Future Trends
Xiang Li
Congcong Wen
Yuan Hu
Zhenghang Yuan
Xiao Xiang Zhu
VLM
82
82
0
09 May 2023
Image Captioners Sometimes Tell More Than Images They See
Honori Udo
Takafumi Koshinaka
VLM
27
4
0
04 May 2023
Incremental 3D Semantic Scene Graph Prediction from RGB Sequences
Shun-cheng Wu
Keisuke Tateno
Nassir Navab
F. Tombari
3DPC
3DV
121
21
0
04 May 2023
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang
Jinrui Zhang
Junjie Fei
Hao Zheng
Yunlong Tang
Zhe Li
Mingqi Gao
Shanshan Zhao
MLLM
193
89
0
04 May 2023
Transforming Visual Scene Graphs to Image Captions
Xu Yang
Jiawei Peng
Zihua Wang
Haiyang Xu
Qinghao Ye
Chenliang Li
Mingshi Yan
Feisi Huang
Zhangzikang Li
Yu Zhang
101
21
0
03 May 2023
Fairness in AI Systems: Mitigating gender bias from language-vision models
Lavisha Aggarwal
Shruti Bhargava
70
5
0
03 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
78
10
0
03 May 2023
Multimodal Graph Transformer for Multimodal Question Answering
Xuehai He
Xin Eric Wang
88
9
0
30 Apr 2023
Multi-Modality Deep Network for Extreme Learned Image Compression
Xuhao Jiang
Weimin Tan
Tian Tan
Bo Yan
Liquan Shen
32
18
0
26 Apr 2023
A Review of Deep Learning for Video Captioning
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Min Zhang
Fatih Porikli
3DV
121
22
0
22 Apr 2023
Identifying Appropriate Intellectual Property Protection Mechanisms for Machine Learning Models: A Systematization of Watermarking, Fingerprinting, Model Access, and Attacks
Isabell Lederer
Rudolf Mayer
Andreas Rauber
98
19
0
22 Apr 2023
Byzantine-Resilient Learning Beyond Gradients: Distributing Evolutionary Search
Andrei Kucharavy
M. Monti
R. Guerraoui
Ljiljana Dolamic
64
1
0
20 Apr 2023
TieFake: Title-Text Similarity and Emotion-Aware Fake News Detection
Quanjiang Guo
Zhao Kang
Ling Tian
Zhouguo Chen
77
10
0
19 Apr 2023
Interactive and Explainable Region-guided Radiology Report Generation
Tim Tanida
Philip Muller
Georgios Kaissis
Daniel Rueckert
MedIm
148
117
0
17 Apr 2023
VCD: Visual Causality Discovery for Cross-Modal Question Reasoning
Yang Liu
Guanbin Li
Jingzhou Luo
Liang Lin
BDL
LRM
105
5
0
17 Apr 2023
CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes
Maria Parelli
Alexandros Delitzas
Nikolas Hars
G. Vlassis
Sotiris Anagnostidis
Gregor Bachmann
Thomas Hofmann
CLIP
75
52
0
12 Apr 2023
Learning Transferable Pedestrian Representation from Multimodal Information Supervision
Li-Na Bao
Longhui Wei
Xiaoyu Qiu
Wen-gang Zhou
Houqiang Li
Qi Tian
SSL
75
5
0
12 Apr 2023
ImageCaptioner$^2$: Image Captioner for Image Captioning Bias Amplification Assessment
Eslam Mohamed Bakr
Pengzhan Sun
Erran L. Li
Mohamed Elhoseiny
48
6
0
10 Apr 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Lewei Yao
Jianhua Han
Xiaodan Liang
Danqian Xu
Wei Zhang
Zhenguo Li
Hang Xu
VLM
ObjD
CLIP
128
79
0
10 Apr 2023
Model-Agnostic Gender Debiased Image Captioning
Yusuke Hirota
Yuta Nakashima
Noa Garcia
FaML
122
18
0
07 Apr 2023
Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions
Jia-Hong Huang
Modar Alfadly
Guohao Li
Marcel Worring
OOD
AAML
87
6
0
06 Apr 2023
METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
MedIm
77
76
0
05 Apr 2023
Towards Self-Explainability of Deep Neural Networks with Heatmap Captioning and Large-Language Models
Osman Tursun
Simon Denman
Sridha Sridharan
Clinton Fookes
ViT
VLM
56
6
0
05 Apr 2023
Cross-Domain Image Captioning with Discriminative Finetuning
Roberto Dessì
Michele Bevilacqua
Eleonora Gualdoni
Nathanaël Carraz Rakotonirina
Francesca Franzon
Marco Baroni
CLIP
101
19
0
04 Apr 2023
Changes to Captions: An Attentive Network for Remote Sensing Change Captioning
Shizhen Chang
Pedram Ghamisi
95
48
0
03 Apr 2023
SARGAN: Spatial Attention-based Residuals for Facial Expression Manipulation
Arbish Akram
Nazar Khan
GAN
CVBM
122
10
0
30 Mar 2023
LMDA-Net: A lightweight multi-dimensional attention network for general EEG-based brain-computer interface paradigms and interpretability
Zhengqing Miao
Xin Zhang
Mei-rong Zhao
Dong Ming
44
6
0
29 Mar 2023
SnakeVoxFormer: Transformer-based Single Image Voxel Reconstruction with Run Length Encoding
Jae Joong Lee
Bedrich Benes
ViT
67
0
0
28 Mar 2023
Medical Image Analysis using Deep Relational Learning
Zhi-Hu Liu
MedIm
64
0
0
28 Mar 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
Yaowei Li
Bang-ju Yang
Xuxin Cheng
Zhihong Zhu
Hongxiang Li
Yuexian Zou
73
33
0
28 Mar 2023
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
Yiwei Ma
Xiaoqing Zhang
Xiaoshuai Sun
Jiayi Ji
Haowei Wang
Guannan Jiang
Weilin Zhuang
Rongrong Ji
101
40
0
28 Mar 2023
Improving Prediction Performance and Model Interpretability through Attention Mechanisms from Basic and Applied Research Perspectives
Shunsuke Kitada
FaML
HAI
AI4CE
68
1
0
24 Mar 2023
Taking A Closer Look at Visual Relation: Unbiased Video Scene Graph Generation with Decoupled Label Learning
Wenqing Wang
Yawei Luo
Zhiqin Chen
Tao Jiang
Lei Chen
Yi Yang
Jun Xiao
78
8
0
23 Mar 2023
PointGame: Geometrically and Adaptively Masked Auto-Encoder on Point Clouds
Yun-Hai Liu
Xu Yan
Zhilei Chen
Zhiqi Li
Zeyong Wei
Mingqiang Wei
3DPC
90
2
0
23 Mar 2023
Top-Down Visual Attention from Analysis by Synthesis
Baifeng Shi
Trevor Darrell
Xin Eric Wang
88
32
0
23 Mar 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Sara Sarto
Manuele Barraco
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
86
60
0
21 Mar 2023
Context De-confounded Emotion Recognition
Dingkang Yang
Zhaoyu Chen
Yuzheng Wang
Shunli Wang
Mingcheng Li
...
Xiao Zhao
Shuai Huang
Zhiyan Dong
Peng Zhai
Lihua Zhang
CML
107
45
0
21 Mar 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Chaoning Zhang
Chenshuang Zhang
Sheng Zheng
Yu Qiao
Chenghao Li
...
Lik-Hang Lee
Yang Yang
Heng Tao Shen
In So Kweon
Choong Seon Hong
190
170
0
21 Mar 2023