1502.03044
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention
10 February 2015
Ke Xu
Jimmy Ba
Ryan Kiros
Kyunghyun Cho
Aaron Courville
Ruslan Salakhutdinov
R. Zemel
Yoshua Bengio
DiffM
Papers citing
"Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"
50 / 3,508 papers shown
Title
Vision-Language Models in Remote Sensing: Current Progress and Future Trends
Xiang Li
Congcong Wen
Yuan Hu
Zhenghang Yuan
Xiao Xiang Zhu
VLM
24
71
0
09 May 2023
Image Captioners Sometimes Tell More Than Images They See
Honori Udo
Takafumi Koshinaka
VLM
17
4
0
04 May 2023
Incremental 3D Semantic Scene Graph Prediction from RGB Sequences
Shun-cheng Wu
Keisuke Tateno
Nassir Navab
F. Tombari
3DPC
3DV
48
21
0
04 May 2023
Caption Anything: Interactive Image Description with Diverse Multimodal Controls
Teng Wang
Jinrui Zhang
Junjie Fei
Hao Zheng
Yunlong Tang
Zhe Li
Mingqi Gao
Shanshan Zhao
MLLM
104
82
0
04 May 2023
Transforming Visual Scene Graphs to Image Captions
Xu Yang
Jiawei Peng
Zihua Wang
Haiyang Xu
Qinghao Ye
Chenliang Li
Mingshi Yan
Feisi Huang
Zhangzikang Li
Yu Zhang
49
19
0
03 May 2023
Fairness in AI Systems: Mitigating gender bias from language-vision models
Lavisha Aggarwal
Shruti Bhargava
19
4
0
03 May 2023
Multimodal Data Augmentation for Image Captioning using Diffusion Models
Changrong Xiao
S. Xu
Kunpeng Zhang
DiffM
27
10
0
03 May 2023
Multimodal Graph Transformer for Multimodal Question Answering
Xuehai He
Xin Eric Wang
36
7
0
30 Apr 2023
Multi-Modality Deep Network for Extreme Learned Image Compression
Xuhao Jiang
Weimin Tan
Tian Tan
Bo Yan
Liquan Shen
19
17
0
26 Apr 2023
A Review of Deep Learning for Video Captioning
Moloud Abdar
Meenakshi Kollati
Swaraja Kuraparthi
Farhad Pourpanah
Daniel J. McDuff
...
Shuicheng Yan
Abduallah A. Mohamed
Abbas Khosravi
Min Zhang
Fatih Porikli
3DV
40
21
0
22 Apr 2023
Identifying Appropriate Intellectual Property Protection Mechanisms for Machine Learning Models: A Systematization of Watermarking, Fingerprinting, Model Access, and Attacks
Isabell Lederer
Rudolf Mayer
Andreas Rauber
29
19
0
22 Apr 2023
Byzantine-Resilient Learning Beyond Gradients: Distributing Evolutionary Search
Andrei Kucharavy
M. Monti
R. Guerraoui
Ljiljana Dolamic
40
1
0
20 Apr 2023
TieFake: Title-Text Similarity and Emotion-Aware Fake News Detection
Quanjiang Guo
Zhao Kang
Ling Tian
Zhouguo Chen
30
10
0
19 Apr 2023
Interactive and Explainable Region-guided Radiology Report Generation
Tim Tanida
Philip Muller
Georgios Kaissis
Daniel Rueckert
MedIm
43
110
0
17 Apr 2023
VCD: Visual Causality Discovery for Cross-Modal Question Reasoning
Yang Liu
Guanbin Li
Jingzhou Luo
Liang Lin
BDL
LRM
51
5
0
17 Apr 2023
CLIP-Guided Vision-Language Pre-training for Question Answering in 3D Scenes
Maria Parelli
Alexandros Delitzas
Nikolas Hars
G. Vlassis
Sotiris Anagnostidis
Gregor Bachmann
Thomas Hofmann
CLIP
20
50
0
12 Apr 2023
Learning Transferable Pedestrian Representation from Multimodal Information Supervision
Li-Na Bao
Longhui Wei
Xiaoyu Qiu
Wen-gang Zhou
Houqiang Li
Qi Tian
SSL
39
5
0
12 Apr 2023
ImageCaptioner²: Image Captioner for Image Captioning Bias Amplification Assessment
Eslam Mohamed Bakr
Pengzhan Sun
Erran L. Li
Mohamed Elhoseiny
22
6
0
10 Apr 2023
DetCLIPv2: Scalable Open-Vocabulary Object Detection Pre-training via Word-Region Alignment
Lewei Yao
Jianhua Han
Xiaodan Liang
Danqian Xu
Wei Zhang
Zhenguo Li
Hang Xu
VLM
ObjD
CLIP
56
74
0
10 Apr 2023
Model-Agnostic Gender Debiased Image Captioning
Yusuke Hirota
Yuta Nakashima
Noa Garcia
FaML
35
18
0
07 Apr 2023
Improving Visual Question Answering Models through Robustness Analysis and In-Context Learning with a Chain of Basic Questions
Jia-Hong Huang
Modar Alfadly
Guohao Li
M. Worring
OOD
AAML
44
5
0
06 Apr 2023
METransformer: Radiology Report Generation by Transformer with Multiple Learnable Expert Tokens
Zhanyu Wang
Lingqiao Liu
Lei Wang
Luping Zhou
MedIm
13
71
0
05 Apr 2023
Towards Self-Explainability of Deep Neural Networks with Heatmap Captioning and Large-Language Models
Osman Tursun
Simon Denman
Sridha Sridharan
Clinton Fookes
ViT
VLM
16
6
0
05 Apr 2023
Cross-Domain Image Captioning with Discriminative Finetuning
Roberto Dessì
Michele Bevilacqua
Eleonora Gualdoni
Nathanaël Carraz Rakotonirina
Francesca Franzon
Marco Baroni
CLIP
27
19
0
04 Apr 2023
Changes to Captions: An Attentive Network for Remote Sensing Change Captioning
Shizhen Chang
Pedram Ghamisi
30
43
0
03 Apr 2023
SARGAN: Spatial Attention-based Residuals for Facial Expression Manipulation
Arbish Akram
Nazar Khan
GAN
CVBM
30
10
0
30 Mar 2023
LMDA-Net: A lightweight multi-dimensional attention network for general EEG-based brain-computer interface paradigms and interpretability
Zhengqing Miao
Xin Zhang
Mei-rong Zhao
Dong Ming
24
6
0
29 Mar 2023
SnakeVoxFormer: Transformer-based Single Image\\Voxel Reconstruction with Run Length Encoding
Jae Joong Lee
Bedrich Benes
ViT
32
0
0
28 Mar 2023
Medical Image Analysis using Deep Relational Learning
Zhi-Hu Liu
MedIm
17
0
0
28 Mar 2023
Unify, Align and Refine: Multi-Level Semantic Alignment for Radiology Report Generation
Yaowei Li
Bang-ju Yang
Xuxin Cheng
Zhihong Zhu
Hongxiang Li
Yuexian Zou
27
31
0
28 Mar 2023
X-Mesh: Towards Fast and Accurate Text-driven 3D Stylization via Dynamic Textual Guidance
Yiwei Ma
Xiaioqing Zhang
Xiaoshuai Sun
Jiayi Ji
Haowei Wang
Guannan Jiang
Weilin Zhuang
Rongrong Ji
23
39
0
28 Mar 2023
Improving Prediction Performance and Model Interpretability through Attention Mechanisms from Basic and Applied Research Perspectives
Shunsuke Kitada
FaML
HAI
AI4CE
32
0
0
24 Mar 2023
Taking A Closer Look at Visual Relation: Unbiased Video Scene Graph Generation with Decoupled Label Learning
Wenqing Wang
Yawei Luo
Zhiqin Chen
Tao Jiang
Lei Chen
Yi Yang
Jun Xiao
35
7
0
23 Mar 2023
PointGame: Geometrically and Adaptively Masked Auto-Encoder on Point Clouds
Yun-Hai Liu
Xu Yan
Zhilei Chen
Zhiqi Li
Zeyong Wei
Mingqiang Wei
3DPC
27
2
0
23 Mar 2023
Top-Down Visual Attention from Analysis by Synthesis
Baifeng Shi
Trevor Darrell
Xin Eric Wang
25
29
0
23 Mar 2023
Positive-Augmented Contrastive Learning for Image and Video Captioning Evaluation
Sara Sarto
Manuele Barraco
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
21
55
0
21 Mar 2023
Context De-confounded Emotion Recognition
Dingkang Yang
Zhaoyu Chen
Yuzheng Wang
Shunli Wang
Mingcheng Li
...
Xiao Zhao
Shuai Huang
Zhiyan Dong
Peng Zhai
Lihua Zhang
CML
21
40
0
21 Mar 2023
A Complete Survey on Generative AI (AIGC): Is ChatGPT from GPT-4 to GPT-5 All You Need?
Chaoning Zhang
Chenshuang Zhang
Sheng Zheng
Yu Qiao
Chenghao Li
...
Lik-Hang Lee
Yang Yang
Heng Tao Shen
In So Kweon
Choong Seon Hong
85
159
0
21 Mar 2023
Sketch2Saliency: Learning to Detect Salient Objects from Human Drawings
A. Bhunia
Subhadeep Koley
Amandeep Kumar
Aneeshan Sain
Pinaki Nath Chowdhury
Tao Xiang
Yi-Zhe Song
52
19
0
20 Mar 2023
Multi-modal reward for visual relationships-based image captioning
Ali Abedi
Hossein Karshenas
Peyman Adibi
44
2
0
19 Mar 2023
Blind Multimodal Quality Assessment of Low-light Images
Miaohui Wang
Zhuowei Xu
Mai Xu
Weisi Lin
41
2
0
18 Mar 2023
GNNFormer: A Graph-based Framework for Cytopathology Report Generation
Yangqiaoyu Zhou
Kai-Lang Yao
Wusuo Li
MedIm
19
1
0
17 Mar 2023
Rethinking White-Box Watermarks on Deep Learning Models under Neural Structural Obfuscation
Yifan Yan
Xudong Pan
Mi Zhang
Min Yang
AAML
25
14
0
17 Mar 2023
Cross-Modal Causal Intervention for Medical Report Generation
Weixing Chen
Yang Liu
Ce Wang
Jiarui Zhu
Shen Zhao
Guanbin Li
Cheng-Lin Liu
Liang Lin
34
6
0
16 Mar 2023
PR-MCS: Perturbation Robust Metric for MultiLingual Image Captioning
Yongil Kim
Yerin Hwang
Hyeongu Yun
Seunghyun Yoon
Trung Bui
Kyomin Jung
27
6
0
15 Mar 2023
ViperGPT: Visual Inference via Python Execution for Reasoning
Dídac Surís
Sachit Menon
Carl Vondrick
MLLM
LRM
ReLM
45
435
0
14 Mar 2023
Interventional Bag Multi-Instance Learning On Whole-Slide Pathological Images
Tiancheng Lin
Zhimiao Yu
Hongyu Hu
Yi Xu
Changyi Chen
57
80
0
13 Mar 2023
Focus on Change: Mood Prediction by Learning Emotion Changes via Spatio-Temporal Attention
S. Narayana
Subramanian Ramanathan
Ibrahim Radwan
Roland Göcke
22
2
0
12 Mar 2023
ZeroNLG: Aligning and Autoencoding Domains for Zero-Shot Multimodal and Multilingual Natural Language Generation
Bang-ju Yang
Fenglin Liu
Yuexian Zou
Xian Wu
Yaowei Wang
David A. Clifton
31
9
0
11 Mar 2023
Learning Combinatorial Prompts for Universal Controllable Image Captioning
Zhen Wang
Jun Xiao
Yueting Zhuang
Fei Gao
Jian Shao
Long Chen
60
5
0
11 Mar 2023