Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1511.02274
Cited By
Stacked Attention Networks for Image Question Answering
7 November 2015
Zichao Yang
Xiaodong He
Jianfeng Gao
Li Deng
Alex Smola
BDL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Stacked Attention Networks for Image Question Answering"
50 / 277 papers shown
Title
Spectral Transform Forms Scalable Transformer
Bingxin Zhou
Xinliang Liu
Yuehua Liu
Yunyin Huang
Pietro Lio
Yuguang Wang
52
6
0
15 Nov 2021
A Simple Approach to Image Tilt Correction with Self-Attention MobileNet for Smartphones
Siddhant Garg
D. Mohanty
S. Thota
Sukumar Moharana
ViT
19
2
0
31 Oct 2021
Single-Modal Entropy based Active Learning for Visual Question Answering
Dong-Jin Kim
Jae-Won Cho
Jinsoo Choi
Yunjae Jung
In So Kweon
25
12
0
21 Oct 2021
Topic Scene Graph Generation by Attention Distillation from Caption
Wenbin Wang
R. Wang
X. Chen
DiffM
25
14
0
12 Oct 2021
A Survey On Neural Word Embeddings
Erhan Sezerer
Selma Tekir
AI4TS
26
12
0
05 Oct 2021
Counterfactual Samples Synthesizing and Training for Robust Visual Question Answering
Long Chen
Yuhang Zheng
Yulei Niu
Hanwang Zhang
Jun Xiao
AAML
OOD
21
36
0
03 Oct 2021
Calibrating Concepts and Operations: Towards Symbolic Reasoning on Real Images
Zhuowan Li
Elias Stengel-Eskin
Yixiao Zhang
Cihang Xie
Q. Tran
Benjamin Van Durme
Alan Yuille
VLM
24
15
0
01 Oct 2021
VQA-MHUG: A Gaze Dataset to Study Multimodal Neural Attention in Visual Question Answering
Ekta Sood
Fabian Kögel
Florian Strohm
Prajit Dhar
Andreas Bulling
40
19
0
27 Sep 2021
How to find a good image-text embedding for remote sensing visual question answering?
Christel Chappuis
Sylvain Lobry
B. Kellenberger
Bertrand Le Saux
D. Tuia
40
20
0
24 Sep 2021
Improving Joint Learning of Chest X-Ray and Radiology Report by Word Region Alignment
Zhanghexuan Ji
Mohammad Abuzar Shaikh
Dana Moukheiber
S. Srihari
Yifan Peng
Mingchen Gao
SSL
16
20
0
04 Sep 2021
Understanding the computational demands underlying visual reasoning
Mohit Vaishnav
Rémi Cadène
A. Alamia
Drew Linsley
Rufin VanRullen
Thomas Serre
GNN
CoGe
40
16
0
08 Aug 2021
How Much Can CLIP Benefit Vision-and-Language Tasks?
Sheng Shen
Liunian Harold Li
Hao Tan
Joey Tianyi Zhou
Anna Rohrbach
Kai-Wei Chang
Z. Yao
Kurt Keutzer
CLIP
VLM
MLLM
202
405
0
13 Jul 2021
Zero-shot Visual Question Answering using Knowledge Graph
Zhuo Chen
Jiaoyan Chen
Yuxia Geng
Jeff Z. Pan
Zonggang Yuan
Huajun Chen
23
70
0
12 Jul 2021
DualVGR: A Dual-Visual Graph Reasoning Unit for Video Question Answering
Jianyu Wang
Bingkun Bao
Changsheng Xu
19
75
0
10 Jul 2021
Adventurer's Treasure Hunt: A Transparent System for Visually Grounded Compositional Visual Question Answering based on Scene Graphs
Daniel Reich
F. Putze
Tanja Schultz
30
2
0
28 Jun 2021
VOLO: Vision Outlooker for Visual Recognition
Li-xin Yuan
Qibin Hou
Zihang Jiang
Jiashi Feng
Shuicheng Yan
ViT
52
314
0
24 Jun 2021
Attend What You Need: Motion-Appearance Synergistic Networks for Video Question Answering
Ahjeong Seo
Gi-Cheon Kang
J. Park
Byoung-Tak Zhang
18
53
0
19 Jun 2021
Are VQA Systems RAD? Measuring Robustness to Augmented Data with Focused Interventions
Daniel Rosenberg
Itai Gat
Amir Feder
Roi Reichart
AAML
39
16
0
08 Jun 2021
Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning
Piotr Pikekos
Henryk Michalewski
Mateusz Malinowski
30
28
0
07 Jun 2021
Multiple Meta-model Quantifying for Medical Visual Question Answering
Tuong Khanh Long Do
Binh X. Nguyen
Erman Tjiputra
Minh-Ngoc Tran
Quang-Dieu Tran
A. Nguyen
38
98
0
19 May 2021
InfographicVQA
Minesh Mathew
Viraj Bagal
Rubèn Pérez Tito
Dimosthenis Karatzas
Ernest Valveny
C. V. Jawahar
39
206
0
26 Apr 2021
AttWalk: Attentive Cross-Walks for Deep Mesh Analysis
Ran Ben Izhak
Alon Lahav
A. Tal
3DV
29
10
0
23 Apr 2021
Visual Navigation with Spatial Attention
Bar Mayo
Tamir Hazan
A. Tal
EgoV
27
73
0
20 Apr 2021
Beyond Question-Based Biases: Assessing Multimodal Shortcut Learning in Visual Question Answering
Corentin Dancette
Rémi Cadène
Damien Teney
Matthieu Cord
CML
28
76
0
07 Apr 2021
Seeing Out of tHe bOx: End-to-End Pre-training for Vision-Language Representation Learning
Zhicheng Huang
Zhaoyang Zeng
Yupan Huang
Bei Liu
Dongmei Fu
Jianlong Fu
VLM
ViT
51
271
0
07 Apr 2021
Differentiable Patch Selection for Image Recognition
Jean-Baptiste Cordonnier
Aravindh Mahendran
Alexey Dosovitskiy
Dirk Weissenborn
Jakob Uszkoreit
Thomas Unterthiner
33
93
0
07 Apr 2021
Dual Contrastive Loss and Attention for GANs
Ning Yu
Guilin Liu
Aysegül Dündar
Andrew Tao
Bryan Catanzaro
Larry S. Davis
Mario Fritz
GAN
34
60
0
31 Mar 2021
Incorporating Convolution Designs into Visual Transformers
Kun Yuan
Shaopeng Guo
Ziwei Liu
Aojun Zhou
F. Yu
Wei Wu
ViT
56
467
0
22 Mar 2021
Local Interpretations for Explainable Natural Language Processing: A Survey
Siwen Luo
Hamish Ivison
S. Han
Josiah Poon
MILM
38
48
0
20 Mar 2021
Decoupled Spatial Temporal Graphs for Generic Visual Grounding
Qi Feng
Yunchao Wei
Mingming Cheng
Yi Yang
27
5
0
18 Mar 2021
Causal Attention for Vision-Language Tasks
Xu Yang
Hanwang Zhang
Guojun Qi
Jianfei Cai
CML
28
148
0
05 Mar 2021
SLAKE: A Semantically-Labeled Knowledge-Enhanced Dataset for Medical Visual Question Answering
Bo Liu
Li-Ming Zhan
Li Xu
Lin Ma
Y. Yang
Xiao-Ming Wu
31
236
0
18 Feb 2021
Biomedical Question Answering: A Survey of Approaches and Challenges
Qiao Jin
Zheng Yuan
Guangzhi Xiong
Qian Yu
Huaiyuan Ying
Chuanqi Tan
Mosha Chen
Songfang Huang
Xiaozhong Liu
Sheng Yu
26
95
0
10 Feb 2021
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Y. Liu
Yangyang Guo
Jianhua Yin
Xuemeng Song
Weifeng Liu
Liqiang Nie
29
28
0
03 Feb 2021
Latent Variable Models for Visual Question Answering
Zixu Wang
Yishu Miao
Lucia Specia
25
5
0
16 Jan 2021
Explainability of deep vision-based autonomous driving systems: Review and challenges
Éloi Zablocki
H. Ben-younes
P. Pérez
Matthieu Cord
XAI
48
170
0
13 Jan 2021
ORDNet: Capturing Omni-Range Dependencies for Scene Parsing
Shaofei Huang
Si Liu
Tianrui Hui
Jizhong Han
Bo-wen Li
Jiashi Feng
Shuicheng Yan
3DPC
OffRL
37
15
0
11 Jan 2021
MELINDA: A Multimodal Dataset for Biomedical Experiment Method Classification
Te-Lin Wu
Shikhar Singh
S. Paul
Gully A. Burns
Nanyun Peng
30
18
0
16 Dec 2020
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Qingxing Cao
Bailin Li
Xiaodan Liang
Keze Wang
Liang Lin
44
36
0
14 Dec 2020
WeaQA: Weak Supervision via Captions for Visual Question Answering
Pratyay Banerjee
Tejas Gokhale
Yezhou Yang
Chitta Baral
25
35
0
04 Dec 2020
ATSal: An Attention Based Architecture for Saliency Prediction in 360 Videos
Y. A. D. Djilali
Marouane Tliba
Kevin McGuinness
Noel E. O'Connor
40
42
0
20 Nov 2020
An Improved Attention for Visual Question Answering
Tanzila Rahman
Shih-Han Chou
Leonid Sigal
Giuseppe Carenini
13
42
0
04 Nov 2020
Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings
Yue Wang
Jing Li
M. Lyu
Irwin King
13
16
0
03 Nov 2020
Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games
Yunqiu Xu
Meng Fang
Ling-Hao Chen
Yali Du
Qiufeng Wang
Chengqi Zhang
OffRL
25
44
0
22 Oct 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei Chen
Weiping Wang
Li Liu
M. Lew
VLM
118
31
0
16 Oct 2020
Spatial Attention as an Interface for Image Captioning Models
P. Sadler
28
0
0
29 Sep 2020
Where is the Model Looking At?--Concentrate and Explain the Network Attention
Wenjia Xu
Jiuniu Wang
Yang Wang
Guangluan Xu
Wei Dai
Yirong Wu
XAI
29
17
0
29 Sep 2020
A Comparison of Pre-trained Vision-and-Language Models for Multimodal Representation Learning across Medical Images and Reports
Yikuan Li
Hanyin Wang
Yuan Luo
19
63
0
03 Sep 2020
Counting from Sky: A Large-scale Dataset for Remote Sensing Object Counting and A Benchmark Method
Guangshuai Gao
Qingjie Liu
Yunhong Wang
13
14
0
28 Aug 2020
AiR: Attention with Reasoning Capability
Shi Chen
Ming Jiang
Jinhui Yang
Qi Zhao
LRM
13
36
0
28 Jul 2020
Previous
1
2
3
4
5
6
Next