ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 1511.05234
  4. Cited By
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for
  Visual Question Answering

Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

17 November 2015
Huijuan Xu
Kate Saenko
ArXivPDFHTML

Papers citing "Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering"

50 / 131 papers shown
Title
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels
Meng Lou
Yizhou Yu
118
1
0
27 Feb 2025
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images
Quan Van Nguyen
Dan Quang Tran
Huy Quang Pham
Thang Kien-Bao Nguyen
Nghia Hieu Nguyen
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
CoGe
39
3
0
16 Apr 2024
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
DIME-FM: DIstilling Multimodal and Efficient Foundation Models
Ximeng Sun
Pengchuan Zhang
Peizhao Zhang
Hardik Shah
Kate Saenko
Xide Xia
VLM
25
20
0
31 Mar 2023
Top-Down Visual Attention from Analysis by Synthesis
Top-Down Visual Attention from Analysis by Synthesis
Baifeng Shi
Trevor Darrell
Xin Eric Wang
25
29
0
23 Mar 2023
Understanding Social Media Cross-Modality Discourse in Linguistic Space
Understanding Social Media Cross-Modality Discourse in Linguistic Space
Chunpu Xu
Hanzhuo Tan
Jing Li
Piji Li
24
6
0
26 Feb 2023
Interpretable Medical Image Visual Question Answering via Multi-Modal
  Relationship Graph Learning
Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning
Xinyue Hu
Lin Gu
Kazuma Kobayashi
Qi A. An
Qingyu Chen
Zhiyong Lu
Chang Su
Tatsuya Harada
Yingying Zhu
GNN
34
9
0
19 Feb 2023
On the Explainability of Natural Language Processing Deep Models
On the Explainability of Natural Language Processing Deep Models
Julia El Zini
M. Awad
29
82
0
13 Oct 2022
RF Fingerprinting Needs Attention: Multi-task Approach for Real-World
  WiFi and Bluetooth
RF Fingerprinting Needs Attention: Multi-task Approach for Real-World WiFi and Bluetooth
Anu Jagannath
Zackary Kane
Jithin Jagannath
33
11
0
07 Sep 2022
From Pixels to Objects: Cubic Visual Attention for Visual Question
  Answering
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering
Jingkuan Song
Pengpeng Zeng
Lianli Gao
Heng Tao Shen
32
62
0
04 Jun 2022
Structured Two-stream Attention Network for Video Question Answering
Structured Two-stream Attention Network for Video Question Answering
Lianli Gao
Pengpeng Zeng
Jingkuan Song
Yuan-Fang Li
Wu Liu
Tao Mei
Heng Tao Shen
43
68
0
02 Jun 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches
Multimodal Conversational AI: A Survey of Datasets and Approaches
Anirudh S. Sundar
Larry Heck
38
29
0
13 May 2022
Learning to Answer Visual Questions from Web Videos
Learning to Answer Visual Questions from Web Videos
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
37
33
0
10 May 2022
Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Attention Mechanism in Neural Networks: Where it Comes and Where it Goes
Derya Soydaner
3DV
44
150
0
27 Apr 2022
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations
Leonard Salewski
A. Sophia Koepke
Hendrik P. A. Lensch
Zeynep Akata
LRM
NAI
33
20
0
05 Apr 2022
CNN Attention Guidance for Improved Orthopedics Radiographic Fracture
  Classification
CNN Attention Guidance for Improved Orthopedics Radiographic Fracture Classification
Zhibin Liao
Kewen Liao
Haifeng Shen
M. F. van Boxel
J. Prijs
R. Jaarsma
J. Doornberg
Anton Van Den Hengel
Johan W. Verjans
25
14
0
21 Mar 2022
Dynamic Spatial Propagation Network for Depth Completion
Dynamic Spatial Propagation Network for Depth Completion
Y. Lin
T. Cheng
Qianglong Zhong
Wending Zhou
Huanhuan Yang
50
115
0
20 Feb 2022
Neural Attention for Image Captioning: Review of Outstanding Methods
Neural Attention for Image Captioning: Review of Outstanding Methods
Zanyar Zohourianshahzadi
Jugal Kalita
VLM
35
45
0
29 Nov 2021
Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation
Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation
Fenglin Liu
Chenyu You
Xian Wu
Shen Ge
Sheng Wang
Xu Sun
MedIm
81
92
0
08 Nov 2021
Understanding the computational demands underlying visual reasoning
Understanding the computational demands underlying visual reasoning
Mohit Vaishnav
Rémi Cadène
A. Alamia
Drew Linsley
Rufin VanRullen
Thomas Serre
GNN
CoGe
40
16
0
08 Aug 2021
Neural Twins Talk & Alternative Calculations
Neural Twins Talk & Alternative Calculations
Zanyar Zohourianshahzadi
Jugal Kalita
19
0
0
05 Aug 2021
Measuring and Improving BERT's Mathematical Abilities by Predicting the
  Order of Reasoning
Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning
Piotr Pikekos
Henryk Michalewski
Mateusz Malinowski
30
28
0
07 Jun 2021
Visual Navigation with Spatial Attention
Visual Navigation with Spatial Attention
Bar Mayo
Tamir Hazan
A. Tal
EgoV
27
73
0
20 Apr 2021
Answer Questions with Right Image Regions: A Visual Attention
  Regularization Approach
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach
Y. Liu
Yangyang Guo
Jianhua Yin
Xuemeng Song
Weifeng Liu
Liqiang Nie
29
28
0
03 Feb 2021
Explainability of deep vision-based autonomous driving systems: Review
  and challenges
Explainability of deep vision-based autonomous driving systems: Review and challenges
Éloi Zablocki
H. Ben-younes
P. Pérez
Matthieu Cord
XAI
48
170
0
13 Jan 2021
OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual
  Contexts
OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts
Yuxian Meng
Shuhe Wang
Qinghong Han
Xiaofei Sun
Fei Wu
Rui Yan
Jiwei Li
27
28
0
30 Dec 2020
Knowledge-Routed Visual Question Reasoning: Challenges for Deep
  Representation Embedding
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding
Qingxing Cao
Bailin Li
Xiaodan Liang
Keze Wang
Liang Lin
44
36
0
14 Dec 2020
Cross-Media Keyphrase Prediction: A Unified Framework with
  Multi-Modality Multi-Head Attention and Image Wordings
Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings
Yue Wang
Jing Li
M. Lyu
Irwin King
16
16
0
03 Nov 2020
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual
  Question Answering
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering
Aisha Urooj Khan
Amir Mazaheri
N. Lobo
M. Shah
32
56
0
27 Oct 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
New Ideas and Trends in Deep Multimodal Content Understanding: A Review
Wei Chen
Weiping Wang
Li Liu
M. Lew
VLM
118
31
0
16 Oct 2020
Spatial Attention as an Interface for Image Captioning Models
Spatial Attention as an Interface for Image Captioning Models
P. Sadler
28
0
0
29 Sep 2020
The Impact of Explanations on AI Competency Prediction in VQA
The Impact of Explanations on AI Competency Prediction in VQA
Kamran Alipour
Arijit Ray
Xiaoyu Lin
J. Schulze
Yi Yao
Giedrius Burachas
27
9
0
02 Jul 2020
Dense-Caption Matching and Frame-Selection Gating for Temporal
  Localization in VideoQA
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA
Hyounghun Kim
Zineng Tang
Joey Tianyi Zhou
33
31
0
13 May 2020
Modeling Human Visual Search Performance on Realistic Webpages Using
  Analytical and Deep Learning Methods
Modeling Human Visual Search Performance on Realistic Webpages Using Analytical and Deep Learning Methods
Arianna Yuan
Yong Li
HAI
25
24
0
07 May 2020
Sequential Interpretability: Methods, Applications, and Future Direction
  for Understanding Deep Learning Models in the Context of Sequential Data
Sequential Interpretability: Methods, Applications, and Future Direction for Understanding Deep Learning Models in the Context of Sequential Data
B. Shickel
Parisa Rashidi
AI4TS
27
17
0
27 Apr 2020
Causal Interpretability for Machine Learning -- Problems, Methods and
  Evaluation
Causal Interpretability for Machine Learning -- Problems, Methods and Evaluation
Raha Moraffah
Mansooreh Karami
Ruocheng Guo
A. Raglin
Huan Liu
CML
ELM
XAI
27
213
0
09 Mar 2020
Deep Multi-Modal Sets
Deep Multi-Modal Sets
A. Reiter
Menglin Jia
Pu Yang
Ser-Nam Lim
BDL
25
4
0
03 Mar 2020
Natural Language Processing Advancements By Deep Learning: A Survey
Natural Language Processing Advancements By Deep Learning: A Survey
A. Torfi
Rouzbeh A. Shirvani
Yaser Keneshloo
Nader Tavvaf
Edward A. Fox
AI4CE
VLM
83
216
0
02 Mar 2020
A Study on Multimodal and Interactive Explanations for Visual Question
  Answering
A Study on Multimodal and Interactive Explanations for Visual Question Answering
Kamran Alipour
J. Schulze
Yi Yao
Avi Ziskind
Giedrius Burachas
32
27
0
01 Mar 2020
Robust Explanations for Visual Question Answering
Robust Explanations for Visual Question Answering
Badri N. Patro
Shivansh Pate
Vinay P. Namboodiri
OOD
AAML
25
20
0
23 Jan 2020
Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA
Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA
Badri N. Patro
Anupriy
Vinay P. Namboodiri
AAML
FAtt
48
26
0
19 Nov 2019
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in
  Visual Dialogue
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue
X. Jiang
Jiahao Yu
Zengchang Qin
Yingying Zhuang
Xingxing Zhang
Yue Hu
Qi Wu
23
70
0
17 Nov 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning
  Baselines
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines
Jingxiang Lin
Unnat Jain
A. Schwing
LRM
ReLM
34
9
0
31 Oct 2019
Cross Attention Network for Few-shot Classification
Cross Attention Network for Few-shot Classification
Rui Hou
Hong Chang
Bingpeng Ma
Shiguang Shan
Xilin Chen
204
631
0
17 Oct 2019
LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video
  Moment Retrieval
LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval
Reuben Tan
Huijuan Xu
Kate Saenko
Bryan A. Plummer
28
67
0
27 Sep 2019
Compact Trilinear Interaction for Visual Question Answering
Compact Trilinear Interaction for Visual Question Answering
Tuong Khanh Long Do
Thanh-Toan Do
Huy Tran
Erman Tjiputra
Quang-Dieu Tran
36
59
0
26 Sep 2019
Dynamic Graph Attention for Referring Expression Comprehension
Dynamic Graph Attention for Referring Expression Comprehension
Sibei Yang
Guanbin Li
Yizhou Yu
OCL
25
215
0
18 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval
Zihao Wang
Xihui Liu
Hongsheng Li
Lu Sheng
Junjie Yan
Xiaogang Wang
Jing Shao
VLM
25
299
0
12 Sep 2019
Probabilistic framework for solving Visual Dialog
Probabilistic framework for solving Visual Dialog
Badri N. Patro
Anupriy
Vinay P. Namboodiri
BDL
30
13
0
11 Sep 2019
A Better Way to Attend: Attention with Trees for Video Question
  Answering
A Better Way to Attend: Attention with Trees for Video Question Answering
Hongyang Xue
Wenqing Chu
Zhou Zhao
Deng Cai
25
33
0
05 Sep 2019
U-CAM: Visual Explanation using Uncertainty based Class Activation Maps
U-CAM: Visual Explanation using Uncertainty based Class Activation Maps
Badri N. Patro
Mayank Lunayach
Shivansh Patel
Vinay P. Namboodiri
FAtt
UQCV
27
76
0
17 Aug 2019
123
Next