Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering

17 November 2015

Papers citing "Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering"

50 / 131 papers shown

Title
OverLoCK: An Overview-first-Look-Closely-next ConvNet with Context-Mixing Dynamic Kernels Meng Lou Yizhou Yu 118 1 0 27 Feb 2025
ViTextVQA: A Large-Scale Visual Question Answering Dataset for Evaluating Vietnamese Text Comprehension in Images Quan Van Nguyen Dan Quang Tran Huy Quang Pham Thang Kien-Bao Nguyen Nghia Hieu Nguyen Kiet Van Nguyen Ngan Luu-Thuy Nguyen CoGe 39 3 0 16 Apr 2024
DIME-FM: DIstilling Multimodal and Efficient Foundation Models Ximeng Sun Pengchuan Zhang Peizhao Zhang Hardik Shah Kate Saenko Xide Xia VLM 25 20 0 31 Mar 2023
Top-Down Visual Attention from Analysis by Synthesis Baifeng Shi Trevor Darrell Xin Eric Wang 25 29 0 23 Mar 2023
Understanding Social Media Cross-Modality Discourse in Linguistic Space Chunpu Xu Hanzhuo Tan Jing Li Piji Li 24 6 0 26 Feb 2023
Interpretable Medical Image Visual Question Answering via Multi-Modal Relationship Graph Learning Xinyue Hu Lin Gu Kazuma Kobayashi Qi A. An Qingyu Chen Zhiyong Lu Chang Su Tatsuya Harada Yingying Zhu GNN 34 9 0 19 Feb 2023
On the Explainability of Natural Language Processing Deep Models Julia El Zini M. Awad 29 82 0 13 Oct 2022
RF Fingerprinting Needs Attention: Multi-task Approach for Real-World WiFi and Bluetooth Anu Jagannath Zackary Kane Jithin Jagannath 33 11 0 07 Sep 2022
From Pixels to Objects: Cubic Visual Attention for Visual Question Answering Jingkuan Song Pengpeng Zeng Lianli Gao Heng Tao Shen 32 62 0 04 Jun 2022
Structured Two-stream Attention Network for Video Question Answering Lianli Gao Pengpeng Zeng Jingkuan Song Yuan-Fang Li Wu Liu Tao Mei Heng Tao Shen 43 68 0 02 Jun 2022
Multimodal Conversational AI: A Survey of Datasets and Approaches Anirudh S. Sundar Larry Heck 38 29 0 13 May 2022
Learning to Answer Visual Questions from Web Videos Antoine Yang Antoine Miech Josef Sivic Ivan Laptev Cordelia Schmid ViT 37 33 0 10 May 2022
Attention Mechanism in Neural Networks: Where it Comes and Where it Goes Derya Soydaner 3DV 44 150 0 27 Apr 2022
CLEVR-X: A Visual Reasoning Dataset for Natural Language Explanations Leonard Salewski A. Sophia Koepke Hendrik P. A. Lensch Zeynep Akata LRM NAI 33 20 0 05 Apr 2022
CNN Attention Guidance for Improved Orthopedics Radiographic Fracture Classification Zhibin Liao Kewen Liao Haifeng Shen M. F. van Boxel J. Prijs R. Jaarsma J. Doornberg Anton Van Den Hengel Johan W. Verjans 25 14 0 21 Mar 2022
Dynamic Spatial Propagation Network for Depth Completion Y. Lin T. Cheng Qianglong Zhong Wending Zhou Huanhuan Yang 50 115 0 20 Feb 2022
Neural Attention for Image Captioning: Review of Outstanding Methods Zanyar Zohourianshahzadi Jugal Kalita VLM 35 45 0 29 Nov 2021
Auto-Encoding Knowledge Graph for Unsupervised Medical Report Generation Fenglin Liu Chenyu You Xian Wu Shen Ge Sheng Wang Xu Sun MedIm 81 92 0 08 Nov 2021
Understanding the computational demands underlying visual reasoning Mohit Vaishnav Rémi Cadène A. Alamia Drew Linsley Rufin VanRullen Thomas Serre GNN CoGe 40 16 0 08 Aug 2021
Neural Twins Talk & Alternative Calculations Zanyar Zohourianshahzadi Jugal Kalita 19 0 0 05 Aug 2021
Measuring and Improving BERT's Mathematical Abilities by Predicting the Order of Reasoning Piotr Pikekos Henryk Michalewski Mateusz Malinowski 30 28 0 07 Jun 2021
Visual Navigation with Spatial Attention Bar Mayo Tamir Hazan A. Tal EgoV 27 73 0 20 Apr 2021
Answer Questions with Right Image Regions: A Visual Attention Regularization Approach Y. Liu Yangyang Guo Jianhua Yin Xuemeng Song Weifeng Liu Liqiang Nie 29 28 0 03 Feb 2021
Explainability of deep vision-based autonomous driving systems: Review and challenges Éloi Zablocki H. Ben-younes P. Pérez Matthieu Cord XAI 48 170 0 13 Jan 2021
OpenViDial: A Large-Scale, Open-Domain Dialogue Dataset with Visual Contexts Yuxian Meng Shuhe Wang Qinghong Han Xiaofei Sun Fei Wu Rui Yan Jiwei Li 27 28 0 30 Dec 2020
Knowledge-Routed Visual Question Reasoning: Challenges for Deep Representation Embedding Qingxing Cao Bailin Li Xiaodan Liang Keze Wang Liang Lin 44 36 0 14 Dec 2020
Cross-Media Keyphrase Prediction: A Unified Framework with Multi-Modality Multi-Head Attention and Image Wordings Yue Wang Jing Li M. Lyu Irwin King 16 16 0 03 Nov 2020
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering Aisha Urooj Khan Amir Mazaheri N. Lobo M. Shah 32 56 0 27 Oct 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review Wei Chen Weiping Wang Li Liu M. Lew VLM 118 31 0 16 Oct 2020
Spatial Attention as an Interface for Image Captioning Models P. Sadler 28 0 0 29 Sep 2020
The Impact of Explanations on AI Competency Prediction in VQA Kamran Alipour Arijit Ray Xiaoyu Lin J. Schulze Yi Yao Giedrius Burachas 27 9 0 02 Jul 2020
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA Hyounghun Kim Zineng Tang Joey Tianyi Zhou 33 31 0 13 May 2020
Modeling Human Visual Search Performance on Realistic Webpages Using Analytical and Deep Learning Methods Arianna Yuan Yong Li HAI 25 24 0 07 May 2020
Sequential Interpretability: Methods, Applications, and Future Direction for Understanding Deep Learning Models in the Context of Sequential Data B. Shickel Parisa Rashidi AI4TS 27 17 0 27 Apr 2020
Causal Interpretability for Machine Learning -- Problems, Methods and Evaluation Raha Moraffah Mansooreh Karami Ruocheng Guo A. Raglin Huan Liu CML ELM XAI 27 213 0 09 Mar 2020
Deep Multi-Modal Sets A. Reiter Menglin Jia Pu Yang Ser-Nam Lim BDL 25 4 0 03 Mar 2020
Natural Language Processing Advancements By Deep Learning: A Survey A. Torfi Rouzbeh A. Shirvani Yaser Keneshloo Nader Tavvaf Edward A. Fox AI4CE VLM 83 216 0 02 Mar 2020
A Study on Multimodal and Interactive Explanations for Visual Question Answering Kamran Alipour J. Schulze Yi Yao Avi Ziskind Giedrius Burachas 32 27 0 01 Mar 2020
Robust Explanations for Visual Question Answering Badri N. Patro Shivansh Pate Vinay P. Namboodiri OOD AAML 25 20 0 23 Jan 2020
Explanation vs Attention: A Two-Player Game to Obtain Attention for VQA Badri N. Patro Anupriy Vinay P. Namboodiri AAML FAtt 48 26 0 19 Nov 2019
DualVD: An Adaptive Dual Encoding Model for Deep Visual Understanding in Visual Dialogue X. Jiang Jiahao Yu Zengchang Qin Yingying Zhuang Xingxing Zhang Yue Hu Qi Wu 23 70 0 17 Nov 2019
TAB-VCR: Tags and Attributes based Visual Commonsense Reasoning Baselines Jingxiang Lin Unnat Jain A. Schwing LRM ReLM 34 9 0 31 Oct 2019
Cross Attention Network for Few-shot Classification Rui Hou Hong Chang Bingpeng Ma Shiguang Shan Xilin Chen 204 631 0 17 Oct 2019
LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval Reuben Tan Huijuan Xu Kate Saenko Bryan A. Plummer 28 67 0 27 Sep 2019
Compact Trilinear Interaction for Visual Question Answering Tuong Khanh Long Do Thanh-Toan Do Huy Tran Erman Tjiputra Quang-Dieu Tran 36 59 0 26 Sep 2019
Dynamic Graph Attention for Referring Expression Comprehension Sibei Yang Guanbin Li Yizhou Yu OCL 25 215 0 18 Sep 2019
CAMP: Cross-Modal Adaptive Message Passing for Text-Image Retrieval Zihao Wang Xihui Liu Hongsheng Li Lu Sheng Junjie Yan Xiaogang Wang Jing Shao VLM 25 299 0 12 Sep 2019
Probabilistic framework for solving Visual Dialog Badri N. Patro Anupriy Vinay P. Namboodiri BDL 30 13 0 11 Sep 2019
A Better Way to Attend: Attention with Trees for Video Question Answering Hongyang Xue Wenqing Chu Zhou Zhao Deng Cai 25 33 0 05 Sep 2019
U-CAM: Visual Explanation using Uncertainty based Class Activation Maps Badri N. Patro Mayank Lunayach Shivansh Patel Vinay P. Namboodiri FAtt UQCV 27 76 0 17 Aug 2019