
VQA: Visual Question Answering

3 May 2015

Devi Parikh

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown

Title
Loss re-scaling VQA: Revisiting the Language Prior Problem from a Class-imbalance View Yangyang Guo Liqiang Nie Zhiyong Cheng Q. Tian Min Zhang 116 70 0 30 Oct 2020
Leveraging Visual Question Answering to Improve Text-to-Image Synthesis Stanislav Frolov Shailza Jolly Jörn Hees Andreas Dengel EGVM 50 5 0 28 Oct 2020
SIRI: Spatial Relation Induced Network For Spatial Description Resolution Peiyao Wang Weixin Luo Yanyu Xu Haojie Li Shugong Xu Jianyu Yang Shenghua Gao 36 0 0 27 Oct 2020
MMFT-BERT: Multimodal Fusion Transformer with BERT Encodings for Visual Question Answering Aisha Urooj Khan Amir Mazaheri N. Lobo M. Shah 97 57 0 27 Oct 2020
Reading Between the Lines: Exploring Infilling in Visual Narratives Khyathi Chandu Ruo-Ping Dong A. Black 29 4 0 26 Oct 2020
RUArt: A Novel Text-Centered Solution for Text-Based Visual Question Answering Zanxia Jin Heran Wu Chun Yang Fang Zhou Jingyan Qin Lei Xiao Xu-Cheng Yin 88 31 0 24 Oct 2020
Beyond VQA: Generating Multi-word Answer and Rationale to Visual Questions Radhika Dua Sai Srinivas Kancheti V. Balasubramanian LRM 88 22 0 24 Oct 2020
ST-BERT: Cross-modal Language Model Pre-training For End-to-end Spoken Language Understanding Minjeong Kim Gyuwan Kim Sang-Woo Lee Jung-Woo Ha VLM 76 36 0 23 Oct 2020
Language-Conditioned Imitation Learning for Robot Manipulation Tasks Simon Stepputtis Joseph Campbell Mariano Phielipp Stefan Lee Chitta Baral H. B. Amor LM&Ro 200 205 0 22 Oct 2020
Deep Reinforcement Learning with Stacked Hierarchical Attention for Text-based Games Yunqiu Xu Meng Fang Ling-Hao Chen Yali Du Qiufeng Wang Chengqi Zhang OffRL 84 44 0 22 Oct 2020
Learning Dual Semantic Relations with Graph Attention for Image-Text Matching Keyu Wen Xiaodong Gu Qingrong Cheng 76 97 0 22 Oct 2020
Literature Review of Computer Tools for the Visually Impaired: a focus on Search Engines Guy Meyer Alan Wassyng M. Lawford Kourosh Sabri S. Shirani 13 2 0 21 Oct 2020
Removing Bias in Multi-modal Classifiers: Regularization by Maximizing Functional Entropies Itai Gat Idan Schwartz Alex Schwing Tamir Hazan 106 92 0 21 Oct 2020
BiST: Bi-directional Spatio-Temporal Reasoning for Video-Grounded Dialogues Hung Le Doyen Sahoo Nancy F. Chen Guosheng Lin 117 31 0 20 Oct 2020
SOrT-ing VQA Models: Contrastive Gradient Learning for Improved Consistency Sameer Dharur Purva Tendulkar Dhruv Batra Devi Parikh Ramprasaath R. Selvaraju 56 2 0 20 Oct 2020
The Open Catalyst 2020 (OC20) Dataset and Community Challenges L. Chanussot Abhishek Das Siddharth Goyal Thibaut Lavril Muhammed Shuaibi ... Brandon M. Wood Junwoong Yoon Devi Parikh C. L. Zitnick Zachary W. Ulissi 331 540 0 20 Oct 2020
Multimodal Research in Vision and Language: A Review of Current and Emerging Trends Shagun Uppal Sarthak Bhagat Devamanyu Hazarika Navonil Majumdar Soujanya Poria Roger Zimmermann Amir Zadeh 101 6 0 19 Oct 2020
Knowledge Graph-based Question Answering with Electronic Health Records Junwoo Park Youngwoo Cho Haneol Lee Jaegul Choo Edward Choi 95 35 0 19 Oct 2020
Querent Intent in Multi-Sentence Questions Laurie Burchell Ji-Eun Chi Tom Hosking Nina Markl Bonnie Webber 42 3 0 18 Oct 2020
Multimodal Speech Recognition with Unstructured Audio Masking Tejas Srinivasan Ramon Sanabria Florian Metze Desmond Elliott CVBM 48 10 0 16 Oct 2020
What Can You Learn from Your Muscles? Learning Visual Representation from Human Interactions Kiana Ehsani Daniel Gordon T. Nguyen Roozbeh Mottaghi Ali Farhadi 3DH SSL 64 2 0 16 Oct 2020
New Ideas and Trends in Deep Multimodal Content Understanding: A Review Wei Chen Weiping Wang Li Liu M. Lew VLM 174 33 0 16 Oct 2020
What is More Likely to Happen Next? Video-and-Language Future Event Prediction Jie Lei Licheng Yu Tamara L. Berg Joey Tianyi Zhou 101 73 0 15 Oct 2020
Geometry matters: Exploring language examples at the decision boundary Debajyoti Datta Shashwat Kumar Laura E. Barnes Tom Fletcher AAML 45 3 0 14 Oct 2020
Neural Databases James Thorne Majid Yazdani Marzieh Saeidi Fabrizio Silvestri Sebastian Riedel A. Halevy NAI 99 9 0 14 Oct 2020
Vokenization: Improving Language Understanding with Contextualized, Visual-Grounded Supervision Hao Tan Joey Tianyi Zhou CLIP 89 121 0 14 Oct 2020
Contrast and Classify: Training Robust VQA Models Yash Kant A. Moudgil Dhruv Batra Devi Parikh Harsh Agrawal 55 5 0 13 Oct 2020
TSPNet: Hierarchical Feature Learning via Temporal Semantic Pyramid for Sign Language Translation Dongxu Li Chenchen Xu Xin Yu Kaihao Zhang Ben Swift H. Suominen Hongdong Li SLR 60 124 0 12 Oct 2020
Interpretable Neural Computation for Real-World Compositional Visual Question Answering Ruixue Tang Chao Ma CoGe 26 2 0 10 Oct 2020
ALFWorld: Aligning Text and Embodied Environments for Interactive Learning Mohit Shridhar Xingdi Yuan Marc-Alexandre Côté Yonatan Bisk Adam Trischler Matthew J. Hausknecht LM&Ro LLMAG 156 450 0 08 Oct 2020
Multi-label classification of promotions in digital leaflets using textual and visual information R. Arroyo David Jiménez-Cabello Javier Martínez-Cebrián 59 3 0 07 Oct 2020
Vision Skills Needed to Answer Visual Questions Xiaoyu Zeng Yanan Wang Tai-Yin Chiu Nilavra Bhattacharya Danna Gurari 66 18 0 07 Oct 2020
Learning to Represent Image and Text with Denotation Graph Bowen Zhang Hexiang Hu Vihan Jain Eugene Ie Fei Sha 78 22 0 06 Oct 2020
Pathological Visual Question Answering Xuehai He Zhuo Cai Wenlan Wei Yichen Zhang Luntian Mou Eric Xing P. Xie 140 24 0 06 Oct 2020
Fine-Grained Grounding for Multimodal Speech Recognition Tejas Srinivasan Ramon Sanabria Florian Metze Desmond Elliott 76 11 0 05 Oct 2020
Attention Guided Semantic Relationship Parsing for Visual Question Answering M. Farazi Salman Khan Nick Barnes 43 2 0 05 Oct 2020
Multi-Modal Open-Domain Dialogue Kurt Shuster Eric Michael Smith Da Ju Jason Weston AI4CE 141 44 0 02 Oct 2020
CAPTION: Correction by Analyses, POS-Tagging and Interpretation of Objects using only Nouns L. Ferreira Douglas De Rizzo Meneghetti P. Santos 26 2 0 02 Oct 2020
ISAAQ -- Mastering Textbook Questions with Pre-trained Transformers and Bottom-Up and Top-Down Attention José Manuél Gómez-Pérez Raúl Ortega 61 24 0 01 Oct 2020
Graph-based Heuristic Search for Module Selection Procedure in Neural Module Network Yuxuan Wu Hideki Nakayama GNN 39 3 0 30 Sep 2020
Trustworthy Convolutional Neural Networks: A Gradient Penalized-based Approach Nicholas F Halliwell Freddy Lecue FAtt 118 9 0 29 Sep 2020
Spatial Attention as an Interface for Image Captioning Models P. Sadler 53 0 0 29 Sep 2020
Where is the Model Looking At?--Concentrate and Explain the Network Attention Wenjia Xu Jiuniu Wang Yang Wang Guangluan Xu Wei Dai Yirong Wu XAI 90 17 0 29 Sep 2020
Hierarchical Deep Multi-modal Network for Medical Visual Question Answering D. Gupta S. Suman Asif Ekbal 59 61 0 27 Sep 2020
Multiple interaction learning with question-type prior knowledge for constraining answer search space in visual question answering Tuong Khanh Long Do Binh X. Nguyen Huy Tran Erman Tjiputra Quang-Dieu Tran Thanh-Toan Do 40 2 0 23 Sep 2020
Dataset Cartography: Mapping and Diagnosing Datasets with Training Dynamics Swabha Swayamdipta Roy Schwartz Nicholas Lourie Yizhong Wang Hannaneh Hajishirzi Noah A. Smith Yejin Choi 150 452 0 22 Sep 2020
Regularizing Attention Networks for Anomaly Detection in Visual Question Answering Doyup Lee Yeongjae Cheon Wook-Shin Han AAML OOD 44 16 0 21 Sep 2020
Commands 4 Autonomous Vehicles (C4AV) Workshop Summary Thierry Deruyttere Simon Vandenhende Dusan Grujicic Yu Liu Luc Van Gool Matthew Blaschko Tinne Tuytelaars Marie-Francine Moens 68 6 0 18 Sep 2020
MUTANT: A Training Paradigm for Out-of-Distribution Generalization in Visual Question Answering Tejas Gokhale Pratyay Banerjee Chitta Baral Yezhou Yang OOD 62 142 0 18 Sep 2020
A Multimodal Memes Classification: A Survey and Open Research Issues Tariq Habib Afridi A. Alam Muhammad Numan Khan Jawad Khan Young-Koo Lee 55 41 0 17 Sep 2020