VQA: Visual Question Answering

3 May 2015

Devi Parikh

Papers citing "VQA: Visual Question Answering"

50 / 2,957 papers shown

Title
Tensor Fusion Network for Multimodal Sentiment Analysis Amir Zadeh Minghai Chen Soujanya Poria Min Zhang Louis-Philippe Morency 92 1,238 0 23 Jul 2017
Inspiring Computer Vision System Solutions J. Zilly A. Boyarski Micael Carvalho Amir Atapour-Abarghouei Konstantinos Amplianitis ... Massimiliano Mancini Hernán Gonzalez Riccardo Spezialetti Carlos Sampedro Pérez Hao Li 18 1 0 22 Jul 2017
Video Question Answering via Attribute-Augmented Attention Network Learning Yunan Ye Zhou Zhao Yimeng Li Long Chen Jun Xiao Yueting Zhuang 80 109 0 20 Jul 2017
Visual Question Answering with Memory-Augmented Networks Chao Ma Chunhua Shen A. Dick Qi Wu Peng Wang Anton Van Den Hengel Ian Reid 90 100 0 17 Jul 2017
Query-Focused Video Summarization: Dataset, Evaluation, and A Memory Network Based Approach Aidean Sharghi Jacob S. Laurel Boqing Gong EgoV 122 137 0 16 Jul 2017
Automatic Understanding of Image and Video Advertisements Zaeem Hussain Ruotong Wang Xiaozhong Zhang Keren Ye Christopher Thomas Zuha Agha Nathan Ong Adriana Kovashka DiffM 69 166 0 10 Jul 2017
Learning Visual Reasoning Without Strong Priors Ethan Perez H. D. Vries Florian Strub Vincent Dumoulin Aaron Courville OOD NAI 108 62 0 10 Jul 2017
DeepStory: Video Story QA by Deep Embedded Memory Networks Kyung-Min Kim Min-Oh Heo Seongho Choi Byoung-Tak Zhang 97 175 0 04 Jul 2017
Modulating early visual processing by language H. D. Vries Florian Strub Jérémie Mary Hugo Larochelle Olivier Pietquin Aaron Courville 192 489 0 02 Jul 2017
Compact Tensor Pooling for Visual Question Answering Yang Shi Tommaso Furlanello Anima Anandkumar 16 0 0 20 Jun 2017
Identifying Spatial Relations in Images using Convolutional Neural Networks Mandar Haldekar Ashwinkumar Ganesan Tim Oates 44 39 0 13 Jun 2017
Learning to Extract Semantic Structure from Documents Using Multimodal Fully Convolutional Neural Network Xiao Yang Ersin Yumer P. Asente Mike Kraley Daniel Kifer C. Lee Giles 78 230 0 07 Jun 2017
Best of Both Worlds: Transferring Knowledge from Discriminative Learning to a Generative Visual Dialog Model Jiasen Lu A. Kannan Jianwei Yang Devi Parikh Dhruv Batra BDL 102 137 0 05 Jun 2017
A simple neural network module for relational reasoning Adam Santoro David Raposo David Barrett Mateusz Malinowski Razvan Pascanu Peter W. Battaglia Timothy Lillicrap GNN NAI 191 1,617 0 05 Jun 2017
Deep learning evaluation using deep linguistic processing A. Kuhnle Ann A. Copestake ELM 59 11 0 05 Jun 2017
Listen, Interact and Talk: Learning to Speak via Interaction Haichao Zhang Haonan Yu Wenyuan Xu 77 13 0 28 May 2017
Multimodal Machine Learning: A Survey and Taxonomy T. Baltrušaitis Chaitanya Ahuja Louis-Philippe Morency 175 2,963 0 26 May 2017
Bidirectional Beam Search: Forward-Backward Inference in Neural Sequence Models for Fill-in-the-Blank Image Captioning Q. Sun Stefan Lee Dhruv Batra BDL 84 44 0 24 May 2017
Learning Convolutional Text Representations for Visual Question Answering Zhengyang Wang Shuiwang Ji FAtt 71 15 0 18 May 2017
MUTAN: Multimodal Tucker Fusion for Visual Question Answering H. Ben-younes Rémi Cadène Matthieu Cord Nicolas Thome 171 584 0 18 May 2017
ParlAI: A Dialog Research Software Platform Alexander H. Miller Will Feng Adam Fisch Jiasen Lu Dhruv Batra Antoine Bordes Devi Parikh Jason Weston 128 376 0 18 May 2017
Object-Level Context Modeling For Scene Classification with Context-CNN Syed Ashar Javed A. Nelakanti VLM 81 10 0 11 May 2017
Survey of Visual Question Answering: Datasets and Techniques A. Gupta 50 38 0 10 May 2017
Inferring and Executing Programs for Visual Reasoning Justin Johnson B. Hariharan Laurens van der Maaten Judy Hoffman Li Fei-Fei C. L. Zitnick Ross B. Girshick NAI 122 545 0 10 May 2017
Combating Human Trafficking with Deep Multimodal Models Edmund Tong Amir Zadeh Cara Jones Louis-Philippe Morency 82 51 0 08 May 2017
Supervised Learning of Universal Sentence Representations from Natural Language Inference Data Alexis Conneau Douwe Kiela Holger Schwenk Loïc Barrault Antoine Bordes AI4TS SSL 254 2,106 0 05 May 2017
ChestX-ray8: Hospital-scale Chest X-ray Database and Benchmarks on Weakly-Supervised Classification and Localization of Common Thorax Diseases Xiaosong Wang Yifan Peng Le Lu Zhiyong Lu M. Bagheri Ronald M. Summers LM&MA 261 2,558 0 05 May 2017
Weakly-supervised Visual Grounding of Phrases with Linguistic Structures Fanyi Xiao Leonid Sigal Yong Jae Lee 87 139 0 03 May 2017
FOIL it! Find One mismatch between Image and Language caption Ravi Shekhar Sandro Pezzelle Yauhen Klimovich Aurélie Herbelot Moin Nabi E. Sangineto Raffaella Bernardi 65 141 0 03 May 2017
The Forgettable-Watcher Model for Video Question Answering Hongyang Xue Zhou Zhao Deng Cai 43 9 0 03 May 2017
Show, Adapt and Tell: Adversarial Training of Cross-domain Image Captioner Tseng-Hung Chen Yuan-Hong Liao Ching-Yao Chuang W. Hsu Jianlong Fu Min Sun 105 142 0 02 May 2017
The Promise of Premise: Harnessing Question Premises in Visual Question Answering Aroma Mahendru Viraj Prabhu Akrit Mohapatra Dhruv Batra Stefan Lee NAI 108 38 0 01 May 2017
Speech-Based Visual Question Answering Ted Zhang Dengxin Dai Tinne Tuytelaars Marie-Francine Moens Luc Van Gool 85 25 0 01 May 2017
Mapping Instructions and Visual Observations to Actions with Reinforcement Learning Dipendra Kumar Misra John Langford Yoav Artzi 86 247 0 28 Apr 2017
C-VQA: A Compositional Split of the Visual Question Answering (VQA) v1.0 Dataset Aishwarya Agrawal Aniruddha Kembhavi Dhruv Batra Devi Parikh CoGe 70 80 0 26 Apr 2017
Paying Attention to Descriptions Generated by Image Captioning Models Hamed R. Tavakoli Rakshith Shetty Ali Borji Jorma T. Laaksonen 80 79 0 24 Apr 2017
Towards Instance Segmentation with Object Priority: Prominent Object Detection and Recognition Hamed R. Tavakoli Jorma T. Laaksonen 40 1 0 24 Apr 2017
An Analysis of Action Recognition Datasets for Language and Vision Tasks Spandana Gella Frank Keller ObjD 48 11 0 24 Apr 2017
Being Negative but Constructively: Lessons Learnt from Creating Better Visual Question Answering Datasets Wei-Lun Chao Hexiang Hu Fei Sha 89 37 0 24 Apr 2017
Learning to Reason: End-to-End Module Networks for Visual Question Answering Ronghang Hu Jacob Andreas Marcus Rohrbach Trevor Darrell Kate Saenko KELM GNN ReLM LRM 142 581 0 18 Apr 2017
AnchorNet: A Weakly Supervised Network to Learn Geometry-sensitive Features For Semantic Matching David Novotny Diane Larlus Andrea Vedaldi 3DPC 98 66 0 16 Apr 2017
Video Fill In the Blank using LR/RL LSTMs with Spatial-Temporal Attentions Amir Mazaheri Dong Zhang M. Shah 56 12 0 15 Apr 2017
ShapeWorld - A new test methodology for multimodal language understanding A. Kuhnle Ann A. Copestake 65 69 0 14 Apr 2017
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering Y. Jang Yale Song Youngjae Yu Youngjin Kim Gunhee Kim 93 562 0 14 Apr 2017
Spatial Memory for Context Reasoning in Object Detection Xinlei Chen Abhinav Gupta ObjD 101 166 0 13 Apr 2017
Explaining the Unexplained: A CLass-Enhanced Attentive Response (CLEAR) Approach to Understanding Deep Neural Networks Devinder Kumar Alexander Wong Graham W. Taylor 82 61 0 13 Apr 2017
Discriminative Bimodal Networks for Visual Localization and Detection with Natural Language Queries Y. Zhang Luyao Yuan Yijie Guo Zhiyuan He I-An Huang Honglak Lee ObjD 92 57 0 12 Apr 2017
What's in a Question: Using Visual Questions as a Form of Supervision Siddha Ganju Olga Russakovsky Abhinav Gupta 78 16 0 12 Apr 2017
Creativity: Generating Diverse Questions using Variational Autoencoders Unnat Jain Ziyu Zhang Alex Schwing 72 152 0 11 Apr 2017
Learning Two-Branch Neural Networks for Image-Text Matching Tasks Liwei Wang Yin Li Jing-ling Huang Svetlana Lazebnik VLM 110 498 0 11 Apr 2017