VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation

15 August 2017

Chuang Gan

Papers citing "VQS: Linking Segmentations to Questions and Answers for Supervised Attention in VQA and Question-Focused Semantic Segmentation"

Scene-Text Grounding for Text-Based Video Question Answering Sheng Zhou Junbin Xiao Xun Yang Peipei Song Dan Guo Angela Yao Meng Wang Tat-Seng Chua 242 1 0 22 Sep 2024
TextPSG: Panoptic Scene Graph Generation from Textual Descriptions Chengyang Zhao Yikang Shen Zhenfang Chen Mingyu Ding Chuang Gan 127 17 0 10 Oct 2023
Semantic Compositional Networks for Visual Captioning Zhe Gan Chuang Gan Xiaodong He Yunchen Pu Kenneth Tran Jianfeng Gao Lawrence Carin Li Deng CoGe 102 427 0 23 Nov 2016
Modeling Context in Referring Expressions Licheng Yu Patrick Poirson Shan Yang Alexander C. Berg Tamara L. Berg 131 1,275 0 31 Jul 2016
SPICE: Semantic Propositional Image Caption Evaluation Peter Anderson Basura Fernando Mark Johnson Stephen Gould EGVM 108 1,919 0 29 Jul 2016
Human Attention in Visual Question Answering: Do Humans and Deep Networks Look at the Same Regions? Abhishek Das Harsh Agrawal C. L. Zitnick Devi Parikh Dhruv Batra 102 466 0 11 Jun 2016
Multimodal Compact Bilinear Pooling for Visual Question Answering and Visual Grounding Akira Fukui Dong Huk Park Daylen Yang Anna Rohrbach Trevor Darrell Marcus Rohrbach 310 1,466 0 06 Jun 2016
Multimodal Residual Learning for Visual QA Jin-Hwa Kim Sang-Woo Lee Donghyun Kwak Min-Oh Heo Jeonghee Kim Jung-Woo Ha Byoung-Tak Zhang 63 300 0 05 Jun 2016
Adversarial Feature Learning Jeff Donahue Philipp Krähenbühl Trevor Darrell GAN 129 1,612 0 31 May 2016
Learning to Refine Object Segments Pedro H. O. Pinheiro Tsung-Yi Lin R. Collobert Piotr Dollár SSeg 69 854 0 29 Mar 2016
Segmentation from Natural Language Expressions Ronghang Hu Marcus Rohrbach Trevor Darrell VLM EgoV 76 437 0 20 Mar 2016
Dynamic Memory Networks for Visual and Textual Question Answering Caiming Xiong Stephen Merity R. Socher 77 756 0 04 Mar 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations Ranjay Krishna Yuke Zhu Oliver Groth Justin Johnson Kenji Hata ... Yannis Kalantidis Li-Jia Li David A. Shamma Michael S. Bernstein Li Fei-Fei 225 5,762 0 23 Feb 2016
Deep Residual Learning for Image Recognition Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun MedIm 2.2K 194,426 0 10 Dec 2015
Where To Look: Focus Regions for Visual Question Answering Kevin J. Shih Saurabh Singh Derek Hoiem 76 460 0 23 Nov 2015
Learning Deep Structure-Preserving Image-Text Embeddings Liwei Wang Yin Li Svetlana Lazebnik 86 782 0 19 Nov 2015
Image Question Answering using Convolutional Neural Network with Dynamic Parameter Prediction Hyeonwoo Noh Paul Hongsuck Seo Bohyung Han OOD 78 327 0 18 Nov 2015
Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering Huijuan Xu Kate Saenko 79 763 0 17 Nov 2015
Natural Language Object Retrieval Ronghang Hu Huazhe Xu Marcus Rohrbach Jiashi Feng Kate Saenko Trevor Darrell ObjD 99 554 0 13 Nov 2015
Grounding of Textual Phrases in Images by Reconstruction Anna Rohrbach Marcus Rohrbach Ronghang Hu Trevor Darrell Bernt Schiele 80 497 0 12 Nov 2015
Visual7W: Grounded Question Answering in Images Yuke Zhu Oliver Groth Michael S. Bernstein Li Fei-Fei 104 887 0 11 Nov 2015
Neural Module Networks Jacob Andreas Marcus Rohrbach Trevor Darrell Dan Klein CoGe 139 1,076 0 09 Nov 2015
Generation and Comprehension of Unambiguous Object Descriptions Junhua Mao Jonathan Huang Alexander Toshev Oana-Maria Camburu Alan Yuille Kevin Patrick Murphy ObjD 133 1,357 0 07 Nov 2015
Stacked Attention Networks for Image Question Answering Zichao Yang Xiaodong He Jianfeng Gao Li Deng Alex Smola BDL 114 1,884 0 07 Nov 2015
Automatic Concept Discovery from Parallel Text and Visual Corpora Chen Sun Chuang Gan Ram Nevatia CoGe 42 107 0 24 Sep 2015
What value do explicit high level concepts have in vision to language problems? Qi Wu Chunhua Shen Lingqiao Liu A. Dick Anton Van Den Hengel 77 444 0 03 Jun 2015
Are You Talking to a Machine? Dataset and Methods for Multilingual Image Question Answering Haoyuan Gao Junhua Mao Jie Zhou Zhiheng Huang Lei Wang Wei Xu 78 501 0 21 May 2015
Flickr30k Entities: Collecting Region-to-Phrase Correspondences for Richer Image-to-Sentence Models Bryan A. Plummer Liwei Wang Christopher M. Cervantes Juan C. Caicedo Julia Hockenmaier Svetlana Lazebnik 208 2,072 0 19 May 2015
Learning Deconvolution Network for Semantic Segmentation Hyeonwoo Noh Seunghoon Hong Bohyung Han SSeg 235 4,180 0 17 May 2015
Exploring Models and Data for Image Question Answering Mengye Ren Ryan Kiros R. Zemel 80 718 0 08 May 2015
Jointly Modeling Embedding and Translation to Bridge Video and Language Yingwei Pan Tao Mei Ting Yao Houqiang Li Y. Rui 83 534 0 07 May 2015
VQA: Visual Question Answering Aishwarya Agrawal Jiasen Lu Stanislaw Antol Margaret Mitchell C. L. Zitnick Dhruv Batra Devi Parikh CoGe 226 5,503 0 03 May 2015
Show, Attend and Tell: Neural Image Caption Generation with Visual Attention Kelvin Xu Jimmy Ba Ryan Kiros Kyunghyun Cho Aaron Courville Ruslan Salakhutdinov R. Zemel Yoshua Bengio DiffM 350 10,079 0 10 Feb 2015
Adam: A Method for Stochastic Optimization Diederik P. Kingma Jimmy Ba ODL 2.0K 150,312 0 22 Dec 2014
Deep Captioning with Multimodal Recurrent Neural Networks (m-RNN) Junhua Mao Wei Xu Yi Yang Jiang Wang Zhiheng Huang Alan Yuille VLM 178 1,240 0 20 Dec 2014
Deep Visual-Semantic Alignments for Generating Image Descriptions A. Karpathy Li Fei-Fei 144 5,591 0 07 Dec 2014
CIDEr: Consensus-based Image Description Evaluation Ramakrishna Vedantam C. L. Zitnick Devi Parikh 297 4,508 0 20 Nov 2014
Show and Tell: A Neural Image Caption Generator Oriol Vinyals Alexander Toshev Samy Bengio D. Erhan 3DV 260 6,035 0 17 Nov 2014
Long-term Recurrent Convolutional Networks for Visual Recognition and Description Jeff Donahue Lisa Anne Hendricks Marcus Rohrbach Subhashini Venugopalan S. Guadarrama Kate Saenko Trevor Darrell VLM 165 6,056 0 17 Nov 2014
A Multi-World Approach to Question Answering about Real-World Scenes based on Uncertain Input Mateusz Malinowski Mario Fritz 220 698 0 01 Oct 2014
Very Deep Convolutional Networks for Large-Scale Image Recognition Karen Simonyan Andrew Zisserman FAtt MDE 1.7K 100,508 0 04 Sep 2014
Microsoft COCO: Common Objects in Context Tsung-Yi Lin Michael Maire Serge J. Belongie Lubomir Bourdev Ross B. Girshick James Hays Pietro Perona Deva Ramanan C. L. Zitnick Piotr Dollár ObjD 426 43,814 0 01 May 2014
Distributed Representations of Words and Phrases and their Compositionality Tomas Mikolov Ilya Sutskever Kai Chen G. Corrado J. Dean NAI OCL 402 33,560 0 16 Oct 2013