Visual Relation Grounding in Videos

17 July 2020

Papers citing "Visual Relation Grounding in Videos"


Title
Scene-Text Grounding for Text-Based Video Question Answering Sheng Zhou Junbin Xiao Xun Yang Peipei Song Dan Guo Angela Yao Meng Wang Tat-Seng Chua 199 1 0 22 Sep 2024
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences Zhu Zhang Zhou Zhao Yang Zhao Qi Wang Huasheng Liu Lianli Gao 60 115 0 19 Jan 2020
Adaptive Reconstruction Network for Weakly Supervised Referring Expression Grounding Xuejing Liu Liang Li Shuhui Wang Zhengjun Zha Dechao Meng Qingming Huang ObjD 41 78 0 28 Aug 2019
Weakly-Supervised Spatio-Temporally Grounding Natural Sentence in Video Zhenfang Chen Lin Ma Wenhan Luo Kwan-Yee K. Wong 78 103 0 06 Jun 2019
Video Relationship Reasoning using Gated Spatio-Temporal Energy Graph Yao-Hung Hubert Tsai S. Divvala Louis-Philippe Morency Ruslan Salakhutdinov Ali Farhadi 51 103 0 25 Mar 2019
Videos as Space-Time Region Graphs Xiaolong Wang Abhinav Gupta 83 755 0 05 Jun 2018
R-VQA: Learning Visual Relation Facts with Semantic Attention for Visual Question Answering Pan Lu Lei Ji Wei Zhang Nan Duan M. Zhou Jianyong Wang CoGe 51 79 0 24 May 2018
Weakly-Supervised Video Object Grounding from Text by Loss Weighting and Object Interaction Luowei Zhou Nathan Louis Jason J. Corso 77 94 0 08 May 2018
Large-Scale Visual Relationship Understanding Ji Zhang Yannis Kalantidis Marcus Rohrbach Manohar Paluri Ahmed Elgammal Mohamed Elhoseiny 45 168 0 27 Apr 2018
Referring Relationships Ranjay Krishna Ines Chami Michael S. Bernstein Li Fei-Fei 59 94 0 28 Mar 2018
MAttNet: Modular Attention Network for Referring Expression Comprehension Licheng Yu Zhe Lin Xiaohui Shen Jimei Yang Xin Lu Mohit Bansal Tamara L. Berg ObjD 97 825 0 24 Jan 2018
Object Referring in Videos with Language and Human Gaze A. Vasudevan Dengxin Dai Luc Van Gool VOS 53 75 0 04 Jan 2018
Grounding Referring Expressions in Images by Variational Context Hanwang Zhang Yulei Niu Shih-Fu Chang BDL ObjD 53 220 0 05 Dec 2017
Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset João Carreira Andrew Zisserman 219 7,989 0 22 May 2017
Spatio-temporal Person Retrieval via Natural Language Queries Masataka Yamaguchi Kuniaki Saito Yoshitaka Ushiku Tatsuya Harada 64 58 0 26 Apr 2017
Modeling Relationships in Referential Expressions with Compositional Modular Networks Ronghang Hu Marcus Rohrbach Jacob Andreas Trevor Darrell Kate Saenko 73 406 0 30 Nov 2016
Modeling Context in Referring Expressions Licheng Yu Patrick Poirson Shan Yang Alexander C. Berg Tamara L. Berg 125 1,261 0 31 Jul 2016
Visual Relationship Detection with Language Priors Cewu Lu Ranjay Krishna Michael S. Bernstein Li Fei-Fei VLM 73 1,138 0 31 Jul 2016
Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations Ranjay Krishna Yuke Zhu Oliver Groth Justin Johnson Kenji Hata ... Yannis Kalantidis Li-Jia Li David A. Shamma Michael S. Bernstein Li Fei-Fei 194 5,726 0 23 Feb 2016
Deep Residual Learning for Image Recognition Kaiming He Xiangyu Zhang Shaoqing Ren Jian Sun MedIm 1.9K 193,426 0 10 Dec 2015
Structural-RNN: Deep Learning on Spatio-Temporal Graphs Ashesh Jain Amir Zamir Silvio Savarese Ashutosh Saxena GNN 128 1,090 0 17 Nov 2015
Natural Language Object Retrieval Ronghang Hu Huazhe Xu Marcus Rohrbach Jiashi Feng Kate Saenko Trevor Darrell ObjD 87 552 0 13 Nov 2015
Grounding of Textual Phrases in Images by Reconstruction Anna Rohrbach Marcus Rohrbach Ronghang Hu Trevor Darrell Bernt Schiele 73 496 0 12 Nov 2015
Hierarchical Recurrent Neural Encoder for Video Representation with Application to Captioning Pingbo Pan Zhongwen Xu Yi Yang Fei Wu Yueting Zhuang 43 385 0 11 Nov 2015
Generation and Comprehension of Unambiguous Object Descriptions Junhua Mao Jonathan Huang Alexander Toshev Oana-Maria Camburu Alan Yuille Kevin Patrick Murphy ObjD 112 1,345 0 07 Nov 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks Shaoqing Ren Kaiming He Ross B. Girshick Jian Sun AIMat ObjD 467 62,122 0 04 Jun 2015
Sequence to Sequence -- Video to Text Subhashini Venugopalan Marcus Rohrbach Jeff Donahue Raymond J. Mooney Trevor Darrell Kate Saenko 116 1,417 0 03 May 2015
Beyond Short Snippets: Deep Networks for Video Classification Joe Yue-Hei Ng Matthew J. Hausknecht Sudheendra Vijayanarasimhan Oriol Vinyals R. Monga G. Toderici 127 2,336 0 31 Mar 2015
Adam: A Method for Stochastic Optimization Diederik P. Kingma Jimmy Ba ODL 1.5K 149,842 0 22 Dec 2014
Deep Visual-Semantic Alignments for Generating Image Descriptions A. Karpathy Li Fei-Fei 95 5,578 0 07 Dec 2014
Finding Action Tubes Georgia Gkioxari Jitendra Malik 54 598 0 21 Nov 2014
ImageNet Large Scale Visual Recognition Challenge Olga Russakovsky Jia Deng Hao Su J. Krause S. Satheesh ... A. Karpathy A. Khosla Michael S. Bernstein Alexander C. Berg Li Fei-Fei VLM ObjD 1.5K 39,472 0 01 Sep 2014
Deep Fragment Embeddings for Bidirectional Image Sentence Mapping A. Karpathy Armand Joulin Li Fei-Fei VLM 85 936 0 22 Jun 2014
Two-Stream Convolutional Networks for Action Recognition in Videos Karen Simonyan Andrew Zisserman 237 7,526 0 09 Jun 2014
Microsoft COCO: Common Objects in Context Tsung-Yi Lin Michael Maire Serge J. Belongie Lubomir Bourdev Ross B. Girshick James Hays Pietro Perona Deva Ramanan C. L. Zitnick Piotr Dollár ObjD 373 43,524 0 01 May 2014