Localizing Moments in Video with Natural Language

4 August 2017

Papers citing "Localizing Moments in Video with Natural Language"

50 / 208 papers shown

A Survey on Deep Learning Technique for Video Segmentation Tianfei Zhou Fatih Porikli David J. Crandall Luc Van Gool Wenguan Wang VOS 34 232 0 02 Jul 2021
Weakly Supervised Temporal Adjacent Network for Language Grounding Yuechen Wang Jiajun Deng Wen-gang Zhou Houqiang Li 26 67 0 30 Jun 2021
CLIP2Video: Mastering Video-Text Retrieval via Image CLIP Han Fang Pengfei Xiong Luhui Xu Yu Chen CLIP VLM 35 292 0 21 Jun 2021
Interventional Video Grounding with Dual Contrastive Learning Guoshun Nan Rui Qiao Yao Xiao Jun Liu Sicong Leng H. Zhang Wei Lu 26 144 0 21 Jun 2021
Parallel Attention Network with Sequence Matching for Video Grounding Hao Zhang Aixin Sun Wei Jing Liangli Zhen Joey Tianyi Zhou Rick Siow Mong Goh 18 40 0 18 May 2021
Video Corpus Moment Retrieval with Contrastive Learning Hao Zhang Aixin Sun Wei Jing Guoshun Nan Liangli Zhen Joey Tianyi Zhou Rick Siow Mong Goh 44 81 0 13 May 2021
SBNet: Segmentation-based Network for Natural Language-based Vehicle Search Sangrok Lee Taekang Woo Sang Hun Lee 24 4 0 22 Apr 2021
A Survey on Natural Language Video Localization Xinfang Liu Xiushan Nie Zhifang Tan Jie Guo Yilong Yin 28 7 0 01 Apr 2021
Decoupled Spatial Temporal Graphs for Generic Visual Grounding Qi Feng Yunchao Wei Mingming Cheng Yi Yang 27 5 0 18 Mar 2021
On Semantic Similarity in Video Retrieval Michael Wray Hazel Doughty Dima Damen 29 66 0 18 Mar 2021
Learning Temporal Dynamics from Cycles in Narrated Video Dave Epstein Jiajun Wu Cordelia Schmid Chen Sun AI4TS 33 14 0 07 Jan 2021
DVD: A Diagnostic Dataset for Multi-step Reasoning in Video Grounded Dialogue Hung Le Chinnadhurai Sankar Seungwhan Moon Ahmad Beirami A. Geramifard Satwik Kottur VGen 31 18 0 01 Jan 2021
Multi-Scale 2D Temporal Adjacent Networks for Moment Localization with Natural Language Songyang Zhang Houwen Peng Jianlong Fu Yijuan Lu Jiebo Luo 27 51 0 04 Dec 2020
WeaQA: Weak Supervision via Captions for Visual Question Answering Pratyay Banerjee Tejas Gokhale Yezhou Yang Chitta Baral 25 35 0 04 Dec 2020
Video Self-Stitching Graph Network for Temporal Action Localization Chen Zhao Ali K. Thabet Bernard Ghanem 26 138 0 30 Nov 2020
VLG-Net: Video-Language Graph Matching Network for Video Grounding Mattia Soldan Mengmeng Xu Sisi Qu Jesper N. Tegnér Bernard Ghanem 35 69 0 19 Nov 2020
Human-centric Spatio-Temporal Video Grounding With Visual Transformers Zongheng Tang Yue Liao Si Liu Guanbin Li Xiaojie Jin Hongxu Jiang Qian Yu Dong Xu 21 94 0 10 Nov 2020
COOT: Cooperative Hierarchical Transformer for Video-Text Representation Learning Simon Ging Mohammadreza Zolfaghari Hamed Pirsiavash Thomas Brox ViT CLIP 20 168 0 01 Nov 2020
What is More Likely to Happen Next? Video-and-Language Future Event Prediction Jie Lei Licheng Yu Tamara L. Berg Mohit Bansal 33 72 0 15 Oct 2020
DORi: Discovering Object Relationship for Moment Localization of a Natural-Language Query in Video Cristian Rodriguez-Opazo Edison Marrese-Taylor Basura Fernando Hongdong Li Stephen Gould 137 11 0 13 Oct 2020
Reinforcement Learning for Weakly Supervised Temporal Grounding of Natural Language in Untrimmed Videos Jie Wu Guanbin Li Xiaoguang Han Liang Lin OffRL AI4TS 19 56 0 18 Sep 2020
Uncovering Hidden Challenges in Query-Based Video Moment Retrieval Mayu Otani Yuta Nakashima Esa Rahtu J. Heikkilä 21 74 0 01 Sep 2020
VLANet: Video-Language Alignment Network for Weakly-Supervised Video Moment Retrieval Minuk Ma Sunjae Yoon Junyeong Kim Youngjoon Lee Sunghun Kang Chang D. Yoo 23 78 0 24 Aug 2020
Regularized Two-Branch Proposal Networks for Weakly-Supervised Moment Retrieval in Videos Zhu Zhang Zhijie Lin Zhou Zhao Jieming Zhu Xiuqiang He 14 69 0 19 Aug 2020
Jointly Cross- and Self-Modal Graph Attention Network for Query-Based Moment Localization Daizong Liu Xiaoye Qu Xiao-Yang Liu Jianfeng Dong Pan Zhou Zichuan Xu 33 129 0 04 Aug 2020
Enriching Video Captions With Contextual Text Philipp Rimle Pelin Dogan Markus Gross 30 3 0 29 Jul 2020
Learning Modality Interaction for Temporal Sentence Localization and Event Captioning in Videos Shaoxiang Chen Wenhao Jiang Wei Liu Yu-Gang Jiang 25 101 0 28 Jul 2020
Dense-Caption Matching and Frame-Selection Gating for Temporal Localization in VideoQA Hyounghun Kim Zineng Tang Mohit Bansal 30 31 0 13 May 2020
Condensed Movies: Story Based Retrieval with Contextual Embeddings Max Bain Arsha Nagrani A. Brown Andrew Zisserman 39 100 0 08 May 2020
Learning to Segment Actions from Observation and Narration Daniel Fried Jean-Baptiste Alayrac Phil Blunsom Chris Dyer S. Clark Aida Nematzadeh 33 31 0 07 May 2020
HERO: Hierarchical Encoder for Video+Language Omni-representation Pre-training Linjie Li Yen-Chun Chen Yu Cheng Zhe Gan Licheng Yu Jingjing Liu MLLM VLM OffRL AI4TS 43 493 0 01 May 2020
Span-based Localizing Network for Natural Language Video Localization Hao Zhang Aixin Sun Wei Jing Joey Tianyi Zhou 32 312 0 29 Apr 2020
Local-Global Video-Text Interactions for Temporal Grounding Jonghwan Mun Minsu Cho Bohyung Han 36 267 0 16 Apr 2020
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval Jie Lei Licheng Yu Tamara L. Berg Mohit Bansal 119 275 0 24 Jan 2020
Tree-Structured Policy based Progressive Reinforcement Learning for Temporally Language Grounding in Video Jie Wu Guanbin Li Si Liu Liang Lin OffRL 20 104 0 18 Jan 2020
Action Modifiers: Learning from Adverbs in Instructional Videos Hazel Doughty Ivan Laptev W. Mayol-Cuevas Dima Damen 15 30 0 13 Dec 2019
Weakly-Supervised Video Moment Retrieval via Semantic Completion Network Zhijie Lin Zhou Zhao Zhu Zhang Qi Wang Huasheng Liu 22 149 0 19 Nov 2019
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos Yitian Yuan Lin Ma Jingwen Wang Wei Liu Wenwu Zhu 30 242 0 31 Oct 2019
A Graph-Based Framework to Bridge Movies and Synopses Yu Xiong Chengyi Zhang Lingfeng Guo Hang Zhou Bolei Zhou Dahua Lin 27 60 0 24 Oct 2019
CATER: A diagnostic dataset for Compositional Actions and TEmporal Reasoning Rohit Girdhar Deva Ramanan 19 176 0 10 Oct 2019
Rekall: Specifying Video Events using Compositions of Spatiotemporal Labels Daniel Y. Fu Will Crichton James Hong Xinwei Yao Haotian Zhang A. Truong A. Narayan Maneesh Agrawala Christopher Ré Kayvon Fatahalian 19 48 0 07 Oct 2019
LoGAN: Latent Graph Co-Attention Network for Weakly-Supervised Video Moment Retrieval Reuben Tan Huijuan Xu Kate Saenko Bryan A. Plummer 28 67 0 27 Sep 2019
Proposal-free Temporal Moment Localization of a Natural-Language Query in Video using Guided Attention Cristian Rodriguez-Opazo Edison Marrese-Taylor F. Saleh Hongdong Li Stephen Gould 24 147 0 20 Aug 2019
Exploiting Temporal Relationships in Video Moment Localization with Natural Language Songyang Zhang Jinsong Su Jiebo Luo 12 74 0 11 Aug 2019
Use What You Have: Video Retrieval Using Representations From Collaborative Experts Yang Liu Samuel Albanie Arsha Nagrani Andrew Zisserman 36 387 0 31 Jul 2019
Trends in Integration of Vision and Language Research: A Survey of Tasks, Datasets, and Methods Aditya Mogadala M. Kalimuthu Dietrich Klakow VLM 20 132 0 22 Jul 2019
Localizing Unseen Activities in Video via Image Query Zhu Zhang Zhou Zhao Zhijie Lin Jingkuan Song Deng Cai ViT 21 13 0 28 Jun 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering Jie Lei Licheng Yu Tamara L. Berg Mohit Bansal 31 227 0 25 Apr 2019
VATEX: A Large-Scale, High-Quality Multilingual Dataset for Video-and-Language Research Xin Eric Wang Jiawei Wu Junkun Chen Lei Li Yuan-fang Wang William Yang Wang 32 540 0 06 Apr 2019
Weakly Supervised Video Moment Retrieval From Text Queries Niluthpol Chowdhury Mithun S. Paul A. Roy-Chowdhury 30 193 0 05 Apr 2019