Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1708.01641
Cited By
Localizing Moments in Video with Natural Language
4 August 2017
Lisa Anne Hendricks
Oliver Wang
Eli Shechtman
Josef Sivic
Trevor Darrell
Bryan C. Russell
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Localizing Moments in Video with Natural Language"
50 / 211 papers shown
Title
MultiVENT: Multilingual Videos of Events with Aligned Natural Text
Kate Sanders
David Etter
Reno Kriz
Benjamin Van Durme
VGen
42
7
0
06 Jul 2023
A Survey on Video Moment Localization
Meng Liu
Liqiang Nie
Yunxiao Wang
Meng Wang
Yong Rui
29
28
0
13 Jun 2023
VLAB: Enhancing Video Language Pre-training by Feature Adapting and Blending
Xingjian He
Sihan Chen
Fan Ma
Zhicheng Huang
Xiaojie Jin
Zikang Liu
Dongmei Fu
Yi Yang
Jiaheng Liu
Jiashi Feng
VLM
CLIP
23
17
0
22 May 2023
Text-Video Retrieval with Disentangled Conceptualization and Set-to-Set Alignment
Peng Jin
Hao Li
Ze-Long Cheng
Jinfa Huang
Zhennan Wang
Li-ming Yuan
Chang-rui Liu
Jie Chen
38
32
0
20 May 2023
SViTT: Temporal Learning of Sparse Video-Text Transformers
Yi Li
Kyle Min
Subarna Tripathi
Nuno Vasconcelos
28
12
0
18 Apr 2023
Vita-CLIP: Video and text adaptive CLIP via Multimodal Prompting
Syed Talal Wasim
Muzammal Naseer
Salman Khan
Fahad Shahbaz Khan
M. Shah
VLM
VPVLM
39
74
0
06 Apr 2023
Sketch-based Video Object Localization
Sangmin Woo
So-Yeong Jeon
Jinyoung Park
Minji Son
Sumin Lee
Changick Kim
19
0
0
02 Apr 2023
Hierarchical Video-Moment Retrieval and Step-Captioning
Abhaysinh Zala
Jaemin Cho
Satwik Kottur
Xilun Chen
Barlas Ouguz
Yasher Mehdad
Joey Tianyi Zhou
3DV
20
51
0
29 Mar 2023
Unmasked Teacher: Towards Training-Efficient Video Foundation Models
Kunchang Li
Yali Wang
Yizhuo Li
Yi Wang
Yinan He
Limin Wang
Yu Qiao
VGen
57
156
0
28 Mar 2023
Video-Text as Game Players: Hierarchical Banzhaf Interaction for Cross-Modal Representation Learning
Peng Jin
Jinfa Huang
Pengfei Xiong
Shangxuan Tian
Chang-rui Liu
Xiang Ji
Li-ming Yuan
Jie Chen
45
50
0
25 Mar 2023
DiffusionRet: Generative Text-Video Retrieval with Diffusion Model
Peng Jin
Hao Li
Ze-Long Cheng
Kehan Li
Xiang Ji
Chang-rui Liu
Li-ming Yuan
Jie Chen
DiffM
VGen
28
54
0
17 Mar 2023
STOA-VLP: Spatial-Temporal Modeling of Object and Action for Video-Language Pre-training
Weihong Zhong
Mao Zheng
Duyu Tang
Xuan Luo
Heng Gong
Xiaocheng Feng
Bing Qin
32
8
0
20 Feb 2023
Interactive Video Corpus Moment Retrieval using Reinforcement Learning
Zhixin Ma
Chong-Wah Ngo
33
3
0
19 Feb 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
26
7
0
16 Feb 2023
Tagging before Alignment: Integrating Multi-Modal Tags for Video-Text Retrieval
Yizhen Chen
Jie Wang
Lijian Lin
Zhongang Qi
Jin Ma
Ying Shan
VLM
30
18
0
30 Jan 2023
Temporal Perceiving Video-Language Pre-training
Fan Ma
Xiaojie Jin
Heng Wang
Jingjia Huang
Linchao Zhu
Jiashi Feng
Yi Yang
VLM
32
15
0
18 Jan 2023
UATVR: Uncertainty-Adaptive Text-Video Retrieval
Bo Fang
Wenhao Wu
Chang-rui Liu
Yu Zhou
Yuxin Song
Weiping Wang
Min Yang
Xiang Ji
Jingdong Wang
26
46
0
16 Jan 2023
Learning Trajectory-Word Alignments for Video-Language Tasks
Xu Yang
Zhang Li
Haiyang Xu
Hanwang Zhang
Qinghao Ye
Chenliang Li
Ming Yan
Yu Zhang
Fei Huang
Songfang Huang
33
7
0
05 Jan 2023
Hypotheses Tree Building for One-Shot Temporal Sentence Localization
Daizong Liu
Xiang Fang
Pan Zhou
Xing Di
Weining Lu
Yu Cheng
32
19
0
05 Jan 2023
Rethinking the Video Sampling and Reasoning Strategies for Temporal Sentence Grounding
Jiahao Zhu
Daizong Liu
Pan Zhou
Xing Di
Yu Cheng
...
Wenzheng Xu
Zichuan Xu
Yao Wan
Lichao Sun
Zeyu Xiong
27
18
0
02 Jan 2023
MRTNet: Multi-Resolution Temporal Network for Video Sentence Grounding
Wei Ji
Long Chen
Yin-wei Wei
Yiming Wu
Tat-Seng Chua
AI4TS
35
18
0
26 Dec 2022
Re-evaluating the Need for Multimodal Signals in Unsupervised Grammar Induction
Boyi Li
Rodolfo Corona
K. Mangalam
Catherine Chen
Daniel Flaherty
Serge Belongie
Kilian Q. Weinberger
Jitendra Malik
Trevor Darrell
Dan Klein
21
1
0
20 Dec 2022
InternVideo: General Video Foundation Models via Generative and Discriminative Learning
Yi Wang
Kunchang Li
Yizhuo Li
Yinan He
Bingkun Huang
...
Junting Pan
Jiashuo Yu
Yali Wang
Limin Wang
Yu Qiao
VLM
VGen
57
311
0
06 Dec 2022
Masked Contrastive Pre-Training for Efficient Video-Text Retrieval
Fangxun Shu
Biaolong Chen
Yue Liao
Shuwen Xiao
Wenyu Sun
Xiaobo Li
Yousong Zhu
Jinqiao Wang
Si Liu
CLIP
27
11
0
02 Dec 2022
Re^2TAL: Rewiring Pretrained Video Backbones for Reversible Temporal Action Localization
Chen Zhao
Shuming Liu
K. Mangalam
Guohao Li
38
17
0
25 Nov 2022
Make-A-Story: Visual Memory Conditioned Consistent Story Generation
Tanzila Rahman
Hsin-Ying Lee
Jian Ren
Sergey Tulyakov
Shweta Mahajan
Leonid Sigal
DiffM
19
68
0
23 Nov 2022
VoP: Text-Video Co-operative Prompt Tuning for Cross-Modal Retrieval
Siteng Huang
Biao Gong
Yulin Pan
Jianwen Jiang
Yiliang Lv
Yuyuan Li
Donglin Wang
VLM
VPVLM
22
41
0
23 Nov 2022
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Zineng Tang
Jaemin Cho
Jie Lei
Joey Tianyi Zhou
VLM
24
9
0
21 Nov 2022
SMAUG: Sparse Masked Autoencoder for Efficient Video-Language Pre-training
Yuanze Lin
Chen Wei
Huiyu Wang
Alan Yuille
Cihang Xie
3DGS
34
15
0
21 Nov 2022
Cross-Modal Adapter for Text-Video Retrieval
Haojun Jiang
Jianke Zhang
Rui Huang
Chunjiang Ge
Zanlin Ni
Jiwen Lu
Jie Zhou
S. Song
Gao Huang
48
36
0
17 Nov 2022
Watching the News: Towards VideoQA Models that can Read
Soumya Jahagirdar
Minesh Mathew
Dimosthenis Karatzas
C. V. Jawahar
27
18
0
10 Nov 2022
Unsupervised Audio-Visual Lecture Segmentation
Darshan Singh
Anchit Gupta
C. V. Jawahar
Makarand Tapaswi
VOS
24
4
0
29 Oct 2022
Modal-specific Pseudo Query Generation for Video Corpus Moment Retrieval
Minjoon Jung
Seongho Choi
Joo-Kyung Kim
Jin-Hwa Kim
Byoung-Tak Zhang
36
7
0
23 Oct 2022
Learning a Grammar Inducer from Massive Uncurated Instructional Videos
Songyang Zhang
Linfeng Song
Lifeng Jin
Haitao Mi
Kun Xu
Dong Yu
Jiebo Luo
38
5
0
22 Oct 2022
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling
Dongsheng Chen
Chaofan Tao
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
VLM
29
18
0
21 Oct 2022
Selective Query-guided Debiasing for Video Corpus Moment Retrieval
Sunjae Yoon
Jiajing Hong
Eunseop Yoon
Dahyun Kim
Junyeong Kim
Hee Suk Yoon
Changdong Yoo
41
21
0
17 Oct 2022
Semantic Video Moments Retrieval at Scale: A New Task and a Baseline
Na Li
26
0
0
15 Oct 2022
Long-Form Video-Language Pre-Training with Multimodal Temporal Contrastive Learning
Yuchong Sun
Hongwei Xue
Ruihua Song
Bei Liu
Huan Yang
Jianlong Fu
AI4TS
VLM
20
68
0
12 Oct 2022
Locate before Answering: Answer Guided Question Localization for Video Question Answering
Tianwen Qian
Ran Cui
Jingjing Chen
Pai Peng
Xiao-Wei Guo
Yu-Gang Jiang
34
17
0
05 Oct 2022
Towards Parameter-Efficient Integration of Pre-Trained Language Models In Temporal Video Grounding
Erica K. Shimomoto
Edison Marrese-Taylor
Hiroya Takamura
Ichiro Kobayashi
Hideki Nakayama
Yusuke Miyao
27
7
0
26 Sep 2022
WildQA: In-the-Wild Video Question Answering
Santiago Castro
Naihao Deng
Pingxuan Huang
Mihai Burzo
Rada Mihalcea
74
7
0
14 Sep 2022
CLIP-ViP: Adapting Pre-trained Image-Text Model to Video-Language Representation Alignment
Hongwei Xue
Yuchong Sun
Bei Liu
Jianlong Fu
Rui Song
Houqiang Li
Jiebo Luo
CLIP
VLM
25
68
0
14 Sep 2022
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
VLM
26
64
0
04 Sep 2022
Hierarchical Local-Global Transformer for Temporal Sentence Grounding
Xiang Fang
Daizong Liu
Pan Zhou
Zichuan Xu
Rui Li
22
28
0
31 Aug 2022
Partially Relevant Video Retrieval
Jianfeng Dong
Xianke Chen
Minsong Zhang
Xun Yang
Shujie Chen
Xirong Li
Xun Wang
17
39
0
26 Aug 2022
MuMUR : Multilingual Multimodal Universal Retrieval
Avinash Madasu
Estelle Aflalo
Gabriela Ben-Melech Stan
Shachar Rosenman
Shao-Yen Tseng
Gedas Bertasius
Vasudev Lal
44
3
0
24 Aug 2022
M2HF: Multi-level Multi-modal Hybrid Fusion for Text-Video Retrieval
Shuo Liu
Weize Quan
Mingyuan Zhou
Sihong Chen
Jian Kang
Zhenlan Zhao
Chen Chen
Dong-Ming Yan
28
0
0
16 Aug 2022
Exploring Anchor-based Detection for Ego4D Natural Language Query
S. Zheng
Qi Zhang
Bei Liu
Qingyu Jin
Jianlong Fu
EgoV
11
4
0
10 Aug 2022
Video Question Answering with Iterative Video-Text Co-Tokenization
A. Piergiovanni
K. Morton
Weicheng Kuo
Michael S. Ryoo
A. Angelova
31
18
0
01 Aug 2022
Reducing the Vision and Language Bias for Temporal Sentence Grounding
Daizong Liu
Xiaoye Qu
Wei Hu
19
49
0
27 Jul 2022
Previous
1
2
3
4
5
Next