arXiv:2004.13931
Span-based Localizing Network for Natural Language Video Localization
29 April 2020
Hao Zhang
Aixin Sun
Wei Jing
Qiufeng Wang
Papers citing "Span-based Localizing Network for Natural Language Video Localization" (50 of 179 papers shown)
UniMD: Towards Unifying Moment Retrieval and Temporal Action Detection
Yingsen Zeng
Yujie Zhong
Chengjian Feng
Lin Ma
63
7
0
07 Apr 2024
SnAG: Scalable and Accurate Video Grounding
Fangzhou Mu
Sicheng Mo
Yin Li
42
8
0
02 Apr 2024
R²-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Ye Liu
Jixuan He
Wanhua Li
Junsik Kim
D. Wei
Hanspeter Pfister
Chang Wen Chen
41
13
0
31 Mar 2024
Unified Static and Dynamic Network: Efficient Temporal Filtering for Video Grounding
Jingjing Hu
Dan Guo
Kun Li
Zhan Si
Xun Yang
Xiaojun Chang
Meng Wang
61
3
0
21 Mar 2024
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
Yue Fan
Xiaojian Ma
Rujie Wu
Yuntao Du
Jiaqi Li
Zhi Gao
Qing Li
VLM
LLMAG
46
55
0
18 Mar 2024
Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding
Chaolei Tan
Jian-Huang Lai
Wei-Shi Zheng
Jianfang Hu
AI4TS
41
5
0
18 Mar 2024
HawkEye: Training Video-Text LLMs for Grounding Text in Videos
Yueqian Wang
Xiaojun Meng
Jianxin Liang
Yuxuan Wang
Qun Liu
Dongyan Zhao
34
30
0
15 Mar 2024
Improving Video Corpus Moment Retrieval with Partial Relevance Enhancement
Danyang Hou
Liang Pang
Huawei Shen
Xueqi Cheng
29
3
0
21 Feb 2024
Event-aware Video Corpus Moment Retrieval
Danyang Hou
Liang Pang
Huawei Shen
Xueqi Cheng
28
1
0
21 Feb 2024
Momentor: Advancing Video Large Language Model with Fine-Grained Temporal Reasoning
Long Qian
Juncheng Billy Li
Yu-hao Wu
Yaobo Ye
Hao Fei
Tat-Seng Chua
Yueting Zhuang
Siliang Tang
MLLM
LRM
60
47
0
18 Feb 2024
InternEvo: Efficient Long-sequence Large Language Model Training via Hybrid Parallelism and Redundant Sharding
Qiaoling Chen
Diandian Gu
Guoteng Wang
Xun Chen
Yingtong Xiong
...
Qi Hu
Xin Jin
Yonggang Wen
Tianwei Zhang
Peng Sun
49
8
0
17 Jan 2024
Multi-scale 2D Temporal Map Diffusion Models for Natural Language Video Localization
Chongzhi Zhang
Mingyuan Zhang
Zhiyang Teng
Jiayi Li
Xizhou Zhu
Lewei Lu
Ziwei Liu
Aixin Sun
DiffM
VGen
18
0
0
16 Jan 2024
Bias-Conflict Sample Synthesis and Adversarial Removal Debias Strategy for Temporal Sentence Grounding in Video
Zhaobo Qi
Yibo Yuan
Xiaowen Ruan
Shuhui Wang
Weigang Zhang
Qingming Huang
29
6
0
15 Jan 2024
Detours for Navigating Instructional Videos
Kumar Ashutosh
Zihui Xue
Tushar Nagarajan
Kristen Grauman
29
6
0
03 Jan 2024
Multi-Modal Domain Adaptation Across Video Scenes for Temporal Video Grounding
Haifeng Huang
Yang Zhao
Zehan Wang
Yan Xia
Zhou Zhao
33
1
0
21 Dec 2023
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval
Zhihang Liu
Jun Li
Hongtao Xie
Pandeng Li
Jiannan Ge
Sun-Ao Liu
Guoqing Jin
42
18
0
19 Dec 2023
Grounded Question-Answering in Long Egocentric Videos
Shangzhe Di
Weidi Xie
37
23
0
11 Dec 2023
RGNet: A Unified Clip Retrieval and Grounding Network for Long Videos
Tanveer Hannan
Md. Mohaiminul Islam
Thomas Seidl
Gedas Bertasius
28
3
0
11 Dec 2023
LifelongMemory: Leveraging LLMs for Answering Queries in Long-form Egocentric Videos
Ying Wang
Yanlai Yang
Mengye Ren
41
15
0
07 Dec 2023
DemaFormer: Damped Exponential Moving Average Transformer with Energy-Based Modeling for Temporal Language Grounding
Thong Nguyen
Xiaobao Wu
Xinshuai Dong
Cong-Duy Nguyen
See-Kiong Ng
Anh Tuan Luu
29
7
0
05 Dec 2023
VTimeLLM: Empower LLM to Grasp Video Moments
Bin Huang
Xin Wang
Hong Chen
Zihan Song
Wenwu Zhu
MLLM
89
113
0
30 Nov 2023
BAM-DETR: Boundary-Aligned Moment Detection Transformer for Temporal Sentence Grounding in Videos
Pilhyeon Lee
Hyeran Byun
19
10
0
30 Nov 2023
Correlation-Guided Query-Dependency Calibration for Video Temporal Grounding
WonJun Moon
Sangeek Hyun
Subeen Lee
Jae-Pil Heo
29
4
0
15 Nov 2023
Exploring Iterative Refinement with Diffusion Models for Video Grounding
Xiao Liang
Tao Shi
Yaoyuan Liang
Te Tao
Shao-Lun Huang
DiffM
29
1
0
26 Oct 2023
VidChapters-7M: Video Chapters at Scale
Antoine Yang
Arsha Nagrani
Ivan Laptev
Josef Sivic
Cordelia Schmid
VGen
23
26
0
25 Sep 2023
Towards Answering Health-related Questions from Medical Videos: Datasets and Approaches
Deepak Gupta
Kush Attal
Dina Demner-Fushman
LM&MA
19
1
0
21 Sep 2023
Zero-Shot Video Moment Retrieval from Frozen Vision-Language Models
Dezhao Luo
Jiabo Huang
Shaogang Gong
Hailin Jin
Yang Liu
VLM
21
9
0
01 Sep 2023
DiffusionVMR: Diffusion Model for Joint Video Moment Retrieval and Highlight Detection
Henghao Zhao
Kevin Qinghong Lin
Rui Yan
Zechao Li
VGen
DiffM
37
1
0
29 Aug 2023
UnLoc: A Unified Framework for Video Localization Tasks
Shengjia Yan
Xuehan Xiong
Arsha Nagrani
Anurag Arnab
Zhonghao Wang
Weina Ge
David A. Ross
Cordelia Schmid
31
53
0
21 Aug 2023
Temporal Sentence Grounding in Streaming Videos
Tian Gan
Xiao Wang
Yan Sun
Jianlong Wu
Qingpei Guo
Liqiang Nie
43
2
0
14 Aug 2023
Knowing Where to Focus: Event-aware Transformer for Video Grounding
Jinhyun Jang
Jungin Park
Jin-Hwa Kim
Hyeongjun Kwon
Kwanghoon Sohn
24
49
0
14 Aug 2023
ViGT: Proposal-free Video Grounding with Learnable Token in Transformer
Kun Li
Dan Guo
Meng Wang
ViT
14
36
0
11 Aug 2023
Counterfactual Cross-modality Reasoning for Weakly Supervised Video Moment Localization
Zezhong Lv
Bing-Huang Su
Ji-Rong Wen
16
16
0
10 Aug 2023
Efficient Temporal Sentence Grounding in Videos with Multi-Teacher Knowledge Distillation
Renjie Liang
Yiming Yang
Hui Lu
Li Li
22
10
0
07 Aug 2023
UniVTG: Towards Unified Video-Language Temporal Grounding
Kevin Qinghong Lin
Pengchuan Zhang
Joya Chen
Shraman Pramanick
Difei Gao
Alex Jinpeng Wang
Rui Yan
Mike Zheng Shou
26
112
0
31 Jul 2023
G2L: Semantically Aligned and Uniform Video Grounding via Geodesic and Game Theory
Hongxiang Li
Meng Cao
Xuxin Cheng
Yaowei Li
Zhihong Zhu
Yuexian Zou
24
20
0
26 Jul 2023
No-frills Temporal Video Grounding: Multi-Scale Neighboring Attention and Zoom-in Boundary Detection
Qi Zhang
S. Zheng
Qin Jin
17
1
0
20 Jul 2023
EgoVLPv2: Egocentric Video-Language Pre-training with Fusion in the Backbone
Shraman Pramanick
Yale Song
Sayan Nag
Kevin Qinghong Lin
Hardik Shah
Mike Zheng Shou
Ramalingam Chellappa
Pengchuan Zhang
VLM
39
87
0
11 Jul 2023
MomentDiff: Generative Video Moment Retrieval from Random to Real
P. Li
Chen-Wei Xie
Hongtao Xie
Liming Zhao
Lei Zhang
Yun Zheng
Deli Zhao
Yongdong Zhang
DiffM
VGen
36
56
0
06 Jul 2023
Look, Remember and Reason: Grounded reasoning in videos with language models
Apratim Bhattacharyya
Sunny Panchal
Mingu Lee
Reza Pourreza
Pulkit Madan
Roland Memisevic
LRM
33
7
0
30 Jun 2023
SpotEM: Efficient Video Search for Episodic Memory
Santhosh Kumar Ramakrishnan
Ziad Al-Halah
Kristen Grauman
VLM
28
9
0
28 Jun 2023
GroundNLQ @ Ego4D Natural Language Queries Challenge 2023
Zhijian Hou
Lei Ji
Difei Gao
Wanjun Zhong
Kun Yan
Chong Li
W. Chan
Chong-Wah Ngo
Nan Duan
Mike Zheng Shou
20
15
0
27 Jun 2023
Action Sensitivity Learning for the Ego4D Episodic Memory Challenge 2023
Jiayi Shao
Xiaohan Wang
Ruijie Quan
Yezhou Yang
EgoV
19
8
0
15 Jun 2023
A Survey on Video Moment Localization
Meng Liu
Liqiang Nie
Yunxiao Wang
Meng Wang
Yong Rui
27
28
0
13 Jun 2023
MS-DETR: Natural Language Video Localization with Sampling Moment-Moment Interaction
J. Wang
Aixin Sun
Hao Zhang
Xiaoli Li
ViT
19
13
0
30 May 2023
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
103
76
0
22 May 2023
Transform-Equivariant Consistency Learning for Temporal Sentence Grounding
Daizong Liu
Xiaoye Qu
Jianfeng Dong
Pan Zhou
Zichuan Xu
Yining Qi
Xing Di
Weining Lu
Yu Cheng
46
8
0
06 May 2023
MH-DETR: Video Moment and Highlight Detection with Cross-modal Transformer
Yifang Xu
Yunzhuo Sun
Yang Li
Yilei Shi
Xiaoxia Zhu
S. Du
ViT
42
33
0
29 Apr 2023
Query-Dependent Video Representation for Moment Retrieval and Highlight Detection
WonJun Moon
Sangeek Hyun
S. Park
Dongchan Park
Jae-Pil Heo
ViT
53
106
0
24 Mar 2023
Scanning Only Once: An End-to-end Framework for Fast Temporal Grounding in Long Videos
Yulin Pan
Xiangteng He
Biao Gong
Yiliang Lv
Yujun Shen
Yuxin Peng
Deli Zhao
37
12
0
15 Mar 2023