1705.02101
Cited By
v1
v2 (latest)
TALL: Temporal Activity Localization via Language Query
5 May 2017
J. Gao
Chen Sun
Zhenheng Yang
Ram Nevatia
Papers citing
"TALL: Temporal Activity Localization via Language Query"
50 / 433 papers shown
DaMO: A Data-Efficient Multimodal Orchestrator for Temporal Reasoning with Video LLMs
Bo-Cheng Chiu
Jen-Jee Chen
Yu-Chee Tseng
Feng-Chi Chen
7
0
0
13 Jun 2025
A Shortcut-aware Video-QA Benchmark for Physical Understanding via Minimal Video Pairs
Benno Krojer
Mojtaba Komeili
Candace Ross
Q. Garrido
Koustuv Sinha
Nicolas Ballas
Mahmoud Assran
52
1
0
11 Jun 2025
MLVTG: Mamba-Based Feature Alignment and LLM-Driven Purification for Multi-Modal Video Temporal Grounding
Zhiyi Zhu
Xiaoyu Wu
Zihao Liu
Linlin Yang
27
0
0
10 Jun 2025
DeepVideo-R1: Video Reinforcement Fine-Tuning via Difficulty-aware Regressive GRPO
Jinyoung Park
Jeehye Na
Jinyoung Kim
H. Kim
OffRL
11
0
0
09 Jun 2025
MiMo-VL Technical Report
Xiaomi LLM-Core Team
Zihao Yue
Zhenru Lin
Yifan Song
Weikun Wang
...
Di Zhang
Chong Ma
Chang Liu
Can Cai
Bingquan Xia
OffRL
MoE
VLM
LRM
77
0
0
04 Jun 2025
MamFusion: Multi-Mamba with Temporal Fusion for Partially Relevant Video Retrieval
Xinru Ying
Jiaqi Mo
Jingyang Lin
Canghong Jin
Fangfang Wang
Lina Wei
64
0
0
04 Jun 2025
VidEvent: A Large Dataset for Understanding Dynamic Evolution of Events in Videos
Baoyu Liang
Qile Su
Shoutai Zhu
Yuchen Liang
Chao Tong
VGen
49
1
0
03 Jun 2025
VideoCap-R1: Enhancing MLLMs for Video Captioning via Structured Thinking
Desen Meng
Rui Huang
Zhilin Dai
Xinhao Li
Yifan Xu
...
Z. Huang
Meng Zhang
L. Zhang
Yi Liu
Limin Wang
OffRL
VLM
LRM
54
0
0
02 Jun 2025
LLM-powered Query Expansion for Enhancing Boundary Prediction in Language-driven Action Localization
Zirui Shang
Xinxiao Wu
Shuo Yang
42
0
0
30 May 2025
DisTime: Distribution-based Time Representation for Video Large Language Models
Yingsen Zeng
Zepeng Huang
Yujie Zhong
Chengjian Feng
Jie Hu
Lin Ma
Yang Liu
VGen
17
0
0
30 May 2025
MUSEG: Reinforcing Video Temporal Understanding via Timestamp-Aware Multi-Segment Grounding
Fuwen Luo
Shengfeng Lou
C. L. Philip Chen
Ziyue Wang
Chenliang Li
...
Peng Li
Ming Yan
Ji Zhang
Fei Huang
Yang Liu
AI4TS
LRM
73
0
0
27 May 2025
Daily-Omni: Towards Audio-Visual Reasoning with Temporal Alignment across Modalities
Ziwei Zhou
Rui Wang
Zuxuan Wu
AuLLM
VGen
77
0
0
23 May 2025
DeCafNet: Delegate and Conquer for Efficient Temporal Grounding in Long Videos
Zijia Lu
A S M Iftekhar
Gaurav Mittal
Tianjian Meng
Xiawei Wang
Cheng Zhao
Rohith Kukkala
Ehsan Elhamifar
Mei Chen
72
0
0
22 May 2025
Weakly Supervised Temporal Sentence Grounding via Positive Sample Mining
Lu Dong
Han Zhang
Hongjie Zhang
Yuanmin Huang
Z. Ling
Yu Qiao
Limin Wang
Yun Wang
AI4TS
206
0
0
10 May 2025
StreamBridge: Turning Your Offline Video Large Language Model into a Proactive Streaming Assistant
Haibo Wang
Bo Feng
Zhengfeng Lai
Mingze Xu
Shiyu Li
Weifeng Ge
Afshin Dehghan
Meng Cao
Ping Huang
OffRL
139
0
0
08 May 2025
Object-Shot Enhanced Grounding Network for Egocentric Video
Yisen Feng
Haoyu Zhang
Meng Liu
Weili Guan
Liqiang Nie
80
3
0
07 May 2025
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
Jen-Hao Cheng
Vivian Wang
Huayu Wang
Huapeng Zhou
Yi-Hao Peng
...
Wenhao Chai
Yi-Ling Chen
Vibhav Vineet
Qin Cai
Lei Li
AI4TS
385
2
0
02 May 2025
Exploiting Inter-Sample Correlation and Intra-Sample Redundancy for Partially Relevant Video Retrieval
Junlong Ren
Gangjian Zhang
Yitao Hu
Jian Shu
Haoran Wang
100
0
0
28 Apr 2025
Learning Streaming Video Representation via Multitask Training
Yibin Yan
Jilan Xu
Shangzhe Di
Yikun Liu
Yudi Shi
Qirui Chen
Zeqian Li
Yifei Huang
Weidi Xie
CLL
164
1
0
28 Apr 2025
Vidi: Large Multimodal Models for Video Understanding and Editing
Vidi Team
Celong Liu
Chia-Wen Kuo
Dawei Du
Fan Chen
...
Wen Zhong
Xiaohui Shen
Xin Gu
Xing Mei
Xueqiong Qu
106
0
0
22 Apr 2025
Grounding-MD: Grounded Video-language Pre-training for Open-World Moment Detection
Weijun Zhuang
Qizhang Li
Xin Li
Ming-Yu Liu
Xiaopeng Hong
Feng Gao
Fan Yang
W. Zuo
79
0
0
20 Apr 2025
Prototypes are Balanced Units for Efficient and Effective Partially Relevant Video Retrieval
WonJun Moon
Cheol-Ho Cho
Woojin Jun
Minho Shim
Taeoh Kim
Inwoong Lee
Dongyoon Wee
Jae-Pil Heo
99
0
0
17 Apr 2025
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking
H. Tran
Tinh-Anh Nguyen-Nhu
Huu-Phong Phan-Nguyen
T. Nguyen
Nhat-Minh Nguyen-Dich
Anh Dao
Huy-Duc Do
Quan Nguyen
Hoang M. Le
Quang-Vinh Dinh
73
0
0
11 Apr 2025
VideoExpert: Augmented LLM for Temporal-Sensitive Video Understanding
Henghao Zhao
Ge-Peng Ji
Rui Yan
Huan Xiong
Zechao Li
76
1
0
10 Apr 2025
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Xinhao Li
Ziang Yan
Desen Meng
Lu Dong
Xiangyu Zeng
Yinan He
Yun Wang
Yu Qiao
Yi Wang
Limin Wang
VLM
AI4TS
LRM
127
38
0
09 Apr 2025
LVC: A Lightweight Compression Framework for Enhancing VLMs in Long Video Understanding
Ziyi Wang
Haoran Wu
Yiming Rong
Deyang Jiang
Yixin Zhang
Yue Zhao
Shuang Xu
Bo Xu
VLM
99
0
0
09 Apr 2025
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
Hao Du
Bo Wu
Yan Lu
Zhendong Mao
83
0
0
08 Apr 2025
Moment Quantization for Video Temporal Grounding
Xiaolong Sun
Le Wang
Sanping Zhou
Liushuai Shi
Kun Xia
Mengnan Liu
Yabing Wang
Gang Hua
MQ
71
0
0
03 Apr 2025
TimeSearch: Hierarchical Video Search with Spotlight and Reflection for Human-like Long Video Understanding
Junwen Pan
Rui Zhang
Xin Wan
Yuan Zhang
Ming Lu
Qi She
VLM
99
1
0
02 Apr 2025
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models
Dohwan Ko
S. Kim
Yumin Suh
Vijay Kumar B.G
Minseo Yoon
Manmohan Chandraker
Hyunwoo J. Kim
LRM
100
0
0
25 Mar 2025
Video-XL-Pro: Reconstructive Token Compression for Extremely Long Video Understanding
Xiangrui Liu
Yan Shu
Zhengyang Liang
Ao Li
Yang Tian
Bo Zhao
VGen
VLM
266
9
0
24 Mar 2025
Collaborative Temporal Consistency Learning for Point-supervised Natural Language Video Localization
Zhuo Tao
Liang Li
Qi Chen
Yunbin Tu
Zheng-Jun Zha
Ming-Hsuan Yang
Yuankai Qi
Qingming Huang
77
0
0
22 Mar 2025
Temporal Action Detection Model Compression by Progressive Block Drop
Xiaoyong Chen
Yong Guo
Jiaming Liang
Sitong Zhuang
Runhao Zeng
Xiping Hu
90
0
0
21 Mar 2025
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Yang Liu
Kevin Qinghong Lin
C. Chen
Mike Zheng Shou
LM&Ro
LRM
389
6
0
17 Mar 2025
V-STaR: Benchmarking Video-LLMs on Video Spatio-Temporal Reasoning
Zixu Cheng
Jian Hu
Ziquan Liu
Chenyang Si
Wei Li
Shaogang Gong
LRM
145
5
0
14 Mar 2025
OmniSTVG: Toward Spatio-Temporal Omni-Object Video Grounding
Jiali Yao
Xinran Deng
Xin Gu
Mengrui Dai
Bing Fan
Zhipeng Zhang
Yan Huang
Heng Fan
L. Zhang
146
0
0
13 Mar 2025
Reasoning is All You Need for Video Generalization: A Counterfactual Benchmark with Sub-question Evaluation
Qiji Zhou
Yifan Gong
Guangsheng Bao
Hongjie Qiu
Jinqiang Li
Xiangrong Zhu
Huajian Zhang
Yue Zhang
LRM
83
0
0
12 Mar 2025
Generative Frame Sampler for Long Video Understanding
Linli Yao
Haoning Wu
Kun Ouyang
Yize Zhang
Caiming Xiong
Bei Chen
Xu Sun
Junnan Li
VLM
VGen
94
1
0
12 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
128
0
0
12 Mar 2025
Deep Understanding of Sign Language for Sign to Subtitle Alignment
Youngjoon Jang
Jeongsoo Choi
Junseok Ahn
Joon Son Chung
SLR
150
0
0
05 Mar 2025
LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection
Pengcheng Zhao
Zhixian He
Fuwei Zhang
Shujin Lin
Fan Zhou
136
2
0
18 Jan 2025
Multi-modal Fusion and Query Refinement Network for Video Moment Retrieval and Highlight Detection
Yifang Xu
Yunzhuo Sun
Benxiang Zhai
Zien Xie
Youyao Jia
S. Du
133
3
0
18 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
174
3
0
10 Jan 2025
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Xinhao Li
Yi Wang
Jiashuo Yu
Xiangyu Zeng
Yuhan Zhu
...
Yinan He
Chenting Wang
Yu Qiao
Yali Wang
L. Wang
VLM
222
40
0
31 Dec 2024
Length-Aware DETR for Robust Moment Retrieval
Sangkwon Park
Jiho Choi
Kyungjune Baek
Hyunjung Shim
81
0
0
31 Dec 2024
FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding
Zhuo Cao
Bingqing Zhang
Heming Du
Xin Yu
Xue Li
Sen Wang
125
2
0
18 Dec 2024
NowYouSee Me: Context-Aware Automatic Audio Description
Seon-Ho Lee
Jue Wang
D. Fan
Zhikang Zhang
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
135
1
0
13 Dec 2024
TimeRefine: Temporal Grounding with Time Refining Video LLM
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Joey Tianyi Zhou
Gedas Bertasius
David J. Crandall
188
2
0
12 Dec 2024
Streaming Detection of Queried Event Start
Cristobal Eyzaguirre
Eric Tang
S. Buch
Adrien Gaidon
Jiajun Wu
Juan Carlos Niebles
116
0
0
04 Dec 2024
Video LLMs for Temporal Reasoning in Long Videos
Fawad Javed Fateh
Umer Ahmed
Hamza Khan
M. Zia
Quoc-Huy Tran
VLM
178
1
0
04 Dec 2024