TimeRefine: Temporal Grounding with Time Refining Video LLM

12 December 2024
Xizi Wang
Feng Cheng
Ziyang Wang
Huiyu Wang
Md. Mohaiminul Islam
Lorenzo Torresani
Mohit Bansal
Gedas Bertasius
David J. Crandall

Papers citing "TimeRefine: Temporal Grounding with Time Refining Video LLM"

21 / 71 papers shown
Title
Where Does It Exist: Spatio-Temporal Video Grounding for Multi-Form Sentences
Zhu Zhang
Zhou Zhao
Yang Zhao
Qi Wang
Huasheng Liu
Lianli Gao
99
118
0
19 Jan 2020
EfficientDet: Scalable and Efficient Object Detection
Mingxing Tan
Ruoming Pang
Quoc V. Le
140
5,118
0
20 Nov 2019
Semantic Conditioned Dynamic Modulation for Temporal Sentence Grounding in Videos
Yitian Yuan
Lin Ma
Jingwen Wang
Wei Liu
Wenwu Zhu
105
244
0
31 Oct 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
662
20,418
0
23 Oct 2019
ActivityNet-QA: A Dataset for Understanding Complex Web Videos via Question Answering
Zhou Yu
D. Xu
Jun-chen Yu
Ting Yu
Zhou Zhao
Yueting Zhuang
Dacheng Tao
146
478
0
06 Jun 2019
Hierarchical Recurrent Neural Network for Video Summarization
Bin Zhao
Xuelong Li
Xiaoqiang Lu
79
178
0
28 Apr 2019
TVQA+: Spatio-Temporal Grounding for Video Question Answering
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
75
230
0
25 Apr 2019
ExCL: Extractive Clip Localization Using Natural Language Descriptions
Soham Ghosh
Anuva Agarwal
Zarana Parekh
Alexander G. Hauptmann
CLIP
61
153
0
04 Apr 2019
VideoBERT: A Joint Model for Video and Language Representation Learning
Chen Sun
Austin Myers
Carl Vondrick
Kevin Patrick Murphy
Cordelia Schmid
VLM SSL
92
1,252
0
03 Apr 2019
Generalized Intersection over Union: A Metric and A Loss for Bounding Box Regression
S. Hamid Rezatofighi
Nathan Tsoi
JunYoung Gwak
Amir Sadeghian
Ian Reid
Silvio Savarese
164
4,205
0
25 Feb 2019
MAN: Moment Alignment Network for Natural Language Moment Retrieval via Iterative Graph Adjustment
Da Zhang
Xiyang Dai
Xin Eric Wang
Yuan-fang Wang
L. Davis
93
305
0
30 Nov 2018
To Find Where You Talk: Temporal Sentence Localization in Video with Attention Based Location Regression
Yitian Yuan
Tao Mei
Wenwu Zhu
95
333
0
19 Apr 2018
YOLOv3: An Incremental Improvement
Joseph Redmon
Ali Farhadi
ObjD
172
21,573
0
08 Apr 2018
TALL: Temporal Activity Localization via Language Query
J. Gao
Chen Sun
Zhenheng Yang
Ram Nevatia
168
828
0
05 May 2017
Dense-Captioning Events in Videos
Ranjay Krishna
Kenji Hata
F. Ren
Li Fei-Fei
Juan Carlos Niebles
197
1,257
0
02 May 2017
Towards Automatic Learning of Procedures from Web Instructional Videos
Luowei Zhou
Chenliang Xu
Jason J. Corso
EgoV
92
836
0
28 Mar 2017
Mask R-CNN
Kaiming He
Georgia Gkioxari
Piotr Dollár
Ross B. Girshick
ObjD
442
27,338
0
20 Mar 2017
Human Pose Estimation with Iterative Error Feedback
João Carreira
Pulkit Agrawal
Katerina Fragkiadaki
Jitendra Malik
3DH
134
755
0
23 Jul 2015
Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
Shaoqing Ren
Kaiming He
Ross B. Girshick
Jian Sun
AIMat ObjD
573
62,668
0
04 Jun 2015
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
318
4,535
0
20 Nov 2014
Rich feature hierarchies for accurate object detection and semantic segmentation
Ross B. Girshick
Jeff Donahue
Trevor Darrell
Jitendra Malik
ObjD
333
26,271
0
11 Nov 2013