arXiv:2107.09609
QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries
20 July 2021
Jie Lei
Tamara L. Berg
Mohit Bansal
ViT
Papers citing
"QVHighlights: Detecting Moments and Highlights in Videos via Natural Language Queries"
42 / 42 papers shown
Title
TEMPURA: Temporal Event Masked Prediction and Understanding for Reasoning in Action
Jen-Hao Cheng
Vivian Wang
Huayu Wang
Huapeng Zhou
Yi-Hao Peng
...
Wenhao Chai
Yi-Ling Chen
Vibhav Vineet
Qin Cai
Jenq-Neng Hwang
AI4TS
139
0
0
02 May 2025
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
Linli Yao
Y. Li
Y. X. Wei
Lei Li
Shuhuai Ren
...
Sida Li
Lingpeng Kong
Qi Liu
Y. Zhang
Xu Sun
36
1
0
24 Apr 2025
VideoMind: A Chain-of-LoRA Agent for Long Video Reasoning
Y. Liu
Kevin Qinghong Lin
C. Chen
Mike Zheng Shou
LM&Ro
LRM
81
0
0
17 Mar 2025
CFSum: A Transformer-Based Multi-Modal Video Summarization Framework With Coarse-Fine Fusion
Yaowei Guo
Jiazheng Xing
Xiaojun Hou
Shuo Xin
Juntao Jiang
Demetri Terzopoulos
Chenfanfu Jiang
Yong Liu
ViT
36
0
0
01 Mar 2025
Vid-Morp: Video Moment Retrieval Pretraining from Unlabeled Videos in the Wild
Peijun Bao
Chenqi Kong
Zihao Shao
Boon Poh Ng
Meng Hwa Er
Alex C. Kot
54
2
0
01 Dec 2024
VideoLLM Knows When to Speak: Enhancing Time-Sensitive Video Comprehension with Video-Text Duet Interaction Format
Yueqian Wang
Xiaojun Meng
Y. Wang
Jianxin Liang
Jiansheng Wei
Huishuai Zhang
Dongyan Zhao
VGen
83
8
0
27 Nov 2024
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
Yongxin Guo
Jingyu Liu
Mingda Li
Xiaoying Tang
Qingbin Liu
Xiaoying Tang
39
14
0
08 Oct 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
33
16
0
26 Sep 2024
Causal Understanding For Video Question Answering
Bhanu Prakash Reddy Guda
Tanmay Kulkarni
Adithya Sampath
Swarnashree Mysore Sathyendra
CML
48
0
0
23 Jul 2024
Prior Knowledge Integration via LLM Encoding and Pseudo Event Regulation for Video Moment Retrieval
Yiyang Jiang
Wengyu Zhang
Xu-Lu Zhang
Xiaoyong Wei
Chang Wen Chen
Qing Li
46
4
0
21 Jul 2024
Multimodal Language Models for Domain-Specific Procedural Video Summarization
Nafisa Hussain
29
0
0
07 Jul 2024
R^2-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding
Ye Liu
Jixuan He
Wanhua Li
Junsik Kim
D. Wei
Hanspeter Pfister
Chang Wen Chen
36
13
0
31 Mar 2024
InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding
Yi Wang
Kunchang Li
Xinhao Li
Jiashuo Yu
Yinan He
...
Hongjie Zhang
Yifei Huang
Yu Qiao
Yali Wang
Limin Wang
34
44
0
22 Mar 2024
HawkEye: Training Video-Text LLMs for Grounding Text in Videos
Yueqian Wang
Xiaojun Meng
Jianxin Liang
Yuxuan Wang
Qun Liu
Dongyan Zhao
32
30
0
15 Mar 2024
Towards Balanced Alignment: Modal-Enhanced Semantic Modeling for Video Moment Retrieval
Zhihang Liu
Jun Li
Hongtao Xie
Pandeng Li
Jiannan Ge
Sun-Ao Liu
Guoqing Jin
35
18
0
19 Dec 2023
TimeChat: A Time-sensitive Multimodal Large Language Model for Long Video Understanding
Shuhuai Ren
Linli Yao
Shicheng Li
Xu Sun
Lu Hou
VLM
MLLM
23
174
0
04 Dec 2023
Exploring Iterative Refinement with Diffusion Models for Video Grounding
Xiao Liang
Tao Shi
Yaoyuan Liang
Te Tao
Shao-Lun Huang
DiffM
27
1
0
26 Oct 2023
Language-Conditioned Change-point Detection to Identify Sub-Tasks in Robotics Domains
Divyanshu Raj
Chitta Baral
N. Gopalan
77
1
0
01 Sep 2023
Knowing Where to Focus: Event-aware Transformer for Video Grounding
Jinhyun Jang
Jungin Park
Jin-Hwa Kim
Hyeongjun Kwon
K. Sohn
16
49
0
14 Aug 2023
MomentDiff: Generative Video Moment Retrieval from Random to Real
P. Li
Chen-Wei Xie
Hongtao Xie
Liming Zhao
Lei Zhang
Yun Zheng
Deli Zhao
Yongdong Zhang
DiffM
VGen
34
56
0
06 Jul 2023
Joint Moment Retrieval and Highlight Detection Via Natural Language Queries
Richard Luo
Austin Peng
Heidi Yap
Koby Beard
ViT
18
0
0
08 May 2023
Hierarchical Video-Moment Retrieval and Step-Captioning
Abhaysinh Zala
Jaemin Cho
Satwik Kottur
Xilun Chen
Barlas Oğuz
Yashar Mehdad
Mohit Bansal
3DV
20
51
0
29 Mar 2023
Collaborative Noisy Label Cleaner: Learning Scene-aware Trailers for Multi-modal Highlight Detection in Movies
Bei Gan
Xiujun Shu
Ruizhi Qiao
Haoqian Wu
Keyun Chen
Hanjun Li
Bohan Ren
26
5
0
26 Mar 2023
Towards Diverse Temporal Grounding under Single Positive Labels
Hao Zhou
Chongyang Zhang
Yanjun Chen
Chuanping Hu
24
1
0
12 Mar 2023
Learning Grounded Vision-Language Representation for Versatile Understanding in Untrimmed Videos
Teng Wang
Jinrui Zhang
Feng Zheng
Wenhao Jiang
Ran Cheng
Ping Luo
VLM
31
11
0
11 Mar 2023
MINOTAUR: Multi-task Video Grounding From Multimodal Queries
Raghav Goyal
E. Mavroudi
Xitong Yang
Sainbayar Sukhbaatar
Leonid Sigal
Matt Feiszli
Lorenzo Torresani
Du Tran
14
7
0
16 Feb 2023
Perceiver-VL: Efficient Vision-and-Language Modeling with Iterative Latent Attention
Zineng Tang
Jaemin Cho
Jie Lei
Mohit Bansal
VLM
24
9
0
21 Nov 2022
Zero-shot Video Moment Retrieval With Off-the-Shelf Models
Anuj Diwan
Puyuan Peng
Raymond J. Mooney
VLM
26
3
0
03 Nov 2022
Weakly-Supervised Temporal Article Grounding
Long Chen
Yulei Niu
Brian Chen
Xudong Lin
G. Han
Christopher Thomas
Hammad A. Ayyubi
Heng Ji
Shih-Fu Chang
AI4TS
27
13
0
22 Oct 2022
Video Summarization Overview
Mayu Otani
Yale Song
Yang Wang
AI4TS
VLM
24
10
0
21 Oct 2022
An Empirical Study of End-to-End Video-Language Transformers with Masked Visual Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
William Yang Wang
Lijuan Wang
Zicheng Liu
VLM
19
63
0
04 Sep 2022
Multimodal Frame-Scoring Transformer for Video Summarization
Jeiyoon Park
Kiho Kwoun
Chanhee Lee
Heuiseok Lim
ViT
30
6
0
05 Jul 2022
ReLER@ZJU-Alibaba Submission to the Ego4D Natural Language Queries Challenge 2022
Na Liu
Xiaohan Wang
Xiaobo Li
Yi Yang
Yueting Zhuang
20
18
0
01 Jul 2022
ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound
Yan-Bo Lin
Jie Lei
Mohit Bansal
Gedas Bertasius
35
39
0
06 Apr 2022
TubeDETR: Spatio-Temporal Video Grounding with Transformers
Antoine Yang
Antoine Miech
Josef Sivic
Ivan Laptev
Cordelia Schmid
ViT
28
94
0
30 Mar 2022
UMT: Unified Multi-modal Transformers for Joint Video Moment Retrieval and Highlight Detection
Ye Liu
Siyuan Li
Yang Wu
C. Chen
Ying Shan
Xiaohu Qie
ViT
8
139
0
23 Mar 2022
Towards Visual-Prompt Temporal Answering Grounding in Medical Instructional Video
Bin Li
Yixuan Weng
Bin Sun
Shutao Li
27
24
0
13 Mar 2022
Temporal Sentence Grounding in Videos: A Survey and Future Directions
Hao Zhang
Aixin Sun
Wei Jing
Joey Tianyi Zhou
3DGS
36
38
0
20 Jan 2022
AssistSR: Task-oriented Video Segment Retrieval for Personal AI Assistant
Stan Weixian Lei
Difei Gao
Yuxuan Wang
Dongxing Mao
Zihan Liang
L. Ran
Mike Zheng Shou
19
8
0
30 Nov 2021
VIOLET: End-to-End Video-Language Transformers with Masked Visual-token Modeling
Tsu-jui Fu
Linjie Li
Zhe Gan
Kevin Qinghong Lin
W. Wang
Lijuan Wang
Zicheng Liu
VLM
39
216
0
24 Nov 2021
TVR: A Large-Scale Dataset for Video-Subtitle Moment Retrieval
Jie Lei
Licheng Yu
Tamara L. Berg
Mohit Bansal
116
275
0
24 Jan 2020
Improving neural networks by preventing co-adaptation of feature detectors
Geoffrey E. Hinton
Nitish Srivastava
A. Krizhevsky
Ilya Sutskever
Ruslan Salakhutdinov
VLM
266
7,634
0
03 Jul 2012