ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2106.11310
  4. Cited By
Towards Long-Form Video Understanding

Towards Long-Form Video Understanding

21 June 2021
Chaoxia Wu
Philipp Krahenbuhl
    VLM
    ViT
ArXivPDFHTML

Papers citing "Towards Long-Form Video Understanding"

50 / 121 papers shown
Title
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking
Towards Efficient and Robust Moment Retrieval System: A Unified Framework for Multi-Granularity Models and Temporal Reranking
H. Tran
Tinh-Anh Nguyen-Nhu
Huu-Phong Phan-Nguyen
T. Nguyen
Nhat-Minh Nguyen-Dich
Anh Dao
Huy-Duc Do
Quan Nguyen
Hoang M. Le
Quang-Vinh Dinh
29
0
0
11 Apr 2025
REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding
REEF: Relevance-Aware and Efficient LLM Adapter for Video Understanding
Sakib Reza
Xiyun Song
Heather Yu
Zongfang Lin
Mohsen Moghaddam
Octavia Camps
29
0
0
07 Apr 2025
Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards
Safety Modulation: Enhancing Safety in Reinforcement Learning through Cost-Modulated Rewards
Hanping Zhang
Yuhong Guo
OffRL
38
0
0
03 Apr 2025
Towards Online Multi-Modal Social Interaction Understanding
Towards Online Multi-Modal Social Interaction Understanding
X. Li
Shijian Deng
Bolin Lai
Weiguo Pian
James M. Rehg
Yapeng Tian
46
0
0
25 Mar 2025
Agentic Keyframe Search for Video Question Answering
Agentic Keyframe Search for Video Question Answering
Sunqi Fan
Meng-Hao Guo
Shuojin Yang
45
0
0
20 Mar 2025
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory
Long-VMNet: Accelerating Long-Form Video Understanding via Fixed Memory
Saket Gurukar
Asim Kadav
VLM
50
0
0
17 Mar 2025
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
HierarQ: Task-Aware Hierarchical Q-Former for Enhanced Video Understanding
Shehreen Azad
Vibhav Vineet
Y. S. Rawat
VLM
131
1
0
11 Mar 2025
EVE: Towards End-to-End Video Subtitle Extraction with Vision-Language Models
Haiyang Yu
Jinghui Lu
Yanjie Wang
Yang Li
H. Wang
Can Huang
B. Li
VLM
63
1
0
06 Mar 2025
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection
OpenTAD: A Unified Framework and Comprehensive Study of Temporal Action Detection
Shuming Liu
Chen Zhao
Fatimah Zohra
Mattia Soldan
Alejandro Pardo
...
Juan Carlos León Alcázar
A. Cioppa
Silvio Giancola
Carlos Hinojosa
Bernard Ghanem
68
3
0
27 Feb 2025
MLVU: Benchmarking Multi-task Long Video Understanding
MLVU: Benchmarking Multi-task Long Video Understanding
Junjie Zhou
Yan Shu
Bo Zhao
Boya Wu
Zhengyang Liang
...
Xi Yang
Y. Xiong
Bo Zhang
Tiejun Huang
Zheng Liu
VLM
56
33
0
03 Jan 2025
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
A Simple Recipe for Contrastively Pre-training Video-First Encoders Beyond 16 Frames
Pinelopi Papalampidi
Skanda Koppula
Shreya Pathak
Justin T Chiu
Joseph Heyward
Viorica Patraucean
Jiajun Shen
Antoine Miech
Andrew Zisserman
Aida Nematzdeh
VLM
60
24
0
31 Dec 2024
NowYouSee Me: Context-Aware Automatic Audio Description
NowYouSee Me: Context-Aware Automatic Audio Description
Seon-Ho Lee
Jue Wang
D. Fan
Zhikang Zhang
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
86
0
0
13 Dec 2024
Neptune: The Long Orbit to Benchmarking Long Video Understanding
Arsha Nagrani
Mingda Zhang
Ramin Mehran
Rachel Hornung
N. B. Gundavarapu
...
Boqing Gong
Cordelia Schmid
Mikhail Sirotenko
Yukun Zhu
Tobias Weyand
103
4
0
12 Dec 2024
GEXIA: Granularity Expansion and Iterative Approximation for Scalable
  Multi-grained Video-language Learning
GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning
Y. Wang
Zhikang Zhang
Jue Wang
D. Fan
Zhenlin Xu
Linda Liu
Xiang Hao
Vimal Bhat
Xinyu Li
VLM
82
1
0
10 Dec 2024
Video LLMs for Temporal Reasoning in Long Videos
Video LLMs for Temporal Reasoning in Long Videos
Fawad Javed Fateh
Umer Ahmed
Hamza Khan
M. Zia
Quoc-Huy Tran
VLM
89
0
0
04 Dec 2024
Extending Video Masked Autoencoders to 128 frames
Extending Video Masked Autoencoders to 128 frames
N. B. Gundavarapu
Luke Friedman
Raghav Goyal
Chaitra Hegde
Eirikur Agustsson
...
Mikhail Sirotenko
Ming Yang
Tobias Weyand
Boqing Gong
Leonid Sigal
82
1
0
20 Nov 2024
Human-inspired Perspectives: A Survey on AI Long-term Memory
Human-inspired Perspectives: A Survey on AI Long-term Memory
Zihong He
Weizhe Lin
Hao Zheng
Fan Zhang
Matt Jones
Laurence Aitchison
X. Xu
Miao Liu
Per Ola Kristensson
Junxiao Shen
77
2
0
01 Nov 2024
Video Token Merging for Long-form Video Understanding
Video Token Merging for Long-form Video Understanding
Seon-Ho Lee
Jue Wang
Zhikang Zhang
D. Fan
Xinyu Li
40
5
0
31 Oct 2024
Making Every Frame Matter: Continuous Activity Recognition in Streaming Video via Adaptive Video Context Modeling
Making Every Frame Matter: Continuous Activity Recognition in Streaming Video via Adaptive Video Context Modeling
Hao Wu
Donglin Bai
Shiqi Jiang
Qianxi Zhang
Y. Yang
Ting Cao
Fengyuan Xu
Yunxin Liu
Fengyuan Xu
151
0
0
19 Oct 2024
ScreenWriter: Automatic Screenplay Generation and Movie Summarisation
ScreenWriter: Automatic Screenplay Generation and Movie Summarisation
Louis Mahon
Mirella Lapata
26
2
0
17 Oct 2024
Cefdet: Cognitive Effectiveness Network Based on Fuzzy Inference for
  Action Detection
Cefdet: Cognitive Effectiveness Network Based on Fuzzy Inference for Action Detection
Zhe Luo
Weina Fu
Shuai Liu
Saeed Anwar
Muhammad Saqib
Sambit Bakshi
Khan Muhammad
36
2
0
08 Oct 2024
From Seconds to Hours: Reviewing MultiModal Large Language Models on
  Comprehensive Long Video Understanding
From Seconds to Hours: Reviewing MultiModal Large Language Models on Comprehensive Long Video Understanding
Heqing Zou
Tianze Luo
Guiyang Xie
Victor
Zhang
...
Guangcong Wang
Juanyang Chen
Zhuochen Wang
Hansheng Zhang
Huaijian Zhang
VLM
34
6
0
27 Sep 2024
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient
  Object-Aware Pretraining
SOAR: Self-supervision Optimized UAV Action Recognition with Efficient Object-Aware Pretraining
Ruiqi Xian
Xiyang Wu
Tianrui Guan
Xijun Wang
Boqing Gong
Dinesh Manocha
ViT
39
0
0
26 Sep 2024
Path-adaptive Spatio-Temporal State Space Model for Event-based
  Recognition with Arbitrary Duration
Path-adaptive Spatio-Temporal State Space Model for Event-based Recognition with Arbitrary Duration
Jiazhou Zhou
Kanghao Chen
Lei Zhang
Lin Wang
24
0
0
25 Sep 2024
HAVANA: Hierarchical stochastic neighbor embedding for Accelerated Video
  ANnotAtions
HAVANA: Hierarchical stochastic neighbor embedding for Accelerated Video ANnotAtions
Alexandru Bobe
J. C. V. Gemert
26
0
0
16 Sep 2024
Generating Event-oriented Attribution for Movies via Two-Stage
  Prefix-Enhanced Multimodal LLM
Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM
Yuanjie Lyu
Tong Bill Xu
Zihan Niu
Bo Peng
Jing Ke
Enhong Chen
23
0
0
14 Sep 2024
Towards Social AI: A Survey on Understanding Social Interactions
Towards Social AI: A Survey on Understanding Social Interactions
Sangmin Lee
Minzhi Li
Bolin Lai
Wenqi Jia
Fiona Ryan
...
Ozgur Kara
Bikram Boote
Weiyan Shi
Diyi Yang
James M. Rehg
39
4
0
05 Sep 2024
HERMES: temporal-coHERent long-forM understanding with Episodes and
  Semantics
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure
Jia-Fong Yeh
Min-Hung Chen
Hung-Ting Su
Winston H. Hsu
Shang-Hong Lai
26
3
0
30 Aug 2024
Continuous Perception Benchmark
Continuous Perception Benchmark
Zeyu Wang
Zhenzhen Weng
Serena Yeung-Levy
VLM
31
0
0
15 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
43
5
0
31 Jul 2024
VideoMamba: Spatio-Temporal Selective State Space Model
VideoMamba: Spatio-Temporal Selective State Space Model
Jinyoung Park
Hee-Seon Kim
Kangwook Ko
Minbeom Kim
Changick Kim
Mamba
39
7
0
11 Jul 2024
Multilingual Synopses of Movie Narratives: A Dataset for Story
  Understanding
Multilingual Synopses of Movie Narratives: A Dataset for Story Understanding
Yidan Sun
Jianfei Yu
Boyang Li
51
0
0
18 Jun 2024
Vivid-ZOO: Multi-View Video Generation with Diffusion Model
Vivid-ZOO: Multi-View Video Generation with Diffusion Model
Bing Li
Cheng Zheng
Wenxuan Zhu
Jinjie Mai
Biao Zhang
Peter Wonka
Bernard Ghanem
45
16
0
12 Jun 2024
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD
MeMSVD: Long-Range Temporal Structure Capturing Using Incremental SVD
Ioanna Ntinou
Enrique Sanchez
Georgios Tzimiropoulos
42
0
0
11 Jun 2024
Encoding and Controlling Global Semantics for Long-form Video Question
  Answering
Encoding and Controlling Global Semantics for Long-form Video Question Answering
Thong Nguyen
Zhiyuan Hu
Xiaobao Wu
Cong-Duy Nguyen
See-Kiong Ng
A. Luu
43
2
0
30 May 2024
CinePile: A Long Video Question Answering Dataset and Benchmark
CinePile: A Long Video Question Answering Dataset and Benchmark
Ruchit Rawal
Khalid Saifullah
Ronen Basri
David Jacobs
Gowthami Somepalli
Tom Goldstein
40
39
0
14 May 2024
A Semantic and Motion-Aware Spatiotemporal Transformer Network for
  Action Detection
A Semantic and Motion-Aware Spatiotemporal Transformer Network for Action Detection
Matthew Korban
Peter Youngs
Scott T. Acton
ViT
29
6
0
13 May 2024
MovieChat+: Question-aware Sparse Memory for Long Video Question
  Answering
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
Enxin Song
Wenhao Chai
Tianbo Ye
Jenq-Neng Hwang
Xi Li
Gaoang Wang
VLM
MLLM
37
28
0
26 Apr 2024
Mamba-360: Survey of State Space Models as Transformer Alternative for
  Long Sequence Modelling: Methods, Applications, and Challenges
Mamba-360: Survey of State Space Models as Transformer Alternative for Long Sequence Modelling: Methods, Applications, and Challenges
Badri N. Patro
Vijay Srinivas Agneeswaran
Mamba
46
38
0
24 Apr 2024
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video
  Understanding
MA-LMM: Memory-Augmented Large Multimodal Model for Long-Term Video Understanding
Bo He
Hengduo Li
Young Kyun Jang
Menglin Jia
Xuefei Cao
Ashish Shah
Abhinav Shrivastava
Ser-Nam Lim
MLLM
81
88
0
08 Apr 2024
Koala: Key frame-conditioned long video-LLM
Koala: Key frame-conditioned long video-LLM
Reuben Tan
Ximeng Sun
Ping Hu
Jui-hsien Wang
Hanieh Deilamsalehy
Bryan A. Plummer
Bryan C. Russell
Kate Saenko
38
35
0
05 Apr 2024
LongVLM: Efficient Long Video Understanding via Large Language Models
LongVLM: Efficient Long Video Understanding via Large Language Models
Yuetian Weng
Mingfei Han
Haoyu He
Xiaojun Chang
Bohan Zhuang
VLM
65
56
0
04 Apr 2024
Streaming Dense Video Captioning
Streaming Dense Video Captioning
Xingyi Zhou
Anurag Arnab
Shyamal Buch
Shen Yan
Austin Myers
Xuehan Xiong
Arsha Nagrani
Cordelia Schmid
VLM
33
31
0
01 Apr 2024
VideoDistill: Language-aware Vision Distillation for Video Question
  Answering
VideoDistill: Language-aware Vision Distillation for Video Question Answering
Bo Zou
Chao Yang
Yu Qiao
Chengbin Quan
Youjian Zhao
VGen
47
1
0
01 Apr 2024
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
VideoAgent: A Memory-augmented Multimodal Agent for Video Understanding
Yue Fan
Xiaojian Ma
Rujie Wu
Yuntao Du
Jiaqi Li
Zhi Gao
Qing Li
VLM
LLMAG
46
55
0
18 Mar 2024
Towards Neuro-Symbolic Video Understanding
Towards Neuro-Symbolic Video Understanding
Minkyu Choi
Harsh Goel
Mohammad Omama
Yunhao Yang
Sahil Shah
Sandeep P. Chinchali
32
8
0
16 Mar 2024
VideoMamba: State Space Model for Efficient Video Understanding
VideoMamba: State Space Model for Efficient Video Understanding
Kunchang Li
Xinhao Li
Yi Wang
Yinan He
Yali Wang
Limin Wang
Yu Qiao
Mamba
37
180
0
11 Mar 2024
Find the Cliffhanger: Multi-Modal Trailerness in Soap Operas
Find the Cliffhanger: Multi-Modal Trailerness in Soap Operas
Carlo Bretti
Pascal Mettes
Hendrik Vincent Koops
Daan Odijk
N. V. Noord
29
4
0
29 Jan 2024
A Simple LLM Framework for Long-Range Video Question-Answering
A Simple LLM Framework for Long-Range Video Question-Answering
Ce Zhang
Taixi Lu
Md. Mohaiminul Islam
Ziyang Wang
Shoubin Yu
Mohit Bansal
Gedas Bertasius
102
80
0
28 Dec 2023
MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie
  Understanding
MoVQA: A Benchmark of Versatile Question-Answering for Long-Form Movie Understanding
Hongjie Zhang
Yi Liu
Lu Dong
Yifei Huang
Z. Ling
Yali Wang
Limin Wang
Yu Qiao
23
25
0
08 Dec 2023
123
Next