ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2501.12386
  4. Cited By
InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling
v1v2v3 (latest)

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling

21 January 2025
Yi Wang
Xinhao Li
Ziang Yan
Yinan He
Jiashuo Yu
Xiangyu Zeng
Chenting Wang
Changlian Ma
Haian Huang
Jianfei Gao
Min Dou
Kai Chen
Wenhai Wang
Yu Qiao
Yali Wang
Limin Wang
ArXiv (abs)PDFHTML

Papers citing "InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling"

40 / 40 papers shown
Title
APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval
APVR: Hour-Level Long Video Understanding with Adaptive Pivot Visual Information Retrieval
Hong-xia Gao
Yiming Bao
Xuezhan Tu
Bin Zhong
Minling Zhang
105
0
0
01 Jul 2025
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models
video-SALMONN 2: Captioning-Enhanced Audio-Visual Large Language Models
Changli Tang
Yixuan Li
Yudong Yang
Jimin Zhuang
Guangzhi Sun
Wei Li
Zejun Ma
Chao Zhang
32
0
0
18 Jun 2025
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
Ego-R1: Chain-of-Tool-Thought for Ultra-Long Egocentric Video Reasoning
Shulin Tian
Ruiqi Wang
Hongming Guo
Penghao Wu
Yuhao Dong
Xiuying Wang
Jingkang Yang
Hao Zhang
Hongyuan Zhu
Ziwei Liu
RALMLRM
40
0
0
16 Jun 2025
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
VRBench: A Benchmark for Multi-Step Reasoning in Long Narrative Videos
Jiashuo Yu
Y. Wu
Meng Chu
Zhifei Ren
Z. Huang
...
Conghui He
Yu Qiao
Yali Wang
Yi Wang
L. Wang
LRM
138
0
0
12 Jun 2025
AV-Reasoner: Improving and Benchmarking Clue-Grounded Audio-Visual Counting for MLLMs
Lidong Lu
Guo Chen
Z. Li
Yicheng Liu
Tong Lu
VLMLRM
109
0
0
05 Jun 2025
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
VideoMathQA: Benchmarking Mathematical Reasoning via Multimodal Understanding in Videos
H. Rasheed
Abdelrahman M. Shaker
Anqi Tang
Muhammad Maaz
Ming-Hsuan Yang
Salman Khan
Fahad A Khan
AIMat
123
0
0
05 Jun 2025
MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection
MemoryOut: Learning Principal Features via Multimodal Sparse Filtering Network for Semi-supervised Video Anomaly Detection
Juntong Li
Lingwei Dang
Yukun Su
Yun Hao
Qingxin Xiao
Yongwei Nie
Qingyao Wu
84
0
0
03 Jun 2025
Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
Go Beyond Earth: Understanding Human Actions and Scenes in Microgravity Environments
Di Wen
Lei Qi
Kunyu Peng
Kailun Yang
Fei Teng
...
Yufan Chen
R. Liu
Yitian Shi
M. Sarfraz
Rainer Stiefelhagen
76
0
0
03 Jun 2025
DisTime: Distribution-based Time Representation for Video Large Language Models
DisTime: Distribution-based Time Representation for Video Large Language Models
Yingsen Zeng
Zepeng Huang
Yujie Zhong
Chengjian Feng
Jie Hu
Lin Ma
Yang Liu
VGen
32
0
0
30 May 2025
Time Blindness: Why Video-Language Models Can't See What Humans Can?
Time Blindness: Why Video-Language Models Can't See What Humans Can?
Ujjwal Upadhyay
Mukul Ranjan
Zhiqiang Shen
Mohamed Elhoseiny
VLM
34
0
0
30 May 2025
Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought
Vad-R1: Towards Video Anomaly Reasoning via Perception-to-Cognition Chain-of-Thought
Chao Huang
Benfeng Wang
Jie Wen
Chengliang Liu
Wei Wang
Li Shen
Xiaochun Cao
LRM
79
0
0
26 May 2025
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
Deep Video Discovery: Agentic Search with Tool Use for Long-form Video Understanding
Xiaoyi Zhang
Zhaoyang Jia
Zongyu Guo
Jiahao Li
Bin Li
Houqiang Li
Yan Lu
215
0
0
23 May 2025
Temporal Consistency Constrained Transferable Adversarial Attacks with Background Mixup for Action Recognition
Ping Li
Jianan Ni
Bo Pang
AAML
255
0
0
23 May 2025
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design
QuickVideo: Real-Time Long Video Understanding with System Algorithm Co-Design
Benjamin Schneider
Dongfu Jiang
Chao Du
Tianyu Pang
Wenhu Chen
VLM
79
0
0
22 May 2025
From Evaluation to Defense: Advancing Safety in Video Large Language Models
From Evaluation to Defense: Advancing Safety in Video Large Language Models
Yiwei Sun
Peiqi Jiang
Chuanbin Liu
Luohao Lin
Zhiying Lu
Hongtao Xie
60
0
0
22 May 2025
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
LLaDA-V: Large Language Diffusion Models with Visual Instruction Tuning
Zebin You
Shen Nie
Xiaolu Zhang
Jun Hu
Jun Zhou
Zhiwu Lu
J. Wen
Chongxuan Li
MLLMVLM
114
2
0
22 May 2025
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation
VideoEval-Pro: Robust and Realistic Long Video Understanding Evaluation
Wentao Ma
Weiming Ren
Yiming Jia
Zhuofeng Li
Ping Nie
Ge Zhang
Wenhu Chen
82
1
0
20 May 2025
Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
Video Compression Commander: Plug-and-Play Inference Acceleration for Video Large Language Models
Xuyang Liu
Yiyu Wang
Junpeng Ma
Linfeng Zhang
VLM
55
0
0
20 May 2025
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation
BusterX: MLLM-Powered AI-Generated Video Forgery Detection and Explanation
Haiquan Wen
Yiwei He
Zhenglin Huang
Tianxiao Li
Zihan Yu
Xingru Huang
Lu Qi
Baoyuan Wu
Xuelong Li
Guangliang Cheng
VGen
114
0
0
19 May 2025
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Video-SafetyBench: A Benchmark for Safety Evaluation of Video LVLMs
Xuannan Liu
Zekun Li
Zheqi He
Peipei Li
Shuhan Xia
Xing Cui
Huaibo Huang
Xi Yang
Ran He
EGVMAAML
101
1
0
17 May 2025
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation
LOVE: Benchmarking and Evaluating Text-to-Video Generation and Video-to-Text Interpretation
Jiarui Wang
Huiyu Duan
Ziheng Jia
Yu Zhao
Woo Yi Yang
...
Zhongfu Chen
Juntong Wang
Yuke Xing
Guangtao Zhai
Xiongkuo Min
VGen
84
1
0
17 May 2025
VideoVista-CulturalLingo: 360$^\circ$ Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
VideoVista-CulturalLingo: 360∘^\circ∘ Horizons-Bridging Cultures, Languages, and Domains in Video Comprehension
Xinyu Chen
Yunxin Li
Haoyuan Shi
Baotian Hu
Wenhan Luo
Yaowei Wang
Hao Fei
ELM
125
0
0
23 Apr 2025
MR. Video: "MapReduce" is the Principle for Long Video Understanding
MR. Video: "MapReduce" is the Principle for Long Video Understanding
Ziqi Pang
Yu-Xiong Wang
VLM
114
1
0
22 Apr 2025
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes
An LMM for Efficient Video Understanding via Reinforced Compression of Video Cubes
Ji Qi
Yuan Yao
Yushi Bai
Bin Xu
Juanzi Li
Zhiyuan Liu
Tat-Seng Chua
85
0
0
21 Apr 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
127
6
0
20 Apr 2025
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?
How Well Can General Vision-Language Models Learn Medicine By Watching Public Educational Videos?
Rahul Thapa
Andrew Li
Qingyang Wu
Bryan He
Yuki Sahashi
...
Angela Zhang
Ben Athiwaratkun
Shuaiwen Leon Song
David Ouyang
James Zou
LM&MA
184
0
0
19 Apr 2025
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Mavors: Multi-granularity Video Representation for Multimodal Large Language Model
Yang Shi
Jiaheng Liu
Yushuo Guan
Zhikai Wu
Yize Zhang
...
Bohan Zeng
Wei Zhang
Fuzheng Zhang
Wenjing Yang
Di Zhang
VGenVLM
140
2
0
14 Apr 2025
Multimodal Long Video Modeling Based on Temporal Dynamic Context
Multimodal Long Video Modeling Based on Temporal Dynamic Context
Haoran Hao
Jiaming Han
Yiyuan Zhang
Xiangyu Yue
147
0
0
14 Apr 2025
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
VCR-Bench: A Comprehensive Evaluation Framework for Video Chain-of-Thought Reasoning
Yukun Qi
Yiming Zhao
Y. Zeng
Xikun Bao
Wenjie Huang
Lin Yen-Chen
Zehui Chen
Jie Zhao
Zhongang Qi
Feng Zhao
LRM
120
4
0
10 Apr 2025
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
VideoChat-R1: Enhancing Spatio-Temporal Perception via Reinforcement Fine-Tuning
Xinhao Li
Ziang Yan
Desen Meng
Lu Dong
Xiangyu Zeng
Yinan He
Yun Wang
Yu Qiao
Yi Wang
Limin Wang
VLMAI4TSLRM
133
38
0
09 Apr 2025
WikiVideo: Article Generation from Multiple Videos
WikiVideo: Article Generation from Multiple Videos
Alexander Martin
Reno Kriz
William Walden
Kate Sanders
Hannah Recknor
Eugene Yang
Francis Ferraro
Benjamin Van Durme
DiffMVGen
154
2
0
01 Apr 2025
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models
ST-VLM: Kinematic Instruction Tuning for Spatio-Temporal Reasoning in Vision-Language Models
Dohwan Ko
S. Kim
Yumin Suh
Vijay Kumar B.G
Minseo Yoon
Manmohan Chandraker
Hyunwoo J. Kim
LRM
106
0
0
25 Mar 2025
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Does Your Vision-Language Model Get Lost in the Long Video Sampling Dilemma?
Tianyuan Qu
Longxiang Tang
Bohao Peng
Senqiao Yang
Bei Yu
Jiaya Jia
VLM
470
2
0
16 Mar 2025
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Vamba: Understanding Hour-Long Videos with Hybrid Mamba-Transformers
Weiming Ren
Wentao Ma
Huan Yang
Cong Wei
Ge Zhang
Wenhu Chen
Mamba
94
5
0
14 Mar 2025
Generative Frame Sampler for Long Video Understanding
Linli Yao
Haoning Wu
Kun Ouyang
Yize Zhang
Caiming Xiong
Bei Chen
Xu Sun
Junnan Li
VLMVGen
99
1
0
12 Mar 2025
Cockatiel: Ensembling Synthetic and Human Preferenced Training for Detailed Video Caption
Luozheng Qin
Zhiyu Tan
Mengping Yang
Xiaomeng Yang
Hao Li
178
0
0
12 Mar 2025
Nexar Dashcam Collision Prediction Dataset and Challenge
Daniel C. Moura
Shizhan Zhu
Orly Zvitia
102
1
0
05 Mar 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLMVLMLRM
370
59
0
03 Jan 2025
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling
Xinhao Li
Yi Wang
Jiashuo Yu
Xiangyu Zeng
Yuhan Zhu
...
Yinan He
Chenting Wang
Yu Qiao
Yali Wang
L. Wang
VLM
248
40
0
31 Dec 2024
Contextual AD Narration with Interleaved Multimodal Sequence
Contextual AD Narration with Interleaved Multimodal Sequence
Hanlin Wang
Zhan Tong
Kecheng Zheng
Yujun Shen
Limin Wang
VGen
136
4
0
19 Mar 2024
1