Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models

8 June 2023
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad Shahbaz Khan
    MLLM

Papers citing "Video-ChatGPT: Towards Detailed Video Understanding via Large Vision and Language Models"

50 / 462 papers shown
SPORTU: A Comprehensive Sports Understanding Benchmark for Multimodal Large Language Models
H. Xia
Zhengbang Yang
Junbo Zou
Rhys Tracy
Yuqing Wang
...
Xun Shao
Zhuoqing Xie
Yuan-fang Wang
Weining Shen
Hanjie Chen
ReLM
LRM
ELM
50
2
0
11 Oct 2024
G$^{2}$TR: Generalized Grounded Temporal Reasoning for Robot Instruction Following by Combining Large Pre-trained Models
Riya Arora
N. N.
Aman Tambi
Sandeep S. Zachariah
Souvik Chakraborty
Rohan Paul
LM&Ro
31
0
0
10 Oct 2024
ElasticTok: Adaptive Tokenization for Image and Video
Wilson Yan
Matei A. Zaharia
Volodymyr Mnih
Pieter Abbeel
Aleksandra Faust
Hao Liu
VGen
54
6
0
10 Oct 2024
MM-Ego: Towards Building Egocentric Multimodal LLMs for Video QA
Hanrong Ye
Haotian Zhang
Erik Daxberger
Lin Chen
Zongyu Lin
...
Haoxuan You
Dan Xu
Zhe Gan
Jiasen Lu
Yinfei Yang
EgoV
MLLM
88
12
0
09 Oct 2024
Temporal Reasoning Transfer from Text to Video
Lei Li
Yuanxin Liu
Linli Yao
Peiyuan Zhang
Chenxin An
Lean Wang
Xu Sun
Lingpeng Kong
Qi Liu
LRM
48
7
0
08 Oct 2024
TRACE: Temporal Grounding Video LLM via Causal Event Modeling
Yongxin Guo
Jingyu Liu
Mingda Li
Xiaoying Tang
Qingbin Liu
Xiaoying Tang
52
14
0
08 Oct 2024
On Efficient Variants of Segment Anything Model: A Survey
Xiaorui Sun
Jing Liu
H. Shen
Xiaofeng Zhu
Ping Hu
VLM
56
4
0
07 Oct 2024
Realizing Video Summarization from the Path of Language-based Semantic Understanding
Kuan-Chen Mu
Zhi-Yi Chin
Wei-Chen Chiu
28
0
0
06 Oct 2024
Geometric Analysis of Reasoning Trajectories: A Phase Space Approach to Understanding Valid and Invalid Multi-Hop Reasoning in LLMs
Javier Marin
LRM
88
0
0
06 Oct 2024
Grounded-VideoLLM: Sharpening Fine-grained Temporal Grounding in Video Large Language Models
Haibo Wang
Zhiyang Xu
Yu Cheng
Shizhe Diao
Yufan Zhou
Yixin Cao
Qifan Wang
Weifeng Ge
Lifu Huang
24
21
0
04 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
89
26
0
04 Oct 2024
Frame-Voyager: Learning to Query Frames for Video Large Language Models
Sicheng Yu
Chengkai Jin
Huanyu Wang
Zhenghao Chen
Sheng Jin
...
Zhenbang Sun
Bingni Zhang
Jiawei Wu
Hao Zhang
Qianru Sun
77
5
0
04 Oct 2024
Video Instruction Tuning With Synthetic Data
Yuanhan Zhang
Jinming Wu
Wei Li
Bo Li
Zejun Ma
Ziwei Liu
Chunyuan Li
SyDa
VGen
55
144
0
03 Oct 2024
Open-vocabulary Multimodal Emotion Recognition: Dataset, Metric, and Benchmark
Zheng Lian
Haiyang Sun
Guoying Zhao
Lan Chen
Haoyu Chen
...
Rui Liu
Shan Liang
Ya Li
Jiangyan Yi
Jianhua Tao
VLM
43
0
0
02 Oct 2024
UAL-Bench: The First Comprehensive Unusual Activity Localization Benchmark
Hasnat Md Abdullah
Tian Liu
Kangda Wei
Shu Kong
Ruihong Huang
44
3
0
02 Oct 2024
ChatVTG: Video Temporal Grounding via Chat with Video Dialogue Large Language Models
Mengxue Qu
Xiaodong Chen
Wu Liu
Alicia Li
Yao Zhao
47
13
0
01 Oct 2024
MM1.5: Methods, Analysis & Insights from Multimodal LLM Fine-tuning
Haotian Zhang
Mingfei Gao
Zhe Gan
Philipp Dufter
Nina Wenzel
...
Haoxuan You
Zirui Wang
Afshin Dehghan
Peter Grasch
Yinfei Yang
VLM
MLLM
42
33
1
30 Sep 2024
Efficient Driving Behavior Narration and Reasoning on Edge Device Using Large Language Models
Yizhou Huang
Yihua Cheng
Kezhi Wang
LRM
52
1
0
30 Sep 2024
One Token to Seg Them All: Language Instructed Reasoning Segmentation in Videos
Zechen Bai
Tong He
Haiyang Mei
Pichao Wang
Ziteng Gao
Joya Chen
Lei Liu
Zheng Zhang
Mike Zheng Shou
VLM
VOS
MLLM
50
17
0
29 Sep 2024
Video DataFlywheel: Resolving the Impossible Data Trinity in Video-Language Understanding
Xiao Wang
Jianlong Wu
Zijia Lin
Fuzheng Zhang
Di Zhang
Liqiang Nie
VGen
37
1
0
29 Sep 2024
E.T. Bench: Towards Open-Ended Event-Level Video-Language Understanding
Ye Liu
Zongyang Ma
Zhongang Qi
Yang Wu
Ying Shan
Chang Wen Chen
41
16
0
26 Sep 2024
LLM4Brain: Training a Large Language Model for Brain Video Understanding
Ruizhe Zheng
Lichao Sun
29
0
0
26 Sep 2024
EventHallusion: Diagnosing Event Hallucinations in Video LLMs
Jiacheng Zhang
Yang Jiao
Shaoxiang Chen
Jingjing Chen
Zhiyu Tan
Hao Li
Jingjing Chen
MLLM
66
18
0
25 Sep 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
55
7
0
23 Sep 2024
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Yan Shu
Peitian Zhang
Zheng Liu
Minghao Qin
Yueze Wang
Tiejun Huang
Bo Zhao
VLM
52
42
0
22 Sep 2024
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner
Yuzhang Shang
Bingxin Xu
Weitai Kang
Mu Cai
Yuheng Li
Zehao Wen
Zhen Dong
Kurt Keutzer
Yong Jae Lee
Yan Yan
41
7
0
19 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
41
1
0
19 Sep 2024
Oryx MLLM: On-Demand Spatial-Temporal Understanding at Arbitrary Resolution
Zuyan Liu
Yuhao Dong
Ziwei Liu
Winston Hu
Jiwen Lu
Yongming Rao
ObjD
88
55
0
19 Sep 2024
Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM
Yuanjie Lyu
Tong Xu
Zihan Niu
Bo Peng
Jing Ke
Enhong Chen
28
0
0
14 Sep 2024
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
Yang Liu
Pengxiang Ding
Siteng Huang
Min Zhang
Han Zhao
Donglin Wang
40
7
0
11 Sep 2024
StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models
Y. Guo
Faizan Siddiqui
Yang Zhao
Rama Chellappa
Shao-Yuan Lo
LRM
59
2
0
31 Aug 2024
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure
Jia-Fong Yeh
Min-Hung Chen
Hung-Ting Su
Winston H. Hsu
Shang-Hong Lai
31
3
0
30 Aug 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Shiwei Wu
Joya Chen
Kevin Qinghong Lin
Qimeng Wang
Yan Gao
Qianli Xu
Tong Xu
Yao Hu
Enhong Chen
Mike Zheng Shou
VLM
59
12
0
29 Aug 2024
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
...
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLM
MLLM
50
89
0
29 Aug 2024
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
Minghang Zheng
Xinhao Cai
Qingchao Chen
Yuxin Peng
Yang Liu
45
4
0
29 Aug 2024
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models
Qihang Ge
Wei Sun
Yu Zhang
Yunhao Li
Zhongpeng Ji
Fengyu Sun
Shangling Jui
Xiongkuo Min
Guangtao Zhai
56
4
0
26 Aug 2024
Generating Realistic X-ray Scattering Images Using Stable Diffusion and Human-in-the-loop Annotations
Zhuowen Zhao
Xiaoya Chong
Tanny Chavez
Alexander Hexemer
53
1
0
22 Aug 2024
Continuous Perception Benchmark
Zeyu Wang
Zhenzhen Weng
Serena Yeung-Levy
VLM
39
0
0
15 Aug 2024
VITA: Towards Open-Source Interactive Omni Multimodal LLM
Chaoyou Fu
Haojia Lin
Zuwei Long
Yunhang Shen
Meng Zhao
...
Ran He
Rongrong Ji
Yunsheng Wu
Caifeng Shan
Xing Sun
MLLM
50
2
0
09 Aug 2024
VideoQA in the Era of LLMs: An Empirical Study
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
36
10
0
08 Aug 2024
LLaVA-OneVision: Easy Visual Task Transfer
Bo Li
Yuanhan Zhang
Dong Guo
Renrui Zhang
Feng Li
Hao Zhang
Kaichen Zhang
Yanwei Li
Ziwei Liu
Chunyuan Li
MLLM
SyDa
VLM
58
599
0
06 Aug 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Ping Luo
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
82
48
0
05 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
45
1
0
04 Aug 2024
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks
Jiaqi Wang
Hanqi Jiang
Yi-Hsueh Liu
Chong Ma
Xu-Yao Zhang
...
Xin Zhang
Wei Zhang
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
AI4TS
54
32
0
02 Aug 2024
Are Bigger Encoders Always Better in Vision Large Models?
Bozhou Li
Hao Liang
Zimo Meng
Wentao Zhang
VLM
40
3
0
01 Aug 2024
Learning Video Context as Interleaved Multimodal Sequences
S. Shao
Pengchuan Zhang
Y. Li
Xide Xia
A. Meso
Ziteng Gao
Jinheng Xie
N. Holliman
Mike Zheng Shou
58
5
0
31 Jul 2024
CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models
Junda Wu
Xintong Li
Tong Yu
Yu Wang
Xiang Chen
Jiuxiang Gu
Lina Yao
Jingbo Shang
Julian McAuley
52
0
0
29 Jul 2024
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Mingze Xu
Mingfei Gao
Zhe Gan
Hong-You Chen
Zhengfeng Lai
Haiming Gang
Kai Kang
Afshin Dehghan
69
49
0
22 Jul 2024
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding
Haoning Wu
Dongxu Li
Bei Chen
Junnan Li
40
114
0
22 Jul 2024
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
Quan Kong
Yuki Kawana
Rajat Saini
Ashutosh Kumar
Jingjing Pan
...
Yohei Ozao
Balázs Opra
D. Anastasiu
Yoichi Sato
Norimasa Kobori
VGen
44
8
0
22 Jul 2024