Communities
Connect sessions
AI calendar
Organizations
Join Slack
Contact Sales
Search
Open menu
Home
Papers
2306.02858
Cited By
v1
v2
v3
v4 (latest)
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2023
5 June 2023
Hang Zhang
Xin Li
Lidong Bing
MLLM
Re-assign community
ArXiv (abs)
PDF
HTML
HuggingFace (19 upvotes)
Papers citing
"Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding"
50 / 669 papers shown
Title
Unlocking Continual Learning Abilities in Language Models
Wenyu Du
Shuang Cheng
Tongxu Luo
Zihan Qiu
Zeyu Huang
Ka Chun Cheung
Reynold Cheng
Jie Fu
KELM
CLL
237
17
0
25 Jun 2024
CART: A Generative Cross-Modal Retrieval Framework with Coarse-To-Fine Semantic Modeling
Minghui Fang
Shengpeng Ji
Jialong Zuo
Hai Huang
Yan Xia
...
Xiaoda Yang
Wenrui Liu
Gang Wang
Zhenhua Dong
Zhou Zhao
121
9
0
25 Jun 2024
EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models
Zhiyu Tan
Xiaomeng Yang
Luozheng Qin
Mengping Yang
Cheng Zhang
Hao Li
289
13
0
24 Jun 2024
video-SALMONN: Speech-Enhanced Audio-Visual Large Language Models
Guangzhi Sun
Wenyi Yu
Changli Tang
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Yuxuan Wang
Chao Zhang
169
59
0
22 Jun 2024
GUI Action Narrator: Where and When Did That Action Take Place?
Qinchen Wu
Difei Gao
Kevin Qinghong Lin
Zhuoyu Wu
Xiangwu Guo
Peiran Li
Weichen Zhang
Hengxu Wang
Mike Zheng Shou
179
5
0
19 Jun 2024
Through the Theory of Mind's Eye: Reading Minds with Multimodal Video Large Language Models
Z. Chen
Tianchun Wang
Yizhou Wang
Michal Kosinski
Xiang Zhang
Yun Fu
Sheng Li
LRM
202
6
0
19 Jun 2024
VoCo-LLaMA: Towards Vision Compression with Large Language Models
Xubing Ye
Yukang Gan
Xiaoke Huang
Yixiao Ge
Yansong Tang
MLLM
VLM
259
45
0
18 Jun 2024
VideoLLM-online: Online Video Large Language Model for Streaming Video
Joya Chen
Zhaoyang Lv
Shiwei Wu
Kevin Qinghong Lin
Chenan Song
Difei Gao
Jia-Wei Liu
Ziteng Gao
Dongxing Mao
Mike Zheng Shou
MLLM
MoMe
267
96
0
17 Jun 2024
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Zebang Cheng
Zhi-Qi Cheng
Jun-Yan He
Yuxuan Zhou
Kai Wang
Yuxiang Lin
Zheng Lian
Xiaojiang Peng
Alexander G. Hauptmann
MLLM
196
102
0
17 Jun 2024
VANE-Bench: Video Anomaly Evaluation Benchmark for Conversational LMMs
North American Chapter of the Association for Computational Linguistics (NAACL), 2024
Rohit K Bharadwaj
Hanan Gani
Muzammal Naseer
Fahad Shahbaz Khan
Salman Khan
267
16
0
14 Jun 2024
VideoGPT+: Integrating Image and Video Encoders for Enhanced Video Understanding
Muhammad Maaz
H. Rasheed
Salman Khan
Fahad A Khan
VLM
MLLM
219
97
0
13 Jun 2024
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
Chun-Yi Kuan
Wei-Ping Huang
Hung-yi Lee
AuLLM
137
16
0
12 Jun 2024
LVBench: An Extreme Long Video Understanding Benchmark
Weihan Wang
Zehai He
Wenyi Hong
Yean Cheng
Xiaohan Zhang
...
Shiyu Huang
Bin Xu
Yuxiao Dong
Ming Ding
Jie Tang
ELM
VLM
390
184
0
12 Jun 2024
TRINS: Towards Multimodal Language Models that Can Read
Ruiyi Zhang
Yanzhe Zhang
Jian Chen
Jiuxiang Gu
Jiuxiang Gu
Changyou Chen
Tong Sun
VLM
193
7
0
10 Jun 2024
Chain-of-Scrutiny: Detecting Backdoor Attacks for Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Xi Li
Ruofan Mao
Yusen Zhang
Renze Lou
Chen Wu
Jiaqi Wang
LRM
AAML
301
20
0
10 Jun 2024
Video-Language Understanding: A Survey from Model Architecture, Model Training, and Data Perspectives
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Thong Nguyen
Yi Bin
Junbin Xiao
Leigang Qu
Yicong Li
Jay Zhangjie Wu
Cong-Duy Nguyen
See-Kiong Ng
Luu Anh Tuan
VLM
442
25
1
09 Jun 2024
VP-LLM: Text-Driven 3D Volume Completion with Large Language Models through Patchification
Jianmeng Liu
Yichen Liu
Yuyao Zhang
Zeyuan Meng
Yu-Wing Tai
Chi-Keung Tang
152
0
0
08 Jun 2024
ShareGPT4Video: Improving Video Understanding and Generation with Better Captions
Neural Information Processing Systems (NeurIPS), 2024
Lin Chen
Xilin Wei
Jinsong Li
Xiaoyi Dong
Pan Zhang
...
Li Yuan
Yu Qiao
Dahua Lin
Feng Zhao
Jiaqi Wang
275
305
0
06 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Xu Tan
Qifeng Chen
Xu Tan
VGen
520
27
0
06 Jun 2024
AD-H: Autonomous Driving with Hierarchical Agents
Zaibin Zhang
Shiyu Tang
Yuanhang Zhang
Talas Fu
Yifan Wang
Yang Liu
Dong Wang
Jing Shao
Lijun Wang
H. Lu
140
10
0
05 Jun 2024
Evaluation of data inconsistency for multi-modal sentiment analysis
Yufei Wang
Mengyue Wu
200
2
0
05 Jun 2024
Towards Practical Single-shot Motion Synthesis
Konstantinos Roditakis
Spyridon Thermos
N. Zioulis
VGen
283
1
0
03 Jun 2024
Video-MME: The First-Ever Comprehensive Evaluation Benchmark of Multi-modal LLMs in Video Analysis
Chaoyou Fu
Yuhan Dai
Yondong Luo
Lei Li
Shuhuai Ren
...
Xiawu Zheng
Enhong Chen
Caifeng Shan
Xing Sun
Xing Sun
VLM
MLLM
403
755
0
31 May 2024
Can't make an Omelette without Breaking some Eggs: Plausible Action Anticipation using Large Video-Language Models
Himangi Mittal
Nakul Agarwal
Shao-Yuan Lo
Kwonjoon Lee
238
28
0
30 May 2024
Temporal Grounding of Activities using Multimodal Large Language Models
Young Chol Song
210
1
0
30 May 2024
Matryoshka Query Transformer for Large Vision-Language Models
Wenbo Hu
Zi-Yi Dou
Liunian Harold Li
Amita Kamath
Nanyun Peng
Kai-Wei Chang
MLLM
222
23
0
29 May 2024
VideoTree: Adaptive Tree-based Video Representation for LLM Reasoning on Long Videos
Ziyang Wang
Shoubin Yu
Elias Stengel-Eskin
Jaehong Yoon
Feng Cheng
Gedas Bertasius
Mohit Bansal
356
135
0
29 May 2024
Cross-Modal Safety Alignment: Is textual unlearning all you need?
Trishna Chakraborty
Erfan Shayegani
Zikui Cai
Nael B. Abu-Ghazaleh
M. Salman Asif
Yue Dong
Amit K. Roy-Chowdhury
Chengyu Song
174
23
0
27 May 2024
A Survey of Multimodal Large Language Model from A Data-centric Perspective
Tianyi Bai
Hao Liang
Binwang Wan
Yanran Xu
Xi Li
...
Ping Huang
Jiulong Shan
Conghui He
Binhang Yuan
Wentao Zhang
283
59
0
26 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
696
145
0
23 May 2024
Imp: Highly Capable Large Multimodal Models for Mobile Devices
Zhenwei Shao
Zhou Yu
Jun Yu
Xuecheng Ouyang
Lihao Zheng
Zhenbiao Gai
Mingyang Wang
Jiajun Ding
145
22
0
20 May 2024
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
229
82
0
17 May 2024
Listen Again and Choose the Right Answer: A New Paradigm for Automatic Speech Recognition with Large Language Models
Annual Meeting of the Association for Computational Linguistics (ACL), 2024
Yuchen Hu
Chen Chen
Chengwei Qin
Qiushi Zhu
Eng Siong Chng
Ruizhe Li
AuLLM
KELM
203
14
0
16 May 2024
A Survey of Large Language Models for Graphs
Xubin Ren
Jiabin Tang
D. Yin
Nitesh Chawla
Chao Huang
211
82
0
10 May 2024
EALD-MLLM: Emotion Analysis in Long-sequential and De-identity videos with Multi-modal Large Language Model
Deng Li
Xin Liu
Bohao Xing
Baiqiang Xia
Yuan Zong
Bihan Wen
Heikki Kälviäinen
269
10
0
01 May 2024
MovieChat+: Question-aware Sparse Memory for Long Video Question Answering
Enxin Song
Wenhao Chai
Tianbo Ye
Lei Li
Xi Li
Gaoang Wang
VLM
MLLM
199
47
0
26 Apr 2024
MER 2024: Semi-Supervised Learning, Noise Robustness, and Open-Vocabulary Multimodal Emotion Recognition
Zheng Lian
Haiyang Sun
Guoying Zhao
Zhuofan Wen
Siyuan Zhang
...
Yinan Han
Xiaoshi Zhong
Guoying Zhao
Björn W. Schuller
Jianhua Tao
VLM
290
31
0
26 Apr 2024
PLLaVA : Parameter-free LLaVA Extension from Images to Videos for Video Dense Captioning
Lin Xu
Yilin Zhao
Daquan Zhou
Zhijie Lin
See Kiong Ng
Jiashi Feng
MLLM
VLM
200
262
0
25 Apr 2024
V2Xum-LLM: Cross-Modal Video Summarization with Temporal Prompt Instruction Tuning
Hang Hua
Yunlong Tang
Chenliang Xu
Jiebo Luo
VGen
277
44
0
18 Apr 2024
From Image to Video, what do we need in multimodal LLMs?
Suyuan Huang
Haoxin Zhang
Yan Gao
Honggu Chen
Yan Gao
Yao Hu
Zhan Qin
VLM
211
11
0
18 Apr 2024
Look, Listen, and Answer: Overcoming Biases for Audio-Visual Question Answering
Jie Ma
Min Hu
Pinghui Wang
Wangchun Sun
Lingyun Song
Hongbin Pei
Jun Liu
Youtian Du
367
15
0
18 Apr 2024
MoReVQA: Exploring Modular Reasoning Models for Video Question Answering
Juhong Min
Shyamal Buch
Arsha Nagrani
Minsu Cho
Cordelia Schmid
LRM
288
58
0
09 Apr 2024
Idea-2-3D: Collaborative LMM Agents Enable 3D Model Generation from Interleaved Multimodal Inputs
Junhao Chen
Xiang Li
Xiaojun Ye
Chao Li
Zhaoxin Fan
Hao Zhao
VGen
3DV
336
6
0
05 Apr 2024
MotionChain: Conversational Motion Controllers via Multimodal Prompts
European Conference on Computer Vision (ECCV), 2024
Biao Jiang
Xin Chen
C. Zhang
Fukun Yin
Zhuoyuan Li
Gang Yu
Jiayuan Fan
VGen
LRM
203
19
0
02 Apr 2024
CausalChaos! Dataset for Comprehensive Causal Action Question Answering Over Longer Causal Chains Grounded in Dynamic Visual Scenes
Paritosh Parmar
Eric Peh
Ruirui Chen
Ting En Lam
Yuhan Chen
Elston Tan
Basura Fernando
CML
250
10
0
01 Apr 2024
Direct Preference Optimization of Video Large Multimodal Models from Language Model Reward
Ruohong Zhang
Liangke Gui
Zhiqing Sun
Yihao Feng
Keyang Xu
...
Di Fu
Chunyuan Li
Alexander G. Hauptmann
Yonatan Bisk
Yiming Yang
MLLM
301
112
0
01 Apr 2024
LITA: Language Instructed Temporal-Localization Assistant
De-An Huang
Shijia Liao
Subhashree Radhakrishnan
Hongxu Yin
Pavlo Molchanov
Zhiding Yu
Jan Kautz
VLM
179
90
0
27 Mar 2024
Contextual AD Narration with Interleaved Multimodal Sequence
Computer Vision and Pattern Recognition (CVPR), 2024
Hanlin Wang
Zhan Tong
Kecheng Zheng
Yujun Shen
Limin Wang
VGen
370
7
0
19 Mar 2024
HawkEye: Training Video-Text LLMs for Grounding Text in Videos
Yueqian Wang
Xiaojun Meng
Jianxin Liang
Yuxuan Wang
Qun Liu
Dongyan Zhao
172
54
0
15 Mar 2024
Knowledge Conflicts for LLMs: A Survey
Conference on Empirical Methods in Natural Language Processing (EMNLP), 2024
Rongwu Xu
Zehan Qi
Zhijiang Guo
Cunxiang Wang
Hongru Wang
Yue Zhang
Wei Xu
586
190
0
13 Mar 2024
Previous
1
2
3
...
11
12
13
14
Next