ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.02858
  4. Cited By
Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video
  Understanding

Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding

5 June 2023
Hang Zhang
Xin Li
Lidong Bing
    MLLM
ArXivPDFHTML

Papers citing "Video-LLaMA: An Instruction-tuned Audio-Visual Language Model for Video Understanding"

50 / 703 papers shown
Title
EAGLE: Egocentric AGgregated Language-video Engine
EAGLE: Egocentric AGgregated Language-video Engine
Jing Bi
Yunlong Tang
Luchuan Song
Ali Vosoughi
Nguyen Nguyen
Chenliang Xu
53
8
0
26 Sep 2024
EAGLE: Towards Efficient Arbitrary Referring Visual Prompts
  Comprehension for Multimodal Large Language Models
EAGLE: Towards Efficient Arbitrary Referring Visual Prompts Comprehension for Multimodal Large Language Models
Jiacheng Zhang
Yang Jiao
Shaoxiang Chen
Jingjing Chen
Yu-Gang Jiang
33
1
0
25 Sep 2024
EventHallusion: Diagnosing Event Hallucinations in Video LLMs
EventHallusion: Diagnosing Event Hallucinations in Video LLMs
Jiacheng Zhang
Yang Jiao
Shaoxiang Chen
Jingjing Chen
Zhiyu Tan
Hao Li
Jingjing Chen
MLLM
66
18
0
25 Sep 2024
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Multi-Modal Generative AI: Multi-modal LLM, Diffusion and Beyond
Hong Chen
Xin Wang
Yuwei Zhou
Bin Huang
Yipeng Zhang
Wei Feng
Houlun Chen
Zeyang Zhang
Siao Tang
Wenwu Zhu
DiffM
55
7
0
23 Sep 2024
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video
  Understanding
Video-XL: Extra-Long Vision Language Model for Hour-Scale Video Understanding
Yan Shu
Peitian Zhang
Zheng Liu
Minghao Qin
Yueze Wang
Tiejun Huang
Bo Zhao
VLM
54
42
0
22 Sep 2024
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free
  Manner
Interpolating Video-LLMs: Toward Longer-sequence LMMs in a Training-free Manner
Yuzhang Shang
Bingxin Xu
Weitai Kang
Mu Cai
Yuheng Li
Zehao Wen
Zhen Dong
Kurt Keutzer
Yong Jae Lee
Yan Yan
41
7
0
19 Sep 2024
MMSearch: Benchmarking the Potential of Large Models as Multi-modal
  Search Engines
MMSearch: Benchmarking the Potential of Large Models as Multi-modal Search Engines
Dongzhi Jiang
Renrui Zhang
Ziyu Guo
Yanmin Wu
Jiayi Lei
...
Guanglu Song
Peng Gao
Yu Liu
Chunyuan Li
Hongsheng Li
MLLM
45
16
0
19 Sep 2024
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal
  Reasoning with Large Language Models
From Linguistic Giants to Sensory Maestros: A Survey on Cross-Modal Reasoning with Large Language Models
Shengsheng Qian
Zuyi Zhou
Dizhan Xue
Bing Wang
Changsheng Xu
LRM
41
1
0
19 Sep 2024
Uncertainty-Guided Self-Questioning and Answering for Video-Language Alignment
Uncertainty-Guided Self-Questioning and Answering for Video-Language Alignment
Jin Chen
Kaijing Ma
Haojian Huang
Jiayu Shen
Han Fang
Xianghao Zang
Chao Ban
79
2
0
17 Sep 2024
Generating Event-oriented Attribution for Movies via Two-Stage
  Prefix-Enhanced Multimodal LLM
Generating Event-oriented Attribution for Movies via Two-Stage Prefix-Enhanced Multimodal LLM
Yuanjie Lyu
Tong Xu
Zihan Niu
Bo Peng
Jing Ke
Enhong Chen
30
0
0
14 Sep 2024
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
PiTe: Pixel-Temporal Alignment for Large Video-Language Model
Yang Liu
Pengxiang Ding
Siteng Huang
Min Zhang
Han Zhao
Donglin Wang
40
7
0
11 Sep 2024
1M-Deepfakes Detection Challenge
1M-Deepfakes Detection Challenge
Zhixi Cai
Abhinav Dhall
Shreya Ghosh
Munawar Hayat
D. Kollias
Kalin Stefanov
Usman Tariq
42
1
0
11 Sep 2024
Question-Answering Dense Video Events
Question-Answering Dense Video Events
Hangyu Qin
Junbin Xiao
Angela Yao
VLM
80
1
0
06 Sep 2024
StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with
  Multimodal Large Language Models
StimuVAR: Spatiotemporal Stimuli-aware Video Affective Reasoning with Multimodal Large Language Models
Y. Guo
Faizan Siddiqui
Yang Zhao
Rama Chellappa
Shao-Yuan Lo
LRM
59
2
0
31 Aug 2024
HERMES: temporal-coHERent long-forM understanding with Episodes and
  Semantics
HERMES: temporal-coHERent long-forM understanding with Episodes and Semantics
Gueter Josmy Faure
Jia-Fong Yeh
Min-Hung Chen
Hung-Ting Su
Winston H. Hsu
Shang-Hong Lai
31
3
0
30 Aug 2024
A longitudinal sentiment analysis of Sinophobia during COVID-19 using
  large language models
A longitudinal sentiment analysis of Sinophobia during COVID-19 using large language models
Chen Wang
Rohitash Chandra
28
0
0
29 Aug 2024
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths
  Vision Computation
VideoLLM-MoD: Efficient Video-Language Streaming with Mixture-of-Depths Vision Computation
Shiwei Wu
Joya Chen
Kevin Qinghong Lin
Qimeng Wang
Yan Gao
Qianli Xu
Tong Xu
Yao Hu
Enhong Chen
Mike Zheng Shou
VLM
59
12
0
29 Aug 2024
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Mini-Omni: Language Models Can Hear, Talk While Thinking in Streaming
Zhifei Xie
Changqiao Wu
AuLLM
VGen
VLM
SyDa
LRM
37
60
0
29 Aug 2024
CogVLM2: Visual Language Models for Image and Video Understanding
CogVLM2: Visual Language Models for Image and Video Understanding
Wenyi Hong
Weihan Wang
Ming Ding
Wenmeng Yu
Qingsong Lv
...
Debing Liu
Bin Xu
Juanzi Li
Yuxiao Dong
Jie Tang
VLM
MLLM
50
89
0
29 Aug 2024
Training-free Video Temporal Grounding using Large-scale Pre-trained
  Models
Training-free Video Temporal Grounding using Large-scale Pre-trained Models
Minghang Zheng
Xinhao Cai
Qingchao Chen
Yuxin Peng
Yang Liu
45
4
0
29 Aug 2024
Video-CCAM: Enhancing Video-Language Understanding with Causal
  Cross-Attention Masks for Short and Long Videos
Video-CCAM: Enhancing Video-Language Understanding with Causal Cross-Attention Masks for Short and Long Videos
Jiajun Fei
Dian Li
Zhidong Deng
Zekun Wang
Gang Liu
Hui Wang
VLM
43
35
0
26 Aug 2024
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models
LMM-VQA: Advancing Video Quality Assessment with Large Multimodal Models
Qihang Ge
Wei Sun
Yu Zhang
Yunhao Li
Zhongpeng Ji
Fengyu Sun
Shangling Jui
Xiongkuo Min
Guangtao Zhai
56
4
0
26 Aug 2024
T3M: Text Guided 3D Human Motion Synthesis from Speech
T3M: Text Guided 3D Human Motion Synthesis from Speech
Wenshuo Peng
Kaipeng Zhang
Sai Qian Zhang
38
0
0
23 Aug 2024
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
MuMA-ToM: Multi-modal Multi-Agent Theory of Mind
Haojun Shi
Suyu Ye
Xinyu Fang
Chuanyang Jin
Leyla Isik
Yen-Ling Kuo
Tianmin Shu
LLMAG
78
8
0
22 Aug 2024
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for
  Saliency Prediction with Diffusion
CaRDiff: Video Salient Object Ranking Chain of Thought Reasoning for Saliency Prediction with Diffusion
Yunlong Tang
Gen Zhan
Li Yang
Yiting Liao
Chenliang Xu
VGen
DiffM
LRM
53
8
0
21 Aug 2024
EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction
  Tuning
EMO-LLaMA: Enhancing Facial Emotion Understanding with Instruction Tuning
Bohao Xing
Zitong Yu
Xin Liu
Kaishen Yuan
Qilang Ye
Weicheng Xie
Huanjing Yue
Jingyu Yang
Heikki Kälviäinen
58
10
0
21 Aug 2024
Video Emotion Open-vocabulary Recognition Based on Multimodal Large
  Language Model
Video Emotion Open-vocabulary Recognition Based on Multimodal Large Language Model
Mengying Ge
Dongkai Tang
Mingyang Li
VLM
22
1
0
21 Aug 2024
SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for
  Multimodal Emotion Recognition
SZTU-CMU at MER2024: Improving Emotion-LLaMA with Conv-Attention for Multimodal Emotion Recognition
Zebang Cheng
Shuyuan Tu
Dawei Huang
Minghan Li
Xiaojiang Peng
Zhi-Qi Cheng
Alexander G. Hauptmann
55
2
0
20 Aug 2024
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
CogVideoX: Text-to-Video Diffusion Models with An Expert Transformer
Zhuoyi Yang
Jiayan Teng
Wendi Zheng
Ming Ding
Shiyu Huang
...
Weihan Wang
Yean Cheng
Xiaotao Gu
Yuxiao Dong
Jie Tang
DiffM
VGen
104
408
0
12 Aug 2024
Egocentric Vision Language Planning
Egocentric Vision Language Planning
Zhirui Fang
Ming Yang
Weishuai Zeng
Boyu Li
Junpeng Yue
Ziluo Ding
Xiu Li
Zongqing Lu
LM&Ro
42
1
0
11 Aug 2024
VideoQA in the Era of LLMs: An Empirical Study
VideoQA in the Era of LLMs: An Empirical Study
Junbin Xiao
Nanxin Huang
Hangyu Qin
Dongyang Li
Yicong Li
...
Zhulin Tao
Jianxing Yu
Liang Lin
Tat-Seng Chua
Angela Yao
38
10
0
08 Aug 2024
OpenOmni: A Collaborative Open Source Tool for Building Future-Ready
  Multimodal Conversational Agents
OpenOmni: A Collaborative Open Source Tool for Building Future-Ready Multimodal Conversational Agents
Qiang Sun
Yuanyi Luo
Sirui Li
Wenxiao Zhang
Wei Liu
AuLLM
LLMAG
VLM
33
2
0
06 Aug 2024
Latent-INR: A Flexible Framework for Implicit Representations of Videos
  with Discriminative Semantics
Latent-INR: A Flexible Framework for Implicit Representations of Videos with Discriminative Semantics
Shishira R. Maiya
Anubhav Gupta
M. Gwilliam
Max Ehrlich
Abhinav Shrivastava
42
3
1
05 Aug 2024
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks
  With Large Language Model
UnifiedMLLM: Enabling Unified Representation for Multi-modal Multi-tasks With Large Language Model
Zhaowei Li
Wei Wang
Yiqing Cai
Xu Qi
Pengyu Wang
Dong Zhang
Hang Song
Botian Jiang
Zhida Huang
Tao Wang
AIFin
LRM
42
3
0
05 Aug 2024
Infusing Environmental Captions for Long-Form Video Language Grounding
Infusing Environmental Captions for Long-Form Video Language Grounding
Hyogun Lee
Soyeon Hong
Mujeen Sung
Jinwoo Choi
52
0
0
05 Aug 2024
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
User-in-the-loop Evaluation of Multimodal LLMs for Activity Assistance
Mrinal Verghese
Brian Chen
H. Eghbalzadeh
Tushar Nagarajan
Ruta Desai
LRM
47
1
0
04 Aug 2024
Multi-Frame Vision-Language Model for Long-form Reasoning in Driver
  Behavior Analysis
Multi-Frame Vision-Language Model for Long-form Reasoning in Driver Behavior Analysis
Hiroshi Takato
Hiroshi Tsutsui
Komei Soda
Hidetaka Kamigaito
VLM
40
0
0
03 Aug 2024
A Comprehensive Review of Multimodal Large Language Models: Performance
  and Challenges Across Different Tasks
A Comprehensive Review of Multimodal Large Language Models: Performance and Challenges Across Different Tasks
Jiaqi Wang
Hanqi Jiang
Yi-Hsueh Liu
Chong Ma
Xu-Yao Zhang
...
Xin Zhang
Wei Zhang
Dinggang Shen
Tianming Liu
Shu Zhang
VLM
AI4TS
56
32
0
02 Aug 2024
SynesLM: A Unified Approach for Audio-visual Speech Recognition and
  Translation via Language Model and Synthetic Data
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
Yichen Lu
Álvaro Huertas-García
Xuankai Chang
Hengwei Bian
Soumi Maiti
Shinji Watanabe
48
2
0
01 Aug 2024
MMTrail: A Multimodal Trailer Video Dataset with Language and Music
  Descriptions
MMTrail: A Multimodal Trailer Video Dataset with Language and Music Descriptions
Xiaowei Chi
Yatian Wang
Aosong Cheng
Pengjun Fang
Zeyue Tian
...
Wenhan Luo
Qifeng Chen
Shanghang Zhang
Qi-fei Liu
Yi-Ting Guo
75
7
0
30 Jul 2024
CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language
  Models
CoMMIT: Coordinated Instruction Tuning for Multimodal Large Language Models
Junda Wu
Xintong Li
Tong Yu
Yu Wang
Xiang Chen
Jiuxiang Gu
Lina Yao
Jingbo Shang
Julian McAuley
52
0
0
29 Jul 2024
EPD: Long-term Memory Extraction, Context-awared Planning and
  Multi-iteration Decision @ EgoPlan Challenge ICML 2024
EPD: Long-term Memory Extraction, Context-awared Planning and Multi-iteration Decision @ EgoPlan Challenge ICML 2024
Letian Shi
Qi Lv
Xiang Deng
Liqiang Nie
40
1
0
28 Jul 2024
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models
LLaVA-Read: Enhancing Reading Ability of Multimodal Language Models
Ruiyi Zhang
Yufan Zhou
Jian Chen
Jiuxiang Gu
Changyou Chen
Tongfei Sun
VLM
41
6
0
27 Jul 2024
MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image
  Relational Association Capabilities in Large Visual Language Models
MMRA: A Benchmark for Evaluating Multi-Granularity and Multi-Image Relational Association Capabilities in Large Visual Language Models
Siwei Wu
Kang Zhu
Yu Bai
Yiming Liang
Yizhi Li
...
Xingwei Qu
Xuxin Cheng
Ge Zhang
Wenhao Huang
Chenghua Lin
VLM
39
2
0
24 Jul 2024
MicroEmo: Time-Sensitive Multimodal Emotion Recognition with
  Micro-Expression Dynamics in Video Dialogues
MicroEmo: Time-Sensitive Multimodal Emotion Recognition with Micro-Expression Dynamics in Video Dialogues
Liyun Zhang
38
1
0
23 Jul 2024
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language
  Models
SlowFast-LLaVA: A Strong Training-Free Baseline for Video Large Language Models
Mingze Xu
Mingfei Gao
Zhe Gan
Hong-You Chen
Zhengfeng Lai
Haiming Gang
Kai Kang
Afshin Dehghan
69
49
0
22 Jul 2024
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language
  Understanding
LongVideoBench: A Benchmark for Long-context Interleaved Video-Language Understanding
Haoning Wu
Dongxu Li
Bei Chen
Junnan Li
40
114
0
22 Jul 2024
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained
  Spatial-Temporal Understanding
WTS: A Pedestrian-Centric Traffic Video Dataset for Fine-grained Spatial-Temporal Understanding
Quan Kong
Yuki Kawana
Rajat Saini
Ashutosh Kumar
Jingjing Pan
...
Yohei Ozao
Balázs Opra
D. Anastasiu
Yoichi Sato
Norimasa Kobori
VGen
44
8
0
22 Jul 2024
DOPRA: Decoding Over-accumulation Penalization and Re-allocation in
  Specific Weighting Layer
DOPRA: Decoding Over-accumulation Penalization and Re-allocation in Specific Weighting Layer
Jinfeng Wei
Xiaofeng Zhang
28
13
0
21 Jul 2024
Navigation Instruction Generation with BEV Perception and Large Language
  Models
Navigation Instruction Generation with BEV Perception and Large Language Models
Sheng Fan
Rui Liu
Wenguan Wang
Yi Yang
50
5
0
21 Jul 2024
Previous
123...678...131415
Next