arXiv: 2501.15111
HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding
28 January 2025
Jiaxing Zhao, Q. Yang, Yixing Peng, Detao Bai, Shimin Yao, Boyuan Sun, Xiang Chen, Shenghao Fu, Weixuan Chen, Xihan Wei, Liefeng Bo
Tags: VGen, AuLLM

Papers citing "HumanOmni: A Large Vision-Speech Language Model for Human-Centric Video Understanding" (3 papers):

ActionArt: Advancing Multimodal Large Models for Fine-Grained Human-Centric Video Understanding
Yi-Xing Peng, Q. Yang, Yu-Ming Tang, Shenghao Fu, Kun-Yu Lin, Xihan Wei, Wei-Shi Zheng
25 Apr 2025

ViSpeak: Visual Instruction Feedback in Streaming Videos
Shenghao Fu, Q. Yang, Yuan-Ming Li, Yi-Xing Peng, Kun-Yu Lin, Xihan Wei, Jian-Fang Hu, Xiaohua Xie, Wei-Shi Zheng
Tags: VLM
17 Mar 2025

R1-Omni: Explainable Omni-Multimodal Emotion Recognition with Reinforcement Learning
Jiaxing Zhao, Xihan Wei, Liefeng Bo
Tags: OffRL
07 Mar 2025