arXiv:2503.20215
Qwen2.5-Omni Technical Report
26 March 2025
Jin Xu
Zhifang Guo
Jinzheng He
Hangrui Hu
Ting He
S. Bai
Keqin Chen
Jialin Wang
Yang Fan
K. Dang
Bin Zhang
Xinyu Wang
Yunfei Chu
Junyang Lin
VGen
AuLLM
Papers citing "Qwen2.5-Omni Technical Report"
12 / 12 papers shown
Title
MMAR: A Challenging Benchmark for Deep Reasoning in Speech, Audio, Music, and Their Mix
Ziyang Ma
Yinghao Ma
Yanqiao Zhu
Chen Yang
Yi-Wen Chao
...
Wei Xue
Emmanouil Benetos
Kai Yu
Eng Siong Chng
Xie Chen
AuLLM
LRM
5
0
0
19 May 2025
Omni-R1: Do You Really Need Audio to Fine-Tune Your Audio LLM?
Andrew Rouditchenko
Saurabhchand Bhati
Edson Araujo
Samuel Thomas
Hilde Kuehne
Rogerio Feris
James R. Glass
AuLLM
VLM
44
0
0
14 May 2025
WavReward: Spoken Dialogue Models With Generalist Reward Evaluators
Shengpeng Ji
Tianle Liang
Yongqian Li
Jialong Zuo
Minghui Fang
...
Xize Cheng
Siqi Zheng
Jin Xu
Junyang Lin
Zhou Zhao
AuLLM
ALM
33
0
0
14 May 2025
Bridging Ears and Eyes: Analyzing Audio and Visual Large Language Models to Humans in Visible Sound Recognition and Reducing Their Sensory Gap via Cross-Modal Distillation
Xilin Jiang
Junkai Wu
Vishal B. Choudhari
N. Mesgarani
VLM
35
0
0
11 May 2025
EchoInk-R1: Exploring Audio-Visual Reasoning in Multimodal LLMs via Reinforcement Learning
Zhenghao Xing
Xiaowei Hu
Chi-Wing Fu
Wei Wang
Jifeng Dai
Pheng-Ann Heng
MLLM
OffRL
VLM
LRM
50
0
0
07 May 2025
VITA-Audio: Fast Interleaved Cross-Modal Token Generation for Efficient Large Speech-Language Model
Zuwei Long
Yunhang Shen
Chaoyou Fu
Heting Gao
Lijiang Li
...
Jinlong Peng
Haoyu Cao
Ke Li
Rongrong Ji
Xing Sun
32
0
0
06 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Cheng Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
58
1
0
30 Apr 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
S. Liu
...
Zheng Yang
Aoxiong Yin
Ruibin Yuan
Yuhang Zhang
Zaida Zhou
AuLLM
VLM
110
5
0
25 Apr 2025
TimeChat-Online: 80% Visual Tokens are Naturally Redundant in Streaming Videos
Linli Yao
Y. Li
Y. X. Wei
Lei Li
Shuhuai Ren
...
Sida Li
Lingpeng Kong
Qi Liu
Yuhang Zhang
Xu Sun
38
1
0
24 Apr 2025
MMInference: Accelerating Pre-filling for Long-Context VLMs via Modality-Aware Permutation Sparse Attention
Yucheng Li
Huiqiang Jiang
Chengruidong Zhang
Qianhui Wu
Xufang Luo
...
Amir H. Abdi
Dongsheng Li
Jianfeng Gao
Yuqing Yang
Lili Qiu
33
1
0
22 Apr 2025
Pre-DPO: Improving Data Utilization in Direct Preference Optimization Using a Guiding Reference Model
Junshu Pan
Wei Shen
Shulin Huang
Qiji Zhou
Yue Zhang
74
0
0
22 Apr 2025
VocalNet: Speech LLM with Multi-Token Prediction for Faster and High-Quality Generation
Yuhao Wang
Heyang Liu
Ziyang Cheng
Ronghua Wu
Qunshan Gu
Yanfeng Wang
Yu Wang
151
0
0
05 Apr 2025