2311.07919
Cited By
Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models
14 November 2023
Yunfei Chu
Jin Xu
Xiaohuan Zhou
Qian Yang
Shiliang Zhang
Zhijie Yan
Chang Zhou
Jingren Zhou
AuLLM
Papers citing "Qwen-Audio: Advancing Universal Audio Understanding via Unified Large-Scale Audio-Language Models"
50 / 213 papers shown
Title
Seal: Advancing Speech Language Models to be Few-Shot Learners
Shuyu Lei
Lingen Liu
Jiaolong Yang
Yasen Jiao
Yuxiang Yang
Yushu Yang
Xiang Guo
VLM
30
0
0
20 Jul 2024
VLMEvalKit: An Open-Source Toolkit for Evaluating Large Multi-Modality Models
Haodong Duan
Junming Yang
Xinyu Fang
Lin Chen
...
Yuhang Zang
Pan Zhang
Jiaqi Wang
Dahua Lin
Kai Chen
LM&MA
VLM
36
115
0
16 Jul 2024
Qwen2-Audio Technical Report
Yunfei Chu
Jin Xu
Qian Yang
Haojie Wei
Xipin Wei
...
Yuanjun Lv
Jinzheng He
Junyang Lin
Chang Zhou
Jingren Zhou
AuLLM
VLM
37
105
0
15 Jul 2024
Qwen2 Technical Report
An Yang
Baosong Yang
Binyuan Hui
Jian Xu
Bowen Yu
...
Yuqiong Liu
Zeyu Cui
Zhenru Zhang
Zhifang Guo
Zhi-Wei Fan
OSLM
VLM
MU
60
786
0
15 Jul 2024
Speech-Copilot: Leveraging Large Language Models for Speech Processing via Task Decomposition, Modularization, and Program Generation
Chun-Yi Kuan
Chih-Kai Yang
Wei-Ping Huang
Ke-Han Lu
Hung-yi Lee
46
5
0
13 Jul 2024
Pronunciation Assessment with Multi-modal Large Language Models
Kaiqi Fu
Linkai Peng
Nan Yang
Shuran Zhou
29
2
0
12 Jul 2024
AffectGPT: Dataset and Framework for Explainable Multimodal Emotion Recognition
Zheng Lian
Haiyang Sun
Guoying Zhao
Jiangyan Yi
Bin Liu
Jianhua Tao
47
2
0
10 Jul 2024
Listen and Speak Fairly: A Study on Semantic Gender Bias in Speech Integrated Large Language Models
Yi-Cheng Lin
T. Lin
Chih-Kai Yang
Ke-Han Lu
Wei-Chih Chen
Chun-Yi Kuan
Hung-yi Lee
34
1
0
09 Jul 2024
Seed-ASR: Understanding Diverse Speech and Contexts with LLM-based Speech Recognition
Ye Bai
Jingping Chen
Jitong Chen
Wei Chen
Zhuo Chen
...
Wanyi Zhang
Yang Zhang
Yawei Zhang
Yijie Zheng
Ming Zou
AuLLM
44
19
0
05 Jul 2024
FunAudioLLM: Voice Understanding and Generation Foundation Models for Natural Interaction Between Humans and LLMs
Keyu An
Qian Chen
Chong Deng
Zhihao Du
Changfeng Gao
...
Bin Zhang
Qinglin Zhang
Shiliang Zhang
Nan Zhao
Siqi Zheng
AuLLM
32
44
0
04 Jul 2024
Investigating Decoder-only Large Language Models for Speech-to-text Translation
Chao-Wei Huang
Hui Lu
Hongyu Gong
H. Inaguma
Ilia Kulikov
Ruslan Mavlyutov
Sravya Popuri
AuLLM
LRM
55
6
0
03 Jul 2024
Factor-Conditioned Speaking-Style Captioning
Atsushi Ando
Takafumi Moriya
Shota Horiguchi
Ryo Masumura
35
0
0
27 Jun 2024
DeSTA: Enhancing Speech Language Models through Descriptive Speech-Text Alignment
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
He Huang
Boris Ginsburg
Yu-Chiang Frank Wang
Hung-yi Lee
VLM
AuLLM
35
9
0
27 Jun 2024
Streaming Decoder-Only Automatic Speech Recognition with Discrete Speech Units: A Pilot Study
Peikun Chen
Sining Sun
Changhao Shan
Qing Yang
Lei Xie
40
2
0
27 Jun 2024
Towards Probing Speech-Specific Risks in Large Multimodal Models: A Taxonomy, Benchmark, and Insights
Hao Yang
Lizhen Qu
Ehsan Shareghi
Gholamreza Haffari
28
0
0
25 Jun 2024
AudioBench: A Universal Benchmark for Audio Large Language Models
Bin Wang
Xunlong Zou
Geyu Lin
Shri Kiran Srinivasan
Zhuohan Liu
Wenyu Zhang
Zhengyuan Liu
AiTi Aw
Nancy F. Chen
AuLLM
ELM
LM&MA
92
20
0
23 Jun 2024
Transferable speech-to-text large language model alignment module
Boyong Wu
Chao Yan
Haoran Pu
35
0
0
19 Jun 2024
Towards Audio Codec-based Speech Separation
J. Yip
Shengkui Zhao
Dianwen Ng
Eng Siong Chng
Bin Ma
30
6
0
18 Jun 2024
Improving Text-To-Audio Models with Synthetic Captions
Zhifeng Kong
Sang-gil Lee
Deepanway Ghosal
Navonil Majumder
Ambuj Mehrish
Rafael Valle
Soujanya Poria
Bryan Catanzaro
45
11
0
18 Jun 2024
GAMA: A Large Audio-Language Model with Advanced Audio Understanding and Complex Reasoning Abilities
Sreyan Ghosh
Sonal Kumar
Ashish Seth
Chandra Kiran Reddy Evuru
Utkarsh Tyagi
S. Sakshi
Oriol Nieto
R. Duraiswami
Dinesh Manocha
AuLLM
LRM
46
36
0
17 Jun 2024
Emotion-LLaMA: Multimodal Emotion Recognition and Reasoning with Instruction Tuning
Zebang Cheng
Zhi-Qi Cheng
Jun-Yan He
Jingdong Sun
Kai Wang
Yuxiang Lin
Zheng Lian
Xiaojiang Peng
Alexander G. Hauptmann
MLLM
31
29
0
17 Jun 2024
UniAudio 1.5: Large Language Model-driven Audio Codec is A Few-shot Audio Task Learner
Dongchao Yang
Haohan Guo
Yuanyuan Wang
Rongjie Huang
Xiang Li
Xu Tan
Xixin Wu
Helen Meng
AuLLM
41
15
0
14 Jun 2024
DiscreteSLU: A Large Language Model with Self-Supervised Discrete Speech Units for Spoken Language Understanding
Suwon Shon
Kwangyoun Kim
Yi-Te Hsu
Prashant Sridhar
Shinji Watanabe
Karen Livescu
AuLLM
46
2
0
13 Jun 2024
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
Chun-Yi Kuan
Wei-Ping Huang
Hung-yi Lee
AuLLM
31
5
0
12 Jun 2024
ParaCLAP -- Towards a general language-audio model for computational paralinguistic tasks
Xin Jing
Andreas Triantafyllopoulos
Björn Schuller
29
2
0
11 Jun 2024
MaLa-ASR: Multimedia-Assisted LLM-Based ASR
Guanrou Yang
Ziyang Ma
Fan Yu
Zhifu Gao
Shiliang Zhang
Xie Chen
AuLLM
38
2
0
09 Jun 2024
MindSemantix: Deciphering Brain Visual Experiences with a Brain-Language Model
Ziqi Ren
Jie Li
Xuetong Xue
Xin Li
Fan Yang
Zhicheng Jiao
Xinbo Gao
38
3
0
29 May 2024
CrossCheckGPT: Universal Hallucination Ranking for Multimodal Foundation Models
Guangzhi Sun
Potsawee Manakul
Adian Liusie
Kunat Pipatanakul
Chao Zhang
P. Woodland
Mark J. F. Gales
HILM
MLLM
22
7
0
22 May 2024
SpeechVerse: A Large-scale Generalizable Audio Language Model
Nilaksh Das
Saket Dingliwal
S. Ronanki
Rohit Paturi
David Huang
...
Monica Sunkara
S. Srinivasan
Kyu J. Han
Katrin Kirchhoff
41
37
0
14 May 2024
MMGER: Multi-modal and Multi-granularity Generative Error Correction with LLM for Joint Accent and Speech Recognition
Bingshen Mu
Yangze Li
Qijie Shao
Kun Wei
Xucheng Wan
Naijun Zheng
Huan Zhou
Lei Xie
40
5
0
06 May 2024
Unveiling the Potential of LLM-Based ASR on Chinese Open-Source Datasets
Xuelong Geng
Tianyi Xu
Kun Wei
Bingshen Mu
Hongfei Xue
...
Pengcheng Guo
Yuhang Dai
Longhao Li
Mingchen Shao
Lei Xie
36
9
0
03 May 2024
How to Parameterize Asymmetric Quantization Ranges for Quantization-Aware Training
Jaeseong You
Minseop Park
Kyunggeun Lee
Seokjun An
Chirag I. Patel
Markus Nagel
MQ
41
1
0
25 Apr 2024
AccidentBlip: Agent of Accident Warning based on MA-former
Yihua Shao
Hongyi Cai
Xinwei Long
Weiyi Lang
Ziyang Yan
Haoran Wu
Yan Wang
Jiayi Yin
Yang Yang
Yisheng Lv
37
2
0
18 Apr 2024
Deep Learning and LLM-based Methods Applied to Stellar Lightcurve Classification
Yu-Yang Li
Yu Bai
Cunshi Wang
Mengwei Qu
Ziteng Lu
Roberto Soria
Jifeng Liu
25
3
0
16 Apr 2024
Resilience of Large Language Models for Noisy Instructions
Bin Wang
Chengwei Wei
Zhengyuan Liu
Geyu Lin
Nancy F. Chen
47
11
0
15 Apr 2024
On Speculative Decoding for Multimodal Large Language Models
Mukul Gagrani
Raghavv Goel
Wonseok Jeon
Junyoung Park
Mingu Lee
Christopher Lott
LRM
32
8
0
13 Apr 2024
Audio Dialogues: Dialogues dataset for audio and music understanding
Arushi Goel
Zhifeng Kong
Rafael Valle
Bryan Catanzaro
AuLLM
29
4
0
11 Apr 2024
WavLLM: Towards Robust and Adaptive Speech Large Language Model
Shujie Hu
Long Zhou
Shujie Liu
Sanyuan Chen
Hongkun Hao
...
Xunying Liu
Jinyu Li
S. Sivasankaran
Linquan Liu
Furu Wei
AuLLM
21
43
0
31 Mar 2024
An Image Grid Can Be Worth a Video: Zero-shot Video Question Answering Using a VLM
Wonkyun Kim
Changin Choi
Wonseok Lee
Wonjong Rhee
VLM
47
51
0
27 Mar 2024
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Dominik Wagner
Alexander W. Churchill
Siddharth Sigtia
Panayiotis Georgiou
Matt Mirsamadi
Aarshee Mishra
Erik Marchi
43
6
0
21 Mar 2024
Twists, Humps, and Pebbles: Multilingual Speech Recognition Models Exhibit Gender Performance Gaps
Giuseppe Attanasio
Beatrice Savoldi
Dennis Fucci
Dirk Hovy
31
4
0
28 Feb 2024
Uncertainty-Aware Evaluation for Vision-Language Models
Vasily Kostumov
Bulat Nutfullin
Oleg Pilipenko
Eugene Ilyushin
ELM
50
8
0
22 Feb 2024
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
Guan-Ting Lin
Cheng-Han Chiang
Hung-yi Lee
34
22
0
20 Feb 2024
Speech Translation with Speech Foundation Models and Large Language Models: What is There and What is Missing?
Marco Gaido
Sara Papi
Matteo Negri
L. Bentivogli
41
13
0
19 Feb 2024
Domain Adaptation for Contrastive Audio-Language Models
Soham Deshmukh
Rita Singh
Bhiksha Raj
VLM
32
7
0
14 Feb 2024
An Embarrassingly Simple Approach for LLM with Strong ASR Capacity
Ziyang Ma
Guanrou Yang
Yifan Yang
Zhifu Gao
Jiaming Wang
...
Fan Yu
Qian Chen
Siqi Zheng
Shiliang Zhang
Xie Chen
AuLLM
47
38
0
13 Feb 2024
AIR-Bench: Benchmarking Large Audio-Language Models via Generative Comprehension
Qian Yang
Jin Xu
Wenrui Liu
Yunfei Chu
Ziyue Jiang
...
Yichong Leng
Yuanjun Lv
Zhou Zhao
Chang Zhou
Jingren Zhou
LM&MA
AuLLM
ALM
49
58
0
12 Feb 2024
Cacophony: An Improved Contrastive Audio-Text Model
Ge Zhu
Jordan Darefsky
Zhiyao Duan
AuLLM
46
11
0
10 Feb 2024
Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities
Zhifeng Kong
Arushi Goel
Rohan Badlani
Ming-Yu Liu
Rafael Valle
Bryan Catanzaro
AuLLM
LM&MA
MLLM
74
73
0
02 Feb 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models
Zhisheng Zheng
Puyuan Peng
Ziyang Ma
Xie Chen
Eunsol Choi
David Harwath
LRM
35
14
0
02 Feb 2024