ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2310.13289
  4. Cited By
SALMONN: Towards Generic Hearing Abilities for Large Language Models

SALMONN: Towards Generic Hearing Abilities for Large Language Models

20 October 2023
Changli Tang
Wenyi Yu
Guangzhi Sun
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
    LM&MA
    AuLLM
ArXivPDFHTML

Papers citing "SALMONN: Towards Generic Hearing Abilities for Large Language Models"

50 / 50 papers shown
Title
Speechless: Speech Instruction Training Without Speech for Low Resource Languages
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
Tuan Le Duc Anh
Shreyas Gopal
Yue Heng Yeo
Warren Keng Hoong Low
Eng Siong Chng
J. Yip
SyDa
59
1
0
23 May 2025
JALMBench: Benchmarking Jailbreak Vulnerabilities in Audio Language Models
Zifan Peng
Yule Liu
Zhen Sun
Mingchen Li
Zeren Luo
...
Xinlei He
Xuechao Wang
Yingjie Xue
Shengmin Xu
Xinyi Huang
AuLLM
AAML
75
1
0
23 May 2025
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
Chi-Yuan Hsiao
Ke-Han Lu
Kai-Wei Chang
Chih-Kai Yang
Wei-Chih Chen
Hung-yi Lee
CLL
MoMe
161
0
0
23 May 2025
SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning
SARI: Structured Audio Reasoning via Curriculum-Guided Reinforcement Learning
Cheng Wen
Tingwei Guo
Shuaijiang Zhao
Wei Zou
Xiangang Li
OffRL
AuLLM
LRM
83
6
0
22 Apr 2025
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
SIFT-50M: A Large-Scale Multilingual Dataset for Speech Instruction Fine-Tuning
Prabhat Pandey
Rupak Vignesh Swaminathan
K V Vijay Girish
Arunasish Sen
Jian Xie
Grant P. Strimel
Andreas Schwarz
357
2
0
12 Apr 2025
Spatial Audio Processing with Large Language Model on Wearable Devices
Spatial Audio Processing with Large Language Model on Wearable Devices
Ayushi Mishra
Yang Bai
Priyadarshan Narayanasamy
Nakul Garg
Nirupam Roy
65
0
0
11 Apr 2025
Exploring In-Context Learning Capabilities of ChatGPT for Pathological Speech Detection
Exploring In-Context Learning Capabilities of ChatGPT for Pathological Speech Detection
Mahdi Amiri
Hatef Otroshi Shahreza
Ina Kodrasi
87
0
0
31 Mar 2025
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
DeSTA2: Developing Instruction-Following Speech Language Model Without Speech Instruction-Tuning Data
Ke-Han Lu
Zhehuai Chen
Szu-Wei Fu
Chao-Han Huck Yang
Jagadeesh Balam
Boris Ginsburg
Yu-Te Wang
Hung-yi Lee
AuLLM
SyDa
128
16
0
28 Jan 2025
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia
OSUM: Advancing Open Speech Understanding Models with Limited Resources in Academia
Xuelong Geng
Kun Wei
Qijie Shao
Shuiyun Liu
Zhennan Lin
...
Yuhang Dai
Xinfa Zhu
Yue Li
Li Zhang
Lei Xie
107
5
0
23 Jan 2025
LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
LLM supervised Pre-training for Multimodal Emotion Recognition in Conversations
Soumya Dutta
Sriram Ganapathy
86
5
0
20 Jan 2025
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
SD-Eval: A Benchmark Dataset for Spoken Dialogue Understanding Beyond Words
Junyi Ao
Yuancheng Wang
Xiaohai Tian
Dekun Chen
Jing Zhang
Lu Lu
Yansen Wang
Haizhou Li
Zhikai Wu
AuLLM
123
23
0
17 Jan 2025
Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer
Editing Music with Melody and Text: Using ControlNet for Diffusion Transformer
Siyuan Hou
Shansong Liu
Ruibin Yuan
Wei Xue
Ying Shan
Mangsuo Zhao
Chao Zhang
114
4
0
17 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
114
2
0
10 Jan 2025
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
Listening and Seeing Again: Generative Error Correction for Audio-Visual Speech Recognition
Rui Liu
Hongyu Yuan
Hong Li
80
0
0
03 Jan 2025
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
Can Large Audio-Language Models Truly Hear? Tackling Hallucinations with Multi-Task Assessment and Stepwise Audio Reasoning
Chun-Yi Kuan
Hung-yi Lee
AuLLM
LRM
109
7
0
03 Jan 2025
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
TangoFlux: Super Fast and Faithful Text to Audio Generation with Flow Matching and Clap-Ranked Preference Optimization
Chia-Yu Hung
Navonil Majumder
Zhifeng Kong
Ambuj Mehrish
Rafael Valle
Bryan Catanzaro
Soujanya Poria
Bryan Catanzaro
Soujanya Poria
83
8
0
30 Dec 2024
Addressing Hallucinations in Language Models with Knowledge Graph Embeddings as an Additional Modality
Viktoriia Chekalina
Anton Razzigaev
Elizaveta Goncharova
Andrey Kuznetsov
KELM
110
0
0
18 Nov 2024
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
The Semantic Hub Hypothesis: Language Models Share Semantic Representations Across Languages and Modalities
Zhaofeng Wu
Xinyan Velocity Yu
Dani Yogatama
Jiasen Lu
Yoon Kim
AIFin
77
19
0
07 Nov 2024
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
Performance evaluation of SLAM-ASR: The Good, the Bad, the Ugly, and the Way Forward
Shashi Kumar
Iuliia Thorbecke
Sergio Burdisso
Esaú Villatoro-Tello
Marcelo Errecalde
Kadri Hacioğlu
Pradeep Rangappa
P. Motlícek
A. Ganapathiraju
Andreas Stolcke
112
2
0
06 Nov 2024
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
Ichigo: Mixed-Modal Early-Fusion Realtime Voice Assistant
Alan Dao
Dinh Bach Vu
Huy Hoang Ha
AuLLM
VLM
103
5
0
20 Oct 2024
Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech
Frozen Large Language Models Can Perceive Paralinguistic Aspects of Speech
Wonjune Kang
Junteng Jia
Chunyang Wu
Wei Zhou
Egor Lakomkin
...
Leda Sari
Suyoun Kim
Ke Li
Jay Mahadeokar
Ozlem Kalinli
AuLLM
67
6
0
02 Oct 2024
Recent Advances in Speech Language Models: A Survey
Recent Advances in Speech Language Models: A Survey
Wenqian Cui
Dianzhi Yu
Xiaoqi Jiao
Ziqiao Meng
Guangyan Zhang
Qichao Wang
Yiwen Guo
Irwin King
AuLLM
119
23
0
01 Oct 2024
MIO: A Foundation Model on Multimodal Tokens
MIO: A Foundation Model on Multimodal Tokens
Zekun Wang
King Zhu
Chunpu Xu
Wangchunshu Zhou
Jiaheng Liu
...
Yuanxing Zhang
Ge Zhang
Ke Xu
Jie Fu
Wenhao Huang
MLLM
AuLLM
92
11
0
26 Sep 2024
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
Enabling Auditory Large Language Models for Automatic Speech Quality Evaluation
Siyin Wang
Wenyi Yu
Yudong Yang
Changli Tang
Yixuan Li
...
Jun Zhang
Guangzhi Sun
Lu Lu
Yuxuan Wang
Chao Zhang
AuLLM
LM&MA
88
7
0
25 Sep 2024
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
MT2KD: Towards A General-Purpose Encoder for Speech, Speaker, and Audio Events
Xiaoyu Yang
Qiujia Li
Chao Zhang
P. Woodland
58
1
0
25 Sep 2024
OmniBench: Towards The Future of Universal Omni-Language Models
OmniBench: Towards The Future of Universal Omni-Language Models
Yizhi Li
Ge Zhang
Yinghao Ma
Ruibin Yuan
Kang Zhu
...
Zhaoxiang Zhang
Zachary Liu
Emmanouil Benetos
Wenhao Huang
Chenghua Lin
LRM
92
17
0
23 Sep 2024
What Are They Doing? Joint Audio-Speech Co-Reasoning
What Are They Doing? Joint Audio-Speech Co-Reasoning
Yingzhi Wang
Pooneh Mousavi
Artem Ploujnikov
Mirco Ravanelli
AuLLM
64
2
0
22 Sep 2024
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models
Enhancing Low-Resource Language and Instruction Following Capabilities of Audio Language Models
Potsawee Manakul
Guangzhi Sun
Warit Sirichotedumrong
Kasima Tharnpipitchai
Kunat Pipatanakul
AuLLM
67
7
0
17 Sep 2024
Chain-of-Thought Prompting for Speech Translation
Chain-of-Thought Prompting for Speech Translation
Ke Hu
Zhehuai Chen
Chao-Han Huck Yang
Piotr Żelasko
Oleksii Hrinchuk
Vitaly Lavrukhin
Jagadeesh Balam
Boris Ginsburg
LRM
120
8
0
17 Sep 2024
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
Large Language Model Can Transcribe Speech in Multi-Talker Scenarios with Versatile Instructions
Lingwei Meng
Shujie Hu
Jiawen Kang
Zhaoqing Li
Yuejiao Wang
Wenxuan Wu
Xixin Wu
Xunying Liu
Helen Meng
AuLLM
119
3
0
13 Sep 2024
Salmon: A Suite for Acoustic Language Model Evaluation
Salmon: A Suite for Acoustic Language Model Evaluation
Gallil Maimon
Amit Roth
Yossi Adi
ELM
AuLLM
88
6
0
11 Sep 2024
MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders
MoWE-Audio: Multitask AudioLLMs with Mixture of Weak Encoders
Wentao Zhang
Shuo Sun
Bin Wang
Xunlong Zou
Zhuohan Liu
Yingxu He
Geyu Lin
Nancy F. Chen
Ai Ti Aw
AuLLM
82
1
0
10 Sep 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
Zhou Zhao
VLM
87
42
0
29 Aug 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models
BAT: Learning to Reason about Spatial Sounds with Large Language Models
Zhisheng Zheng
Puyuan Peng
Ziyang Ma
Xie Chen
Eunsol Choi
David Harwath
LRM
74
16
0
02 Feb 2024
Fine-grained Audio-Visual Joint Representations for Multimodal Large
  Language Models
Fine-grained Audio-Visual Joint Representations for Multimodal Large Language Models
Guangzhi Sun
Wenyi Yu
Changli Tang
Xianzhao Chen
Tian Tan
Wei Li
Lu Lu
Zejun Ma
Chao Zhang
64
12
0
09 Oct 2023
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong
  General Audio Event Taggers
Whisper-AT: Noise-Robust Automatic Speech Recognizers are Also Strong General Audio Event Taggers
Yuan Gong
Sameer Khurana
Leonid Karlinsky
James R. Glass
46
70
0
06 Jul 2023
VideoLLM: Modeling Video Sequence with Large Language Models
VideoLLM: Modeling Video Sequence with Large Language Models
Guo Chen
Yin-Dong Zheng
Jiahao Wang
Jilan Xu
Yifei Huang
...
Yi Wang
Yali Wang
Yu Qiao
Tong Lu
Limin Wang
MLLM
118
83
0
22 May 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for
  Audio-Language Multimodal Research
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
87
209
0
30 Mar 2023
Adapting self-supervised models to multi-talker speech recognition using
  speaker embeddings
Adapting self-supervised models to multi-talker speech recognition using speaker embeddings
Zili Huang
Desh Raj
Leibny Paola García-Perera
Sanjeev Khudanpur
125
26
0
01 Nov 2022
Scaling Instruction-Finetuned Language Models
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
167
3,110
0
20 Oct 2022
Emergent Abilities of Large Language Models
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
263
2,462
0
15 Jun 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
344
3,515
0
29 Apr 2022
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech
  Processing
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
206
1,846
0
26 Oct 2021
Finetuned Language Models Are Zero-Shot Learners
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM
UQCV
118
3,723
0
03 Sep 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of
  Transcribed Audio
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
Guoguo Chen
Shuzhou Chai
Guan-Bo Wang
Jiayu Du
Weiqiang Zhang
...
Xuchen Yao
Yongqing Wang
Yujun Wang
Zhao You
Zhiyong Yan
100
372
0
13 Jun 2021
GLM: General Language Model Pretraining with Autoregressive Blank
  Infilling
GLM: General Language Model Pretraining with Autoregressive Blank Infilling
Zhengxiao Du
Yujie Qian
Xiao Liu
Ming Ding
J. Qiu
Zhilin Yang
Jie Tang
BDL
AI4CE
109
1,535
0
18 Mar 2021
SLURP: A Spoken Language Understanding Resource Package
SLURP: A Spoken Language Understanding Resource Package
E. Bastianelli
Andrea Vanzo
P. Swietojanski
Verena Rieser
VLM
85
228
0
26 Nov 2020
Emotion recognition by fusing time synchronous and time asynchronous
  representations
Emotion recognition by fusing time synchronous and time asynchronous representations
Wen Wu
Chao Zhang
P. Woodland
46
67
0
27 Oct 2020
Clotho: An Audio Captioning Dataset
Clotho: An Audio Captioning Dataset
Konstantinos Drossos
Samuel Lipping
Tuomas Virtanen
87
388
0
21 Oct 2019
Learning Features of Music from Scratch
Learning Features of Music from Scratch
John Thickstun
Zaïd Harchaoui
Sham Kakade
141
202
0
29 Nov 2016
1