2209.03143
AudioLM: a Language Modeling Approach to Audio Generation
7 September 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
Matthew Sharifi
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
Papers citing
"AudioLM: a Language Modeling Approach to Audio Generation"
50 / 427 papers shown
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
Zehan Wang
Ke Lei
Chen Zhu
Jiawei Huang
Sashuai Zhou
...
Xize Cheng
Shengpeng Ji
Zhenhui Ye
Tao Jin
Zhou Zhao
29
0
0
15 May 2025
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
Bowen Zhang
Congchao Guo
Geng Yang
Hang Yu
Haozhe Zhang
...
Yichen Xiao
Yiying Zhou
Yuyao Zhang
Yuan Lu
Yucen He
26
0
0
12 May 2025
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
Paul Primus
Florian Schmid
Gerhard Widmer
CLIP
AI4TS
VLM
36
0
0
12 May 2025
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
Wataru Nakata
Yuma Koizumi
Shigeki Karita
Robin Scheibler
Haruko Ishikawa
Adriana Guevara-Rukoz
Heiga Zen
M. Bacchiani
48
0
0
08 May 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Xueyao Zhang
Y. Wang
Chaoren Wang
Zehan Li
Zhuo Chen
Zhizheng Wu
135
0
0
07 May 2025
SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation
Zhaoxi Mu
Xinyu Yang
Gang Wang
AuLLM
KELM
VLM
57
0
0
06 May 2025
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Yemin Shi
Yu Shu
Siwei Dong
Guangyi Liu
Jaward Sesay
Jingwen Li
Zhiting Hu
AuLLM
VLM
50
0
0
05 May 2025
Spatial Speech Translation: Translating Across Space With Binaural Hearables
Tuochao Chen
Qirui Wang
Runlin He
Shyam Gollakota
31
0
0
25 Apr 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
S. Liu
...
Z. Yang
Aoxiong Yin
Ruibin Yuan
Yuhang Zhang
Zaida Zhou
AuLLM
VLM
110
5
0
25 Apr 2025
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
Keqi Deng
Wenxi Chen
Xie Chen
P. Woodland
43
0
0
22 Apr 2025
DRAGON: Distributional Rewards Optimize Diffusion Generative Models
Yatong Bai
Jonah Casebeer
Somayeh Sojoudi
Nicholas J. Bryan
DiffM
VLM
48
1
0
21 Apr 2025
A Survey on Cross-Modal Interaction Between Music and Multimodal Data
Sifei Li
Mining Tan
Feier Shen
Minyan Luo
Zijiao Yin
Fan Tang
W. Dong
Changsheng Xu
69
0
0
17 Apr 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
Dongchao Yang
Songxiang Liu
Haohan Guo
Jiankun Zhao
Yuanyuan Wang
...
Xubo Liu
Xueyuan Chen
Xu Tan
Xixin Wu
Helen Meng
119
1
0
14 Apr 2025
EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation
Xiangyue Zhang
Jianfang Li
Jiaxu Zhang
Jianqiang Ren
Liefeng Bo
Zhigang Tu
30
0
0
12 Apr 2025
Generation of Musical Timbres using a Text-Guided Diffusion Model
Weixuan Yuan
Qadeer Khan
Vladimir Golkov
DiffM
31
0
0
12 Apr 2025
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Team Seawead
Ceyuan Yang
Zhijie Lin
Yang Zhao
Shanchuan Lin
...
Zuquan Song
Zhenheng Yang
Jiashi Feng
Jianchao Yang
Lu Jiang
DiffM
90
1
0
11 Apr 2025
Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
54
0
0
11 Apr 2025
AGENT: An Aerial Vehicle Generation and Design Tool Using Large Language Models
Colin Samplawski
Adam Cobb
Susmit Jha
LLMAG
AI4CE
60
0
0
11 Apr 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
52
2
0
11 Apr 2025
MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer
Yilin Wang
Chuan Guo
Yuxuan Mu
Muhammad Gohar Javed
Wei Ji
Juwei Lu
Hai Jiang
Li Cheng
VGen
35
0
0
11 Apr 2025
A Streamable Neural Audio Codec with Residual Scalar-Vector Quantization for Real-Time Communication
Xiao-Hang Jiang
Yang Ai
Rui Zheng
Zhen-Hua Ling
36
0
0
09 Apr 2025
Scaling Analysis of Interleaved Speech-Text Language Models
Gallil Maimon
Michael Hassid
Amit Roth
Yossi Adi
AuLLM
43
0
0
03 Apr 2025
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
Kim Sung-Bin
Jeongsoo Choi
Puyuan Peng
Joon Son Chung
Tae-Hyun Oh
David Harwath
VGen
45
1
0
03 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Kaipeng Zhang
MGen
VGen
70
1
0
01 Apr 2025
UniSep: Universal Target Audio Separation with Language Models at Scale
Yishuo Wang
Hangting Chen
Dongchao Yang
Weiqin Li
Dan Luo
Guangzhi Li
Shan Yang
Zhiyong Wu
Helen Meng
Xixin Wu
VLM
49
1
0
31 Mar 2025
Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
Shivam Mehta
Nebojsa Jojic
Hannes Gamper
31
0
0
28 Mar 2025
Vision-to-Music Generation: A Survey
Zhaokai Wang
Chenxi Bao
Le Zhuo
Jingrui Han
Yang Yue
Yihong Tang
Victor Shea-Jay Huang
Yue Liao
EGVM
VGen
74
1
0
27 Mar 2025
STFTCodec: High-Fidelity Audio Compression through Time-Frequency Domain Representation
Tao Feng
Zhiyuan Zhao
Yifan Xie
Yuqi Ye
Xiangyang Luo
Xun Guan
Yongqian Li
57
0
0
21 Mar 2025
QINCODEC: Neural Audio Compression with Implicit Neural Codebooks
Zineb Lahrichi
Gaëtan Hadjeres
Gaël Richard
Geoffroy Peeters
47
0
0
19 Mar 2025
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
Xue Jiang
Xiulian Peng
Yuan Zhang
Yan-Heng Lu
SSL
83
0
0
15 Mar 2025
From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM
Kshitij Ambilduke
Ben Peters
Sonal Sannigrahi
Anil Keshwani
Tsz Kin Lam
Bruno Martins
Marcely Zanon Boito
André F. T. Martins
52
0
0
13 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
59
0
0
11 Mar 2025
Text-Speech Language Models with Improved Cross-Modal Transfer by Aligning Abstraction Levels
Santiago Cuervo
Adel Moumen
Yanis Labrak
Sameer Khurana
Antoine Laurent
Mickael Rouvier
R. Marxer
77
1
0
08 Mar 2025
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
Dingdong Wang
Jin Xu
Ruihang Chu
Zhifang Guo
Xinyu Wang
Jincenzi Wu
Dongchao Yang
Shengpeng Ji
Junyang Lin
AuLLM
83
0
0
04 Mar 2025
PodAgent: A Comprehensive Framework for Podcast Generation
Yujia Xiao
Lei He
Haohan Guo
Fenglong Xie
Tan Lee
144
0
0
01 Mar 2025
DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
Lei Zhao
Sizhou Chen
Linfeng Feng
Xiao-Lei Zhang
Xuelong Li
DiffM
MDE
63
1
0
26 Feb 2025
Mind the Gap! Static and Interactive Evaluations of Large Audio Models
Minzhi Li
William B. Held
Michael Joseph Ryan
Kunat Pipatanakul
Potsawee Manakul
Hao Zhu
Diyi Yang
AuLLM
ALM
58
0
0
21 Feb 2025
FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems
Borui Liao
Yulong Xu
Jiao Ou
Kaiyuan Yang
Weihua Jian
Pengfei Wan
Di Zhang
AuLLM
62
0
0
20 Feb 2025
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon
Avishai Elmakies
Yossi Adi
38
3
0
19 Feb 2025
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
Hui Wang
Shujie Liu
Lingwei Meng
J. Li
Yifan Yang
...
Yanqing Liu
Haoqin Sun
Jiaming Zhou
Yan Lu
Yong Qin
52
0
0
16 Feb 2025
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
61
2
0
07 Feb 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
171
0
0
05 Feb 2025
AudioMiXR: Spatial Audio Object Manipulation with 6DoF for Sound Design in Augmented Reality
Brandon Woodard
Margarita Geleta
Joseph J. LaViola Jr.
Andrea Fanelli
Rhonda Wilson
57
1
0
05 Feb 2025
ComplexDec: A Domain-robust High-fidelity Neural Audio Codec with Complex Spectrum Modeling
Yi-Chiao Wu
Dejan Marković
Steven Krenn
I. D. Gebru
Alexander Richard
61
0
0
04 Feb 2025
BRIDLE: Generalized Self-supervised Learning with Quantization
Hoang M. Nguyen
Satya Narayan Shukla
Qiang Zhang
Hanchao Yu
Sreya D. Roy
Taipeng Tian
Lingjiong Zhu
Yuchen Liu
SSL
MQ
82
0
0
04 Feb 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
99
2
0
28 Jan 2025
Discrete Speech Unit Extraction via Independent Component Analysis
Tomohiko Nakamura
Kwanghee Choi
Keigo Hojo
Yoshiaki Bando
Satoru Fukayama
Shinji Watanabe
43
0
0
11 Jan 2025
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
Vladimir Bataev
Subhankar Ghosh
Vitaly Lavrukhin
Jason Chun Lok Li
AI4TS
41
0
0
10 Jan 2025
Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI
Yuya Asano
Sabit Hassan
P. Sharma
Anthony Sicilia
Katherine Atwell
Diane Litman
Malihe Alikhani
39
0
0
10 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
81
2
0
10 Jan 2025