ResearchTrend.AI
arXiv: 2209.03143
AudioLM: a Language Modeling Approach to Audio Generation

7 September 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
Matthew Sharifi
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
    AuLLM

Papers citing "AudioLM: a Language Modeling Approach to Audio Generation"

50 / 452 papers shown
Title
Watermarking Autoregressive Image Generation
Nikola Jovanović
Ismail Labiad
Tomáš Souček
Martin Vechev
Pierre Fernandez
WIGM
45
0
0
19 Jun 2025
Factorized RVQ-GAN For Disentangled Speech Tokenization
Sameer Khurana
Dominik Klement
Antoine Laurent
Dominik Bobos
Juraj Novosad
...
Ryo Aihara
Chiori Hori
François Germain
Gordon Wichern
Jonathan Le Roux
24
0
0
18 Jun 2025
A Variational Framework for Improving Naturalness in Generative Spoken Language Models
Li-Wei Chen
Takuya Higuchi
Zakaria Aldeneh
Ahmed Hussen Abdelaziz
Alexander I. Rudnicky
38
0
0
17 Jun 2025
StreamMel: Real-Time Zero-shot Text-to-Speech via Interleaved Continuous Autoregressive Modeling
Hui Wang
Yifan Yang
Shujie Liu
Jinyu Li
Lingwei Meng
Y. Liu
Jiaming Zhou
Haoqin Sun
Yan Lu
Yong Qin
40
0
0
14 Jun 2025
ViSAGe: Video-to-Spatial Audio Generation
Jaeyeon Kim
Heeseung Yun
Gunhee Kim
VGen
39
2
0
13 Jun 2025
Scheduled Interleaved Speech-Text Training for Speech-to-Speech Translation with LLMs
Hayato Futami
E. Tsunoo
Yosuke Kashiwagi
Yuki Ito
Hassan Shahmohammadi
Siddhant Arora
Shinji Watanabe
AuLLM
110
0
0
12 Jun 2025
OmniDRCA: Parallel Speech-Text Foundation Model via Dual-Resolution Speech Representations and Contrastive Alignment
Chao-Hong Tan
Qian Chen
Wen Wang
Chong Deng
Qinglin Zhang
...
Yukun Ma
Yafeng Chen
Hui Wang
Jiaqing Liu
Jieping Ye
AuLLM
91
0
0
11 Jun 2025
UmbraTTS: Adapting Text-to-Speech to Environmental Contexts with Flow Matching
Neta Glazer
Aviv Navon
Yael Segal
Aviv Shamsian
Hilit Segev
Asaf Buchnick
Menachem Pirchi
Gil Hetz
Joseph Keshet
84
0
0
11 Jun 2025
A Review on Score-based Generative Models for Audio Applications
Ge Zhu
Yutong Wen
Zhiyao Duan
DiffM, MedIm
43
0
0
10 Jun 2025
Auto-Regressive vs Flow-Matching: a Comparative Study of Modeling Paradigms for Text-to-Music Generation
Or Tal
Felix Kreuk
Yossi Adi
AI4TS
66
0
0
10 Jun 2025
LeVo: High-Quality Song Generation with Multi-Preference Alignment
Shun Lei
Yaoxun Xu
Zhiwei Lin
Huaicheng Zhang
Wei Tan
...
Chenyu Yang
Haina Zhu
Shuai Wang
Zhiyong Wu
Dong Yu
49
0
0
09 Jun 2025
Learning Sparsity for Effective and Efficient Music Performance Question Answering
Xingjian Diao
Tianzhen Yang
Chunhui Zhang
Weiyi Wu
Ming Cheng
Jiang Gui
76
1
0
02 Jun 2025
Rhythm Controllable and Efficient Zero-Shot Voice Conversion via Shortcut Flow Matching
Jialong Zuo
Shengpeng Ji
Minghui Fang
Mingze Li
Ziyue Jiang
Xize Cheng
Xiaoda Yang
Chen Feiyang
Xinyu Duan
Zhou Zhao
50
0
0
01 Jun 2025
MagiCodec: Simple Masked Gaussian-Injected Codec for High-Fidelity Reconstruction and Generation
Yakun Song
Jiawei Chen
Xiaobin Zhuang
Chenpeng Du
Ziyang Ma
...
Dongya Jia
Zhuo Chen
Yuping Wang
Yuxuan Wang
Xie Chen
43
0
0
31 May 2025
Learning to Upsample and Upmix Audio in the Latent Domain
Dimitrios Bralios
Paris Smaragdis
Jonah Casebeer
39
0
0
31 May 2025
Spoken Language Modeling with Duration-Penalized Self-Supervised Units
Nicol Visser
Herman Kamper
59
0
0
29 May 2025
Semantics-Aware Human Motion Generation from Audio Instructions
Zi-An Wang
Shihao Zou
Shiyao Yu
Mingyuan Zhang
Chao Dong
VGen
39
0
0
29 May 2025
MGE-LDM: Joint Latent Diffusion for Simultaneous Music Generation and Source Extraction
Yunkee Chae
Kyogu Lee
66
0
0
29 May 2025
A Survey of Generative Categories and Techniques in Multimodal Large Language Models
Longzhen Han
Awes Mubarak
Almas Baimagambetov
Nikolaos Polatidis
Thar Baker
LRM
72
0
0
29 May 2025
Text-Queried Audio Source Separation via Hierarchical Modeling
Xinlei Yin
Xiulian Peng
Xue Jiang
Zhiwei Xiong
Yan Lu
56
0
0
27 May 2025
VoiceStar: Robust Zero-Shot Autoregressive TTS with Duration Control and Extrapolation
Puyuan Peng
Shang-Wen Li
Abdelrahman Mohamed
David Harwath
41
0
0
26 May 2025
SpeakStream: Streaming Text-to-Speech with Interleaved Data
Richard He Bai
Zijin Gu
Tatiana Likhomanenko
Navdeep Jaitly
AuLLM, AI4TS
59
0
0
25 May 2025
CCHall: A Novel Benchmark for Joint Cross-Lingual and Cross-Modal Hallucinations Detection in Large Language Models
Yongheng Zhang
Xu Liu
Ruoxi Zhou
Qiguang Chen
Hao Fei
Wenpeng Lu
L. Qin
HILM, LRM
41
0
0
25 May 2025
Analyzing Mitigation Strategies for Catastrophic Forgetting in End-to-End Training of Spoken Language Models
Chi-Yuan Hsiao
Ke-Han Lu
Kai-Wei Chang
Chih-Kai Yang
Wei-Chih Chen
Hung-yi Lee
CLL, MoMe
204
0
0
23 May 2025
Unlocking Temporal Flexibility: Neural Speech Codec with Variable Frame Rate
Hanglei Zhang
Yiwei Guo
Zhihan Li
Xiang Hao
Xie Chen
Kai Yu
52
0
0
22 May 2025
Audio Jailbreak: An Open Comprehensive Benchmark for Jailbreaking Large Audio-Language Models
Zirui Song
Qian Jiang
Mingxuan Cui
Mingzhe Li
Lang Gao
...
Yanbo Wang
Chenxi Wang
Guangxian Ouyang
Zhenhao Chen
Xiuying Chen
AuLLM, AAML
98
0
0
21 May 2025
Large Language Models Implicitly Learn to See and Hear Just By Reading
Prateek Verma
Mert Pilanci
200
0
0
20 May 2025
DualCodec: A Low-Frame-Rate, Semantically-Enhanced Neural Audio Codec for Speech Generation
Jiaqi Li
Xiaolong Lin
Zhekai Li
Shixi Huang
Yuancheng Wang
Chaoren Wang
Zhenpeng Zhan
Zhizheng Wu
100
1
0
19 May 2025
Universal Semantic Disentangled Privacy-preserving Speech Representation Learning
Biel Tura Vecino
Subhadeep Maji
Aravind Varier
Antonio Bonafonte
Ivan Valles
...
Roberto Barra-Chicote
Ariya Rastrow
C. Papayiannis
Volker Leutnant
Trevor Wood
39
0
0
19 May 2025
T2A-Feedback: Improving Basic Capabilities of Text-to-Audio Generation via Fine-grained AI Feedback
Zehan Wang
Ke Lei
Chen Zhu
Jiawei Huang
Sashuai Zhou
...
Xize Cheng
Shengpeng Ji
Zhenhui Ye
Tao Jin
Zhou Zhao
82
0
0
15 May 2025
MiniMax-Speech: Intrinsic Zero-Shot Text-to-Speech with a Learnable Speaker Encoder
Bowen Zhang
Congchao Guo
Geng Yang
Hang Yu
Haozhe Zhang
...
Yichen Xiao
Yiying Zhou
Yize Zhang
Yuan Lu
Yucen He
70
1
0
12 May 2025
TACOS: Temporally-aligned Audio CaptiOnS for Language-Audio Pretraining
Paul Primus
Florian Schmid
Gerhard Widmer
CLIP, AI4TS, VLM
60
0
0
12 May 2025
ReverbMiipher: Generative Speech Restoration meets Reverberation Characteristics Controllability
Wataru Nakata
Yuma Koizumi
Shigeki Karita
Robin Scheibler
Haruko Ishikawa
Adriana Guevara-Rukoz
Heiga Zen
M. Bacchiani
114
0
0
08 May 2025
Advancing Zero-shot Text-to-Speech Intelligibility across Diverse Domains via Preference Alignment
Xueyao Zhang
Yijiao Wang
Chaoren Wang
Hui Yuan
Zhuo Chen
Zhizheng Wu
339
0
0
07 May 2025
SepALM: Audio Language Models Are Error Correctors for Robust Speech Separation
Zhaoxi Mu
Xinyu Yang
Gang Wang
AuLLM, KELM, VLM
150
1
0
06 May 2025
Voila: Voice-Language Foundation Models for Real-Time Autonomous Interaction and Voice Role-Play
Yemin Shi
Yu Shu
Siwei Dong
Guangyi Liu
Jaward Sesay
Jingwen Li
Zhiting Hu
AuLLM, VLM
98
0
0
05 May 2025
Spatial Speech Translation: Translating Across Space With Binaural Hearables
Tuochao Chen
Qirui Wang
Runlin He
Shyam Gollakota
75
0
0
25 Apr 2025
Kimi-Audio Technical Report
KimiTeam
Ding Ding
Zeqian Ju
Yichong Leng
Shixuan Liu
...
Zhiyong Yang
Aoxiong Yin
Ruibin Yuan
Yanzhe Zhang
Zaida Zhou
AuLLM, VLM
192
13
0
25 Apr 2025
SimulS2S-LLM: Unlocking Simultaneous Inference of Speech LLMs for Speech-to-Speech Translation
Keqi Deng
Wenxi Chen
Xie Chen
P. Woodland
122
0
0
22 Apr 2025
DRAGON: Distributional Rewards Optimize Diffusion Generative Models
Yatong Bai
Jonah Casebeer
Somayeh Sojoudi
Nicholas J. Bryan
DiffM, VLM
115
1
0
21 Apr 2025
A Survey on Cross-Modal Interaction Between Music and Multimodal Data
Sifei Li
Mining Tan
Feier Shen
Minyan Luo
Zijiao Yin
Fan Tang
W. Dong
Changsheng Xu
122
1
0
17 Apr 2025
ALMTokenizer: A Low-bitrate and Semantic-rich Audio Codec Tokenizer for Audio Language Modeling
Dongchao Yang
Songxiang Liu
Haohan Guo
Jiankun Zhao
Yuanyuan Wang
...
Xubo Liu
Xueyuan Chen
Xu Tan
Xixin Wu
Helen Meng
233
2
0
14 Apr 2025
EchoMask: Speech-Queried Attention-based Mask Modeling for Holistic Co-Speech Motion Generation
Xiangyue Zhang
Jianfang Li
Jiaxu Zhang
Jianqiang Ren
Liefeng Bo
Zhigang Tu
89
0
0
12 Apr 2025
Generation of Musical Timbres using a Text-Guided Diffusion Model
Weixuan Yuan
Qadeer Khan
Vladimir Golkov
DiffM
114
0
0
12 Apr 2025
On The Landscape of Spoken Language Models: A Comprehensive Survey
Siddhant Arora
Kai-Wei Chang
Chung-Ming Chien
Yifan Peng
Haibin Wu
Yossi Adi
Emmanuel Dupoux
Hung-yi Lee
Karen Livescu
Shinji Watanabe
161
14
0
11 Apr 2025
AGENT: An Aerial Vehicle Generation and Design Tool Using Large Language Models
Colin Samplawski
Adam Cobb
Susmit Jha
LLMAG, AI4CE
110
0
0
11 Apr 2025
USM-VC: Mitigating Timbre Leakage with Universal Semantic Mapping Residual Block for Voice Conversion
Na Li
Chuke Wang
Yu Gu
Zhifeng Li
156
0
0
11 Apr 2025
Seaweed-7B: Cost-Effective Training of Video Generation Foundation Model
Team Seawead
Ceyuan Yang
Zhijie Lin
Yang Zhao
Shanchuan Lin
...
Zuquan Song
Zhenheng Yang
Jiashi Feng
Jianchao Yang
Lu Jiang
DiffM
196
22
0
11 Apr 2025
MotionDreamer: One-to-Many Motion Synthesis with Localized Generative Masked Transformer
Yilin Wang
Chuan Guo
Yuxuan Mu
Muhammad Gohar Javed
Wei Ji
Juwei Lu
Hai Jiang
Li Cheng
VGen
65
0
0
11 Apr 2025
A Streamable Neural Audio Codec with Residual Scalar-Vector Quantization for Real-Time Communication
Xiao-Hang Jiang
Yang Ai
Rui Zheng
Zhen-Hua Ling
64
0
0
09 Apr 2025