AudioLM: a Language Modeling Approach to Audio Generation
arXiv:2209.03143 (v2, latest)
7 September 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
Matthew Sharifi
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
    AuLLM

Papers citing "AudioLM: a Language Modeling Approach to Audio Generation"

50 / 452 papers shown
VoiceCraft-Dub: Automated Video Dubbing with Neural Codec Language Models
Kim Sung-Bin
Jeongsoo Choi
Puyuan Peng
Joon Son Chung
Tae-Hyun Oh
David Harwath
VGen
03 Apr 2025
Scaling Analysis of Interleaved Speech-Text Language Models
Gallil Maimon
Michael Hassid
Amit Roth
Yossi Adi
AuLLM
03 Apr 2025
A Survey on Music Generation from Single-Modal, Cross-Modal, and Multi-Modal Perspectives
Shuyu Li
Shulei Ji
Zihao Wang
Songruoyao Wu
Jiaxing Yu
Jianchao Tan
MGen, VGen
01 Apr 2025
UniSep: Universal Target Audio Separation with Language Models at Scale
Yun Wang
Hangting Chen
Dongchao Yang
Weiqin Li
Dan Luo
Guangzhi Li
Shan Yang
Zhiyong Wu
Helen Meng
Xixin Wu
VLM
31 Mar 2025
Make Some Noise: Towards LLM audio reasoning and generation using sound tokens
Shivam Mehta
Nebojsa Jojic
Hannes Gamper
28 Mar 2025
Vision-to-Music Generation: A Survey
Zhaokai Wang
Chenxi Bao
Le Zhuo
Jingrui Han
Yang Yue
Yihong Tang
Victor Shea-Jay Huang
Yue Liao
EGVM, VGen
27 Mar 2025
STFTCodec: High-Fidelity Audio Compression through Time-Frequency Domain Representation
Tao Feng
Zhiyuan Zhao
Yifan Xie
Yuqi Ye
Xiangyang Luo
Xun Guan
Yongqian Li
21 Mar 2025
QINCODEC: Neural Audio Compression with Implicit Neural Codebooks
Zineb Lahrichi
Gaëtan Hadjeres
Gaël Richard
Geoffroy Peeters
19 Mar 2025
Universal Speech Token Learning via Low-Bitrate Neural Codec and Pretrained Representations
Xue Jiang
Xiulian Peng
Yuan Zhang
Yan Lu
SSL
15 Mar 2025
From TOWER to SPIRE: Adding the Speech Modality to a Text-Only LLM
Kshitij Ambilduke
Ben Peters
Sonal Sannigrahi
Anil Keshwani
Tsz Kin Lam
Bruno Martins
Marcely Zanon Boito
André F. T. Martins
13 Mar 2025
An Exhaustive Evaluation of TTS- and VC-based Data Augmentation for ASR
Sewade Ogun
Vincent Colotte
Emmanuel Vincent
11 Mar 2025
Text-Speech Language Models with Improved Cross-Modal Transfer by Aligning Abstraction Levels
Santiago Cuervo
Adel Moumen
Yanis Labrak
Sameer Khurana
Antoine Laurent
Mickael Rouvier
R. Marxer
08 Mar 2025
InSerter: Speech Instruction Following with Unsupervised Interleaved Pre-training
Dingdong Wang
Jin Xu
Ruihang Chu
Zhifang Guo
Xinyu Wang
Jincenzi Wu
Dongchao Yang
Shengpeng Ji
Junyang Lin
AuLLM
04 Mar 2025
PodAgent: A Comprehensive Framework for Podcast Generation
Yujia Xiao
Lei He
Haohan Guo
Fenglong Xie
Tan Lee
01 Mar 2025
DualSpec: Text-to-spatial-audio Generation via Dual-Spectrogram Guided Diffusion Model
Lei Zhao
Sizhou Chen
Linfeng Feng
Ju Liu
Xuelong Li
Fangqiu Yi
DiffM, MDE
26 Feb 2025
Mind the Gap! Static and Interactive Evaluations of Large Audio Models
Minzhi Li
William B. Held
Michael Joseph Ryan
Kunat Pipatanakul
Potsawee Manakul
Hao Zhu
Diyi Yang
AuLLM, ALM
21 Feb 2025
FlexDuo: A Pluggable System for Enabling Full-Duplex Capabilities in Speech Dialogue Systems
Borui Liao
Yulong Xu
Jiao Ou
Kaiyuan Yang
Weihua Jian
Pengfei Wan
Di Zhang
AuLLM
19 Feb 2025
Slamming: Training a Speech Language Model on One GPU in a Day
Gallil Maimon
Avishai Elmakies
Yossi Adi
19 Feb 2025
FELLE: Autoregressive Speech Synthesis with Token-Wise Coarse-to-Fine Flow Matching
Hui Wang
Shujie Liu
Lingwei Meng
Jiajian Li
Yifan Yang
...
Yanqing Liu
Haoqin Sun
Jiaming Zhou
Yan Lu
Yong Qin
16 Feb 2025
Koel-TTS: Enhancing LLM based Speech Generation with Preference Alignment and Classifier Free Guidance
Shehzeen Samarah Hussain
Paarth Neekhara
Xuesong Yang
Edresson Casanova
Subhankar Ghosh
Mikyas T. Desta
Roy Fejgin
Rafael Valle
Jason Chun Lok Li
07 Feb 2025
AudioMiXR: Spatial Audio Object Manipulation with 6DoF for Sound Design in Augmented Reality
Brandon Woodard
Margarita Geleta
Joseph J. LaViola Jr.
Andrea Fanelli
Rhonda Wilson
05 Feb 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
05 Feb 2025
BRIDLE: Generalized Self-supervised Learning with Quantization
Hoang M. Nguyen
Satya Narayan Shukla
Qiang Zhang
Hanchao Yu
Sreya D. Roy
Taipeng Tian
Lingjiong Zhu
Yuchen Liu
SSL, MQ
04 Feb 2025
ComplexDec: A Domain-robust High-fidelity Neural Audio Codec with Complex Spectrum Modeling
Yi-Chiao Wu
Dejan Marković
Steven Krenn
I. D. Gebru
Alexander Richard
04 Feb 2025
Audio-Language Models for Audio-Centric Tasks: A survey
Yi Su
Jisheng Bai
Qisheng Xu
Kele Xu
Yong Dou
AuLLM
28 Jan 2025
Discrete Speech Unit Extraction via Independent Component Analysis
Tomohiko Nakamura
Kwanghee Choi
Keigo Hojo
Yoshiaki Bando
Satoru Fukayama
Shinji Watanabe
11 Jan 2025
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
Vladimir Bataev
Subhankar Ghosh
Vitaly Lavrukhin
Jason Chun Lok Li
AI4TS
10 Jan 2025
Contextual ASR Error Handling with LLMs Augmentation for Goal-Oriented Conversational AI
Yuya Asano
Sabit Hassan
P. Sharma
Anthony Sicilia
Katherine Atwell
Diane Litman
Malihe Alikhani
10 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
10 Jan 2025
Spatial Information Integration in Small Language Models for Document Layout Generation and Classification
Pablo Melendez
Clemens Havas
09 Jan 2025
ZSVC: Zero-shot Style Voice Conversion with Disentangled Latent Diffusion Models and Adversarial Training
Xinfa Zhu
Lei He
Yujia Xiao
Xi Wang
Xu Tan
Sheng Zhao
Lei Xie
DiffM
08 Jan 2025
Unsupervised Speech Segmentation: A General Approach Using Speech Language Models
Avishai Elmakies
Omri Abend
Yossi Adi
08 Jan 2025
SongEditor: Adapting Zero-Shot Song Generation Language Model as a Multi-Task Editor
Chenyu Yang
Shuai Wang
Hangting Chen
Jianwei Yu
Wei Tan
Rongzhi Gu
Yongjun Xu
Yizhi Zhou
Haina Zhu
Haoyang Li
KELM
18 Dec 2024
SOLAMI: Social Vision-Language-Action Modeling for Immersive Interaction with 3D Autonomous Characters
Jianping Jiang
Weiye Xiao
Zhengyu Lin
Han Zhang
Tianxiang Ren
Yang Gao
Zhiqian Lin
Zhongang Cai
Lei Yang
Ziwei Liu
29 Nov 2024
Scaling Transformers for Low-Bitrate High-Quality Speech Coding
Julian Parker
Anton Smirnov
Jordi Pons
CJ Carr
Zack Zukowski
Zach Evans
Xubo Liu
29 Nov 2024
Tiny-Align: Bridging Automatic Speech Recognition and Large Language Model on the Edge
Ruiyang Qin
Dancheng Liu
Gelei Xu
Zheyu Yan
Chenhui Xu
Yuting Hu
Xiaolin Hu
Jinjun Xiong
Yiyu Shi
Y. Shi
AuLLM
21 Nov 2024
ESTVocoder: An Excitation-Spectral-Transformed Neural Vocoder Conditioned on Mel Spectrogram
Xiao-Hang Jiang
Hui-Peng Du
Yang Ai
Ye-Xin Lu
Zhen-Hua Ling
18 Nov 2024
Mamba-based Decoder-Only Approach with Bidirectional Speech Modeling for Speech Recognition
Yoshiki Masuyama
Koichi Miyazaki
Masato Murata
Mamba
11 Nov 2024
Long-Form Text-to-Music Generation with Adaptive Prompts: A Case Study in Tabletop Role-Playing Games Soundtracks
Felipe Marra
Lucas N. Ferreira
06 Nov 2024
Addressing Representation Collapse in Vector Quantized Models with One Linear Layer
Yongxin Zhu
Bing Li
Yifei Xin
Linli Xu
04 Nov 2024
Align-SLM: Textless Spoken Language Models with Reinforcement Learning from AI Feedback
Guan-Ting Lin
Prashanth Gurunath Shivakumar
Aditya Gourav
Yile Gu
Ankur Gandhe
Hung-yi Lee
I. Bulyko
04 Nov 2024
Sing-On-Your-Beat: Simple Text-Controllable Accompaniment Generations
Quoc-Huy Trinh
Minh-Van Nguyen
Trong-Hieu Nguyen-Mau
Khoa Tran
Thanh Do
03 Nov 2024
MDCTCodec: A Lightweight MDCT-based Neural Audio Codec towards High Sampling Rate and Low Bitrate Scenarios
Xiao-Hang Jiang
Yang Ai
Rui Zheng
Hui-Peng Du
Ye-Xin Lu
Zhen-Hua Ling
01 Nov 2024
DC-Spin: A Speaker-invariant Speech Tokenizer for Spoken Language Models
Heng-Jui Chang
Hongyu Gong
Changhan Wang
James R. Glass
Yu-An Chung
31 Oct 2024
An Empirical Analysis of Speech Self-Supervised Learning at Multiple Resolutions
Theo Clark
Benedetta Cevoli
Eloy de Jong
Timofey Abramski
Jamie Dougherty
SSL
31 Oct 2024
APCodec+: A Spectrum-Coding-Based High-Fidelity and High-Compression-Rate Neural Audio Codec with Staged Training Paradigm
Hui-Peng Du
Yang Ai
Rui Zheng
Zhen-Hua Ling
30 Oct 2024
A Closer Look at Neural Codec Resynthesis: Bridging the Gap between Codec and Waveform Generation
Alexander H. Liu
Qirui Wang
Yuan Gong
James Glass
29 Oct 2024
Survey of User Interface Design and Interaction Techniques in Generative AI Applications
Reuben Luera
Ryan Rossi
Alexa F. Siu
Franck Dernoncourt
Tong Yu
...
Hanieh Salehy
Jian Zhao
Samyadeep Basu
Puneet Mathur
Nedim Lipka
AI4TS
28 Oct 2024
Get Large Language Models Ready to Speak: A Late-fusion Approach for Speech Generation
Maohao Shen
Shun Zhang
Jilong Wu
Zhiping Xiu
Ehab AlBadawy
Yiting Lu
M. Seltzer
Qing He
27 Oct 2024
GE2E-KWS: Generalized End-to-End Training and Evaluation for Zero-shot Keyword Spotting
Pai Zhu
Jacob Bartel
Dhruuv Agarwal
Kurt Partridge
Hyun-jin Park
Quan Wang
22 Oct 2024