ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2209.03143
  4. Cited By
AudioLM: a Language Modeling Approach to Audio Generation
v1v2 (latest)

AudioLM: a Language Modeling Approach to Audio Generation

7 September 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
Matthew Sharifi
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
    AuLLM
ArXiv (abs)PDFHTML

Papers citing "AudioLM: a Language Modeling Approach to Audio Generation"

50 / 452 papers shown
Title
Spoken Language Intelligence of Large Language Models for Language Learning
Spoken Language Intelligence of Large Language Models for Language Learning
Linkai Peng
Baorian Nuchged
Yingming Gao
ELM
132
4
0
28 Aug 2023
Sparks of Large Audio Models: A Survey and Outlook
Sparks of Large Audio Models: A Survey and Outlook
S. Latif
Moazzam Shoukat
Fahad Shamshad
Muhammad Usama
Yi Ren
...
Wenwu Wang
Xulong Zhang
Roberto Togneri
Min Zhang
Björn W. Schuller
LM&MAAuLLM
202
39
0
24 Aug 2023
Example-Based Framework for Perceptually Guided Audio Texture Generation
Example-Based Framework for Perceptually Guided Audio Texture Generation
Purnima Kamath
Chitralekha Gupta
L. Wyse
Suranga Nanayakkara
48
4
0
23 Aug 2023
TokenSplit: Using Discrete Speech Representations for Direct, Refined,
  and Transcript-Conditioned Speech Separation and Recognition
TokenSplit: Using Discrete Speech Representations for Direct, Refined, and Transcript-Conditioned Speech Separation and Recognition
Hakan Erdogan
Scott Wisdom
Xuankai Chang
Zalan Borsos
Marco Tagliasacchi
Neil Zeghidour
J. Hershey
77
11
0
21 Aug 2023
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
SpeechX: Neural Codec Language Model as a Versatile Speech Transformer
Xiaofei Wang
Manthan Thakker
Zhuo Chen
Naoyuki Kanda
Sefik Emre Eskimez
Sanyuan Chen
M. Tang
Shujie Liu
Jinyu Li
Takuya Yoshioka
115
86
0
14 Aug 2023
ModelScope Text-to-Video Technical Report
ModelScope Text-to-Video Technical Report
Jiuniu Wang
Hangjie Yuan
Dayou Chen
Yingya Zhang
Xiang Wang
Shiwei Zhang
VGenDiffM
128
431
0
12 Aug 2023
Improving Joint Speech-Text Representations Without Alignment
Improving Joint Speech-Text Representations Without Alignment
Cal Peyser
Zhong Meng
Ke Hu
Rohit Prabhavalkar
Andrew Rosenberg
Tara N. Sainath
M. Picheny
Kyunghyun Cho
VLM
71
4
0
11 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised
  Pretraining
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
136
247
0
10 Aug 2023
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech
  Resynthesis
EXPRESSO: A Benchmark and Analysis of Discrete Expressive Speech Resynthesis
Tu Nguyen
Wei-Ning Hsu
Antony DÁvirro
Bowen Shi
Itai Gat
...
Gabriel Synnaeve
Michael Hassid
Felix Kreuk
Yossi Adi
Emmanuel Dupoux
82
62
0
10 Aug 2023
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Peike Li
Bo-Yu Chen
Yao Yao
Yikai Wang
Allen Wang
Alex Jinpeng Wang
MGenVLMDiffM
167
42
0
09 Aug 2023
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive
  Speech Synthesis with Prosody Conditional Adversarial Training
DiffProsody: Diffusion-based Latent Prosody Generation for Expressive Speech Synthesis with Prosody Conditional Adversarial Training
H. Oh
Sang-Hoon Lee
Seong-Whan Lee
DiffM
106
16
0
31 Jul 2023
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model
  and Language Model: A Comparative Study of Semantic Coding
Minimally-Supervised Speech Synthesis with Conditional Diffusion Model and Language Model: A Comparative Study of Semantic Coding
Chunyu Qiang
Hao Li
Hao Ni
He Qu
Ruibo Fu
Tao Wang
Longbiao Wang
Jianwu Dang
DiffM
78
9
0
28 Jul 2023
Towards Generalist Biomedical AI
Towards Generalist Biomedical AI
Tao Tu
Shekoofeh Azizi
Danny Driess
M. Schaekermann
Mohamed Amin
...
Yossi Matias
K. Singhal
Peter R. Florence
Alan Karthikesalingam
Vivek Natarajan
LM&MAMedImAI4MH
114
277
0
26 Jul 2023
Brain2Music: Reconstructing Music from Human Brain Activity
Brain2Music: Reconstructing Music from Human Brain Activity
Timo I. Denk
Yu Takagi
Takuya Matsuyama
A. Agostinelli
Tomoya Nakai
Christian Frank
Shinji Nishimoto
86
14
0
20 Jul 2023
VampNet: Music Generation via Masked Acoustic Token Modeling
VampNet: Music Generation via Masked Acoustic Token Modeling
Hugo Flores Garcia
Prem Seetharaman
Rithesh Kumar
Bryan Pardo
MGen
93
68
0
10 Jul 2023
ChatGPT in the Age of Generative AI and Large Language Models: A Concise Survey
S. Mohamadi
Ghulam Mujtaba
Ngan Le
Gianfranco Doretto
Don Adjeroh
LM&MAAI4MH
113
21
0
09 Jul 2023
Self-Consuming Generative Models Go MAD
Self-Consuming Generative Models Go MAD
Sina Alemohammad
Josue Casco-Rodriguez
Lorenzo Luzi
Ahmed Imtiaz Humayun
Hossein Babaei
Daniel LeJeune
Ali Siahkoohi
Richard G. Baraniuk
WIGM
123
161
0
04 Jul 2023
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen
  LLMs
SPAE: Semantic Pyramid AutoEncoder for Multimodal Generation with Frozen LLMs
Lijun Yu
Yong Cheng
Zhiruo Wang
Vivek Kumar
Wolfgang Macherey
...
Yonatan Bisk
Ming-Hsuan Yang
Kevin Patrick Murphy
Alexander G. Hauptmann
Lu Jiang
MLLM
102
52
0
30 Jun 2023
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Voicebox: Text-Guided Multilingual Universal Speech Generation at Scale
Matt Le
Apoorv Vyas
Bowen Shi
Brian Karrer
Leda Sari
...
Mary Williamson
Vimal Manohar
Yossi Adi
Jay Mahadeokar
Wei-Ning Hsu
AuLLM
134
306
0
23 Jun 2023
Impacts and Risk of Generative AI Technology on Cyber Defense
Impacts and Risk of Generative AI Technology on Cyber Defense
Subash Neupane
Ivan A. Fernandez
Sudip Mittal
Shahram Rahimi
100
18
0
22 Jun 2023
AudioPaLM: A Large Language Model That Can Speak and Listen
AudioPaLM: A Large Language Model That Can Speak and Listen
Paul Kishan Rubenstein
Chulayuth Asawaroengchai
D. Nguyen
Ankur Bapna
Zalan Borsos
...
Neil Zeghidour
Yu Zhang
Zhishuai Zhang
Lukás Zilka
Christian Frank
LM&MAAuLLMVLM
141
295
0
22 Jun 2023
LM-VC: Zero-shot Voice Conversion via Speech Generation based on
  Language Models
LM-VC: Zero-shot Voice Conversion via Speech Generation based on Language Models
Zhichao Wang
Yuan-Jui Chen
Linfu Xie
Qiao Tian
Yuping Wang
160
32
0
18 Jun 2023
Text-Driven Foley Sound Generation With Latent Diffusion Model
Text-Driven Foley Sound Generation With Latent Diffusion Model
Yiitan Yuan
Haohe Liu
Xubo Liu
Xiyuan Kang
Peipei Wu
Mark D.Plumbley
Wenwu Wang
DiffM
116
10
0
17 Jun 2023
FALL-E: A Foley Sound Synthesis Model and Strategies
FALL-E: A Foley Sound Synthesis Model and Strategies
Minsung Kang
Sangshin Oh
Hyeongi Moon
Kyungyun Lee
Ben Sangbae Chon
59
4
0
16 Jun 2023
UniCATS: A Unified Context-Aware Text-to-Speech Framework with
  Contextual VQ-Diffusion and Vocoding
UniCATS: A Unified Context-Aware Text-to-Speech Framework with Contextual VQ-Diffusion and Vocoding
Chenpeng Du
Yiwei Guo
Feiyu Shen
Zhijun Liu
Zheng Liang
Xie Chen
Shuai Wang
Hui Zhang
K. Yu
DiffM
106
44
0
13 Jun 2023
High-Fidelity Audio Compression with Improved RVQGAN
High-Fidelity Audio Compression with Improved RVQGAN
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
129
338
0
11 Jun 2023
Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for
  Speech Understanding
Speech-to-Text Adapter and Speech-to-Entity Retriever Augmented LLMs for Speech Understanding
Mingqiu Wang
Izhak Shafran
H. Soltau
Wei Han
Yuan Cao
Dian Yu
Laurent El Shafey
RALMAuLLM
98
9
0
08 Jun 2023
Privately generating tabular data using language models
Privately generating tabular data using language models
Alexandre Sablayrolles
Yue Wang
Brian Karrer
LMTD
83
5
0
07 Jun 2023
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive
  Bias
Mega-TTS: Zero-Shot Text-to-Speech at Scale with Intrinsic Inductive Bias
Ziyue Jiang
Yi Ren
Zhe Ye
Jinglin Liu
Chen Zhang
...
Rongjie Huang
Chunfeng Wang
Xiang Yin
Zejun Ma
Zhou Zhao
DiffM
105
80
0
06 Jun 2023
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
Yochai Yemini
Aviv Shamsian
Lior Bracha
Sharon Gannot
Ethan Fetaya
DiffM
116
15
0
05 Jun 2023
PolyVoice: Language Models for Speech to Speech Translation
PolyVoice: Language Models for Speech to Speech Translation
Qianqian Dong
Zhiying Huang
Qiao Tian
Chen Xu
Tom Ko
...
Lu Lu
Zejun Ma
Yuping Wang
Mingxuan Wang
Yuxuan Wang
109
25
0
05 Jun 2023
A survey of Generative AI Applications
A survey of Generative AI Applications
Roberto Gozalo-Brizuela
Eduardo C. Garrido-Merchán
3DVMedIm
108
91
0
05 Jun 2023
SpeechGen: Unlocking the Generative Power of Speech Language Models with
  Prompts
SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts
Haibin Wu
Kai-Wei Chang
Yuan-Kuei Wu
Hung-yi Lee
128
23
0
03 Jun 2023
Vocos: Closing the gap between time-domain and Fourier-based neural
  vocoders for high-quality audio synthesis
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Hubert Siuzdak
150
104
0
01 Jun 2023
UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion
  Model
UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model
A. Iashchenko
Pavel Andreev
Ivan Shchekotov
Nicholas Babaev
Dmitry Vetrov
DiffM
90
2
0
01 Jun 2023
How Generative Spoken Language Modeling Encodes Noisy Speech:
  Investigation from Phonetics to Syntactics
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics
Joonyong Park
Shinnosuke Takamichi
Tomohiko Nakamura
Kentaro Seki
Detai Xin
Hiroshi Saruwatari
AuLLM
39
3
0
01 Jun 2023
MERT: Acoustic Music Understanding Model with Large-Scale
  Self-supervised Training
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Yizhi Li
Ruibin Yuan
Ge Zhang
Yi Ma
Xingran Chen
...
Yemin Shi
Wen-Fen Huang
Zili Wang
Yi-Ting Guo
Jie Fu
125
130
0
31 May 2023
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code
  Collaborated with Mixer
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer
Yerin Choi
M. Koo
77
0
0
31 May 2023
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Rongjie Huang
Chunlei Zhang
Yongqiang Wang
Dongchao Yang
Lu Liu
Zhenhui Ye
Ziyue Jiang
Chao Weng
Zhou Zhao
Dong Yu
DiffM
88
27
0
30 May 2023
Domain Specialization as the Key to Make Large Language Models
  Disruptive: A Comprehensive Survey
Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey
Chen Ling
Xujiang Zhao
Jiaying Lu
Chengyuan Deng
Can Zheng
...
Chris White
Quanquan Gu
Jian Pei
Carl Yang
Liang Zhao
ALM
169
140
0
30 May 2023
Disentanglement via Latent Quantization
Disentanglement via Latent Quantization
Kyle Hsu
W. Dorrell
James C. R. Whittington
Jiajun Wu
Chelsea Finn
DRL
163
27
0
28 May 2023
Efficient Neural Music Generation
Efficient Neural Music Generation
Max W. Y. Lam
Qiao Tian
Tang-Chun Li
Zongyu Yin
Siyuan Feng
...
Mingbo Ma
Xuchen Song
Jitong Chen
Yuping Wang
Yuxuan Wang
DiffMMGen
95
56
0
25 May 2023
Spoken Question Answering and Speech Continuation Using
  Spectrogram-Powered LLM
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Eliya Nachmani
Alon Levkovitch
Roy Hirsch
Julián Salazar
Chulayutsh Asawaroengchai
Soroosh Mariooryad
Ehud Rivlin
RJ Skerry-Ryan
Michelle Tadmor Ramanovich
AuLLM
118
45
0
24 May 2023
Textually Pretrained Speech Language Models
Textually Pretrained Speech Language Models
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLMSyDa
133
61
0
22 May 2023
Pengi: An Audio Language Model for Audio Tasks
Pengi: An Audio Language Model for Audio Tasks
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLMAuLLM
109
182
0
19 May 2023
Graphologue: Exploring Large Language Model Responses with Interactive
  Diagrams
Graphologue: Exploring Large Language Model Responses with Interactive Diagrams
Peiling Jiang
Jude Rayan
Steven W. Dow
Haijun Xia
95
112
0
19 May 2023
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal
  Conversational Abilities
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
Dong Zhang
Shimin Li
Xin Zhang
Jun Zhan
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
AuLLMMLLM
136
344
0
18 May 2023
SoundStorm: Efficient Parallel Audio Generation
SoundStorm: Efficient Parallel Audio Generation
Zalan Borsos
Matthew Sharifi
Damien Vincent
Eugene Kharitonov
Neil Zeghidour
Marco Tagliasacchi
103
110
0
16 May 2023
Back Translation for Speech-to-text Translation Without Transcripts
Back Translation for Speech-to-text Translation Without Transcripts
Qingkai Fang
Yang Feng
65
14
0
15 May 2023
V2Meow: Meowing to the Visual Beat via Video-to-Music Generation
V2Meow: Meowing to the Visual Beat via Video-to-Music Generation
Kun Su
Judith Yue Li
Qingqing Huang
Dima Kuzmin
Joonseok Lee
...
Fei Sha
A. Jansen
Yu Wang
Mauro Verzetti
Timo I. Denk
VGen
86
14
0
11 May 2023
Previous
123...10789
Next