ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2209.03143
  4. Cited By
AudioLM: a Language Modeling Approach to Audio Generation

AudioLM: a Language Modeling Approach to Audio Generation

7 September 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
Matthew Sharifi
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
    AuLLM
ArXivPDFHTML

Papers citing "AudioLM: a Language Modeling Approach to Audio Generation"

50 / 428 papers shown
Title
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
LipVoicer: Generating Speech from Silent Videos Guided by Lip Reading
Yochai Yemini
Aviv Shamsian
Lior Bracha
Sharon Gannot
Ethan Fetaya
DiffM
30
11
0
05 Jun 2023
PolyVoice: Language Models for Speech to Speech Translation
PolyVoice: Language Models for Speech to Speech Translation
Qianqian Dong
Zhiying Huang
Qiao Tian
Chen Xu
Tom Ko
...
Lu Lu
Zejun Ma
Yuping Wang
Mingxuan Wang
Yuxuan Wang
28
23
0
05 Jun 2023
A survey of Generative AI Applications
A survey of Generative AI Applications
Roberto Gozalo-Brizuela
Eduardo C. Garrido-Merchán
3DV
MedIm
30
80
0
05 Jun 2023
SpeechGen: Unlocking the Generative Power of Speech Language Models with
  Prompts
SpeechGen: Unlocking the Generative Power of Speech Language Models with Prompts
Haibin Wu
Kai-Wei Chang
Yuan-Kuei Wu
Hung-yi Lee
33
22
0
03 Jun 2023
Vocos: Closing the gap between time-domain and Fourier-based neural
  vocoders for high-quality audio synthesis
Vocos: Closing the gap between time-domain and Fourier-based neural vocoders for high-quality audio synthesis
Hubert Siuzdak
32
79
0
01 Jun 2023
UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion
  Model
UnDiff: Unsupervised Voice Restoration with Unconditional Diffusion Model
A. Iashchenko
Pavel Andreev
Ivan Shchekotov
Nicholas Babaev
Dmitry Vetrov
DiffM
21
1
0
01 Jun 2023
How Generative Spoken Language Modeling Encodes Noisy Speech:
  Investigation from Phonetics to Syntactics
How Generative Spoken Language Modeling Encodes Noisy Speech: Investigation from Phonetics to Syntactics
Joonyong Park
Shinnosuke Takamichi
Tomohiko Nakamura
Kentaro Seki
Detai Xin
Hiroshi Saruwatari
AuLLM
11
3
0
01 Jun 2023
MERT: Acoustic Music Understanding Model with Large-Scale
  Self-supervised Training
MERT: Acoustic Music Understanding Model with Large-Scale Self-supervised Training
Yizhi Li
Ruibin Yuan
Ge Zhang
Yi Ma
Xingran Chen
...
Yemin Shi
Wen-Fen Huang
Zili Wang
Yi-Ting Guo
Jie Fu
30
108
0
31 May 2023
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code
  Collaborated with Mixer
DC CoMix TTS: An End-to-End Expressive TTS with Discrete Code Collaborated with Mixer
Yerin Choi
M. Koo
33
0
0
31 May 2023
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Make-A-Voice: Unified Voice Synthesis With Discrete Representation
Rongjie Huang
Chunlei Zhang
Yongqiang Wang
Dongchao Yang
Lu Liu
Zhenhui Ye
Ziyue Jiang
Chao Weng
Zhou Zhao
Dong Yu
DiffM
31
26
0
30 May 2023
Domain Specialization as the Key to Make Large Language Models
  Disruptive: A Comprehensive Survey
Domain Specialization as the Key to Make Large Language Models Disruptive: A Comprehensive Survey
Chen Ling
Xujiang Zhao
Jiaying Lu
Chengyuan Deng
Can Zheng
...
Chris White
Quanquan Gu
Jian Pei
Carl Yang
Liang Zhao
ALM
33
126
0
30 May 2023
Disentanglement via Latent Quantization
Disentanglement via Latent Quantization
Kyle Hsu
W. Dorrell
James C. R. Whittington
Jiajun Wu
Chelsea Finn
DRL
29
25
0
28 May 2023
Efficient Neural Music Generation
Efficient Neural Music Generation
Max W. Y. Lam
Qiao Tian
Tang-Chun Li
Zongyu Yin
Siyuan Feng
...
Mingbo Ma
Xuchen Song
Jitong Chen
Yuping Wang
Yuxuan Wang
DiffM
MGen
34
49
0
25 May 2023
Spoken Question Answering and Speech Continuation Using
  Spectrogram-Powered LLM
Spoken Question Answering and Speech Continuation Using Spectrogram-Powered LLM
Eliya Nachmani
Alon Levkovitch
Roy Hirsch
Julián Salazar
Chulayutsh Asawaroengchai
Soroosh Mariooryad
Ehud Rivlin
RJ Skerry-Ryan
Michelle Tadmor Ramanovich
AuLLM
31
31
0
24 May 2023
Textually Pretrained Speech Language Models
Textually Pretrained Speech Language Models
Michael Hassid
Tal Remez
Tu Nguyen
Itai Gat
Alexis Conneau
...
Alexandre Défossez
Gabriel Synnaeve
Emmanuel Dupoux
Roy Schwartz
Yossi Adi
VLM
SyDa
44
53
0
22 May 2023
Pengi: An Audio Language Model for Audio Tasks
Pengi: An Audio Language Model for Audio Tasks
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM
AuLLM
39
159
0
19 May 2023
Graphologue: Exploring Large Language Model Responses with Interactive
  Diagrams
Graphologue: Exploring Large Language Model Responses with Interactive Diagrams
Peiling Jiang
Jude Rayan
Steven W. Dow
Haijun Xia
32
100
0
19 May 2023
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal
  Conversational Abilities
SpeechGPT: Empowering Large Language Models with Intrinsic Cross-Modal Conversational Abilities
Dong Zhang
Shimin Li
Xin Zhang
Jun Zhan
Pengyu Wang
Yaqian Zhou
Xipeng Qiu
AuLLM
MLLM
62
297
0
18 May 2023
SoundStorm: Efficient Parallel Audio Generation
SoundStorm: Efficient Parallel Audio Generation
Zalan Borsos
Matthew Sharifi
Damien Vincent
Eugene Kharitonov
Neil Zeghidour
Marco Tagliasacchi
28
98
0
16 May 2023
Back Translation for Speech-to-text Translation Without Transcripts
Back Translation for Speech-to-text Translation Without Transcripts
Qingkai Fang
Yang Feng
38
13
0
15 May 2023
V2Meow: Meowing to the Visual Beat via Video-to-Music Generation
V2Meow: Meowing to the Visual Beat via Video-to-Music Generation
Kun Su
Judith Yue Li
Qingqing Huang
Dima Kuzmin
Joonseok Lee
...
Fei Sha
A. Jansen
Yu Wang
Mauro Verzetti
Timo I. Denk
VGen
39
12
0
11 May 2023
AudioSlots: A slot-centric generative model for audio separation
AudioSlots: A slot-centric generative model for audio separation
P. Reddy
Scott Wisdom
Klaus Greff
J. Hershey
Thomas Kipf
OCL
VLM
28
6
0
09 May 2023
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio
  Codec
HiFi-Codec: Group-residual Vector quantization for High Fidelity Audio Codec
Dongchao Yang
Songxiang Liu
Rongjie Huang
Jinchuan Tian
Chao Weng
Yuexian Zou
150
121
0
04 May 2023
Shap-E: Generating Conditional 3D Implicit Functions
Shap-E: Generating Conditional 3D Implicit Functions
Heewoo Jun
Alex Nichol
DiffM
203
310
0
03 May 2023
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking
  Head
AudioGPT: Understanding and Generating Speech, Music, Sound, and Talking Head
Rongjie Huang
Mingze Li
Dongchao Yang
Jiatong Shi
Xuankai Chang
...
Jia-Bin Huang
Jinglin Liu
Yixiang Ren
Zhou Zhao
Shinji Watanabe
LM&MA
AuLLM
45
198
0
25 Apr 2023
Foley Sound Synthesis at the DCASE 2023 Challenge
Foley Sound Synthesis at the DCASE 2023 Challenge
Keunwoo Choi
Jae-Yeol Im
Laurie M. Heller
Brian McFee
Keisuke Imoto
Yuki Okamoto
Mathieu Lagrange
Shinosuke Takamichi
18
30
0
25 Apr 2023
DiffVoice: Text-to-Speech with Latent Diffusion
DiffVoice: Text-to-Speech with Latent Diffusion
Zhijun Liu
Yiwei Guo
K. Yu
DiffM
27
22
0
23 Apr 2023
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot
  Speech and Singing Synthesizers
NaturalSpeech 2: Latent Diffusion Models are Natural and Zero-Shot Speech and Singing Synthesizers
Kai Shen
Zeqian Ju
Xu Tan
Yanqing Liu
Yichong Leng
Lei He
Tao Qin
Sheng Zhao
Jiang Bian
DiffM
44
225
0
18 Apr 2023
Scientists' Perspectives on the Potential for Generative AI in their
  Fields
Scientists' Perspectives on the Potential for Generative AI in their Fields
Meredith Ringel Morris
AI4CE
33
38
0
04 Apr 2023
LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models
LMCodec: A Low Bitrate Speech Codec With Causal Transformer Models
Teerapat Jenrungrot
Michael Chinen
W. Kleijn
Jan Skoglund
Zalan Borsos
Neil Zeghidour
Marco Tagliasacchi
60
19
0
23 Mar 2023
Large AI Models in Health Informatics: Applications, Challenges, and the
  Future
Large AI Models in Health Informatics: Applications, Challenges, and the Future
Jianing Qiu
Lin Li
Jiankai Sun
Jiachuan Peng
Peilun Shi
...
Bo Xiao
Wu Yuan
Ningli Wang
Dong Xu
Benny Lo
AI4MH
LM&MA
42
127
0
21 Mar 2023
ICASSP 2023 Deep Noise Suppression Challenge
ICASSP 2023 Deep Noise Suppression Challenge
Harishchandra Dubey
A. Aazami
Vishak Gopal
Sergiy Matusevych
Sebastian Braun
...
Sefik Emre Eskimez
Manthan Thakker
H. Gamper
Takuya Yoshioka
R. Aichner
28
83
0
21 Mar 2023
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec
  Language Modeling
Speak Foreign Languages with Your Own Voice: Cross-Lingual Neural Codec Language Modeling
Zi-Hua Zhang
Long Zhou
Chengyi Wang
Sanyuan Chen
Yu Wu
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
VLM
36
171
0
07 Mar 2023
FoundationTTS: Text-to-Speech for ASR Customization with Generative
  Language Model
FoundationTTS: Text-to-Speech for ASR Customization with Generative Language Model
Rui Xue
Yanqing Liu
Lei He
Xuejiao Tan
Linquan Liu
Ed Lin
Sheng Zhao
34
7
0
06 Mar 2023
WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for
  Whisper-based Speech Interactions
WESPER: Zero-shot and Realtime Whisper to Normal Voice Conversion for Whisper-based Speech Interactions
Jun Rekimoto
46
19
0
03 Mar 2023
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised
  representations
ParrotTTS: Text-to-Speech synthesis by exploiting self-supervised representations
N. Shah
Saiteja Kosgi
Vishal Tambrahalli
Neha Sahipjohn
Anil Nelakanti
Vineet Gandhi
25
8
0
01 Mar 2023
Self-Organising Neural Discrete Representation Learning à la Kohonen
Self-Organising Neural Discrete Representation Learning à la Kohonen
Kazuki Irie
Róbert Csordás
Jürgen Schmidhuber
SSL
32
1
0
15 Feb 2023
Noise2Music: Text-conditioned Music Generation with Diffusion Models
Noise2Music: Text-conditioned Music Generation with Diffusion Models
Qingqing Huang
Daniel S. Park
Tao Wang
Timo I. Denk
Andy Ly
...
Jesse Engel
Quoc V. Le
William Chan
Zhifeng Chen
Wei Han
MGen
DiffM
38
191
0
08 Feb 2023
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal
  Supervision
Speak, Read and Prompt: High-Fidelity Text-to-Speech with Minimal Supervision
Eugene Kharitonov
Damien Vincent
Zalan Borsos
Raphaël Marinier
Sertan Girgin
Olivier Pietquin
Matthew Sharifi
Marco Tagliasacchi
Neil Zeghidour
18
190
0
07 Feb 2023
Multi-Source Diffusion Models for Simultaneous Music Generation and
  Separation
Multi-Source Diffusion Models for Simultaneous Music Generation and Separation
Giorgio Mariani
Irene Tallini
Emilian Postolache
Michele Mancusi
Luca Cosmo
Emanuele Rodolà
DiffM
30
37
0
04 Feb 2023
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with
  Natural Language Style Prompt
InstructTTS: Modelling Expressive TTS in Discrete Latent Space with Natural Language Style Prompt
Dongchao Yang
Songxiang Liu
Rongjie Huang
Chao Weng
Helen Meng
DiffM
VLM
31
85
0
31 Jan 2023
ArchiSound: Audio Generation with Diffusion
ArchiSound: Audio Generation with Diffusion
Flavio Schneider
21
23
0
30 Jan 2023
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and
  Toxicity
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Terry Yue Zhuo
Yujin Huang
Chunyang Chen
Zhenchang Xing
SILM
36
102
0
30 Jan 2023
SingSong: Generating musical accompaniments from singing
SingSong: Generating musical accompaniments from singing
Chris Donahue
Antoine Caillon
Adam Roberts
Ethan Manilow
P. Esling
...
Mauro Verzetti
Ian Simon
Olivier Pietquin
Neil Zeghidour
Jesse Engel
37
52
0
30 Jan 2023
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion
  Models
Make-An-Audio: Text-To-Audio Generation with Prompt-Enhanced Diffusion Models
Rongjie Huang
Jia-Bin Huang
Dongchao Yang
Yi Ren
Luping Liu
Mingze Li
Zhenhui Ye
Jinglin Liu
Xiaoyue Yin
Zhou Zhao
DiffM
151
317
0
30 Jan 2023
Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
Flavio Schneider
Ojasv Kamal
Zhijing Jin
Bernhard Schölkopf
MGen
38
83
0
27 Jan 2023
MusicLM: Generating Music From Text
MusicLM: Generating Music From Text
A. Agostinelli
Timo I. Denk
Zalan Borsos
Jesse Engel
Mauro Verzetti
...
Adam Roberts
Marco Tagliasacchi
Matthew Sharifi
Neil Zeghidour
Christian Frank
MGen
55
417
0
26 Jan 2023
Regeneration Learning: A Learning Paradigm for Data Generation
Regeneration Learning: A Learning Paradigm for Data Generation
Xu Tan
Tao Qin
Jiang Bian
Tie-Yan Liu
Yoshua Bengio
GAN
38
15
0
21 Jan 2023
ChatGPT is not all you need. A State of the Art Review of large
  Generative AI models
ChatGPT is not all you need. A State of the Art Review of large Generative AI models
Roberto Gozalo-Brizuela
E.C. Garrido-Merchán
27
261
0
11 Jan 2023
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Neural Codec Language Models are Zero-Shot Text to Speech Synthesizers
Chengyi Wang
Sanyuan Chen
Yu-Huan Wu
Zi-Hua Zhang
Long Zhou
...
Huaming Wang
Jinyu Li
Lei He
Sheng Zhao
Furu Wei
48
644
0
05 Jan 2023
Previous
123456789
Next