ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2209.03143
  4. Cited By
AudioLM: a Language Modeling Approach to Audio Generation
v1v2 (latest)

AudioLM: a Language Modeling Approach to Audio Generation

7 September 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
Matthew Sharifi
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
    AuLLM
ArXiv (abs)PDFHTML

Papers citing "AudioLM: a Language Modeling Approach to Audio Generation"

50 / 452 papers shown
Title
Learning Fine-Grained Controllability on Speech Generation via Efficient
  Fine-Tuning
Learning Fine-Grained Controllability on Speech Generation via Efficient Fine-Tuning
Chung-Ming Chien
Andros Tjandra
Apoorv Vyas
Matt Le
Bowen Shi
Wei-Ning Hsu
79
0
0
10 Jun 2024
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody
  Modeling
Towards Expressive Zero-Shot Speech Synthesis with Hierarchical Prosody Modeling
Yuepeng Jiang
Tao Li
Fengyu Yang
Lei Xie
Meng Meng
Yujun Wang
75
2
0
09 Jun 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Zhijun Liu
Shuai Wang
Sho Inoue
Qibing Bai
Haizhou Li
DiffM
93
20
0
08 Jun 2024
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Exploring the Benefits of Tokenization of Discrete Acoustic Units
Avihu Dekel
Raul Fernandez
85
2
0
08 Jun 2024
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text
  to Speech Synthesizers
VALL-E 2: Neural Codec Language Models are Human Parity Zero-Shot Text to Speech Synthesizers
Sanyuan Chen
Shujie Liu
Long Zhou
Yanqing Liu
Xu Tan
Jinyu Li
Sheng Zhao
Yao Qian
Furu Wei
VLM
121
83
0
08 Jun 2024
Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis
  with Context-Aware Contrastive Language-Audio Pretraining
Retrieval Augmented Generation in Prompt-based Text-to-Speech Synthesis with Context-Aware Contrastive Language-Audio Pretraining
Jinlong Xue
Yayue Deng
Yingming Gao
Ya Li
RALMVLM
136
7
0
06 Jun 2024
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with
  Multi-Modal Context and Large Language Model
Improving Audio Codec-based Zero-Shot Text-to-Speech Synthesis with Multi-Modal Context and Large Language Model
Jinlong Xue
Yayue Deng
Yicheng Han
Yingming Gao
Ya Li
95
4
0
06 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
145
6
0
06 Jun 2024
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive
  Modeling of Audio Discrete Codes
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Trung D. Q. Dang
David Aponte
Dung Tran
K. Koishida
95
6
0
05 Jun 2024
Discrete Multimodal Transformers with a Pretrained Large Language Model
  for Mixed-Supervision Speech Processing
Discrete Multimodal Transformers with a Pretrained Large Language Model for Mixed-Supervision Speech Processing
V. Trinh
Rosy Southwell
Yiwen Guan
Xinlu He
Zhiyong Wang
Jacob Whitehill
OffRL
93
2
0
04 Jun 2024
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar
  Latent Transformer Diffusion Models
SimpleSpeech: Towards Simple and Efficient Text-to-Speech with Scalar Latent Transformer Diffusion Models
Dongchao Yang
Dingdong Wang
Haohan Guo
Xueyuan Chen
Xixin Wu
Helen M. Meng
163
29
0
04 Jun 2024
An Independence-promoting Loss for Music Generation with Language Models
An Independence-promoting Loss for Music Generation with Language Models
Jean-Marie Lemercier
Simon Rouard
Jade Copet
Yossi Adi
Alexandre Défossez
162
1
0
04 Jun 2024
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
Phonetic Enhanced Language Modeling for Text-to-Speech Synthesis
Kun Zhou
Shengkui Zhao
Yukun Ma
Chong Zhang
Hao Wang
Dianwen Ng
Chongjia Ni
Nguyen Trung Hieu
J. Yip
Bin Ma
74
5
0
04 Jun 2024
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
Enhancing Zero-shot Text-to-Speech Synthesis with Human Feedback
Chen Chen
Yuchen Hu
Wen Wu
Helin Wang
Chng Eng Siong
Chao Zhang
93
12
0
02 Jun 2024
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in
  Zero and Few-shot Learning
Wav2Prompt: End-to-End Speech Prompt Generation and Tuning For LLM in Zero and Few-shot Learning
Keqi Deng
Guangzhi Sun
Phil Woodland
VLM
72
4
0
01 Jun 2024
Creative Text-to-Audio Generation via Synthesizer Programming
Creative Text-to-Audio Generation via Synthesizer Programming
Manuel Cherep
Nikhil Singh
Jessica Shand
81
4
0
01 Jun 2024
A Survey of Deep Learning Audio Generation Methods
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLMMedIm
114
2
0
31 May 2024
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
Zipper: A Multi-Tower Decoder Architecture for Fusing Modalities
Vicky Zayats
Peter Chen
Melissa Ferrari
Dirk Padfield
AI4CE
79
1
0
29 May 2024
TransVIP: Speech to Speech Translation System with Voice and Isochrony
  Preservation
TransVIP: Speech to Speech Translation System with Voice and Isochrony Preservation
Chenyang Le
Yao Qian
Dongmei Wang
Long Zhou
Shujie Liu
...
Midia Yousefi
Yanmin Qian
Jinyu Li
Sheng Zhao
Michael Zeng
88
3
0
28 May 2024
"Pass the butter": A study on desktop-classic multitasking robotic arm
  based on advanced YOLOv7 and BERT
"Pass the butter": A study on desktop-classic multitasking robotic arm based on advanced YOLOv7 and BERT
Haohua Que
Wenbin Pan
Jie Xu
Hao Luo
Pei Wang
Li Zhang
72
1
0
27 May 2024
iVideoGPT: Interactive VideoGPTs are Scalable World Models
iVideoGPT: Interactive VideoGPTs are Scalable World Models
Jialong Wu
Shaofeng Yin
Ningya Feng
Xu He
Dong Li
Haifeng Zhang
Mingsheng Long
VGen
127
40
0
24 May 2024
FAdam: Adam is a natural gradient optimizer using diagonal empirical
  Fisher information
FAdam: Adam is a natural gradient optimizer using diagonal empirical Fisher information
Dongseong Hwang
ODL
97
9
0
21 May 2024
Multi-speaker Text-to-speech Training with Speaker Anonymized Data
Multi-speaker Text-to-speech Training with Speaker Anonymized Data
Wen-Chin Huang
Yi-Chiao Wu
Tomoki Toda
66
1
0
20 May 2024
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based
  Speech Language Model
Evaluating Text-to-Speech Synthesis from a Large Discrete Token-based Speech Language Model
Siyang Wang
Éva Székely
107
6
0
16 May 2024
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment
  Generation
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation
Jianyi Chen
Wei Xue
Xu Tan
Zhen Ye
Qi-fei Liu
Yi-Ting Guo
68
2
0
13 May 2024
The Codecfake Dataset and Countermeasures for the Universally Detection
  of Deepfake Audio
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Yuankun Xie
Yi Lu
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
...
Xiaopeng Wang
Yukun Liu
Haonan Cheng
Long Ye
Yi Sun
100
21
0
08 May 2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General
  Sound
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Haohe Liu
Xuenan Xu
Yiitan Yuan
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
101
29
0
30 Apr 2024
Fake it to make it: Using synthetic data to remedy the data shortage in
  joint multimodal speech-and-gesture synthesis
Fake it to make it: Using synthetic data to remedy the data shortage in joint multimodal speech-and-gesture synthesis
Shivam Mehta
Anna Deichler
Jim O'Regan
Birger Moëll
Jonas Beskow
G. Henter
Simon Alexanderson
96
4
0
30 Apr 2024
ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized
  Transformers
ESC: Efficient Speech Coding with Cross-Scale Residual Vector Quantized Transformers
Yuzhe Gu
Enmao Diao
106
4
0
30 Apr 2024
Bayesian Example Selection Improves In-Context Learning for Speech,
  Text, and Visual Modalities
Bayesian Example Selection Improves In-Context Learning for Speech, Text, and Visual Modalities
Siyin Wang
Chao-Han Huck Yang
Ji Wu
Chao Zhang
BDL
106
5
0
23 Apr 2024
DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection
DeepFake-O-Meter v2.0: An Open Platform for DeepFake Detection
Yan Ju
Chengzhe Sun
Shan Jia
Shuwei Hou
Zhaofeng Si
Soumyya Kanti Datta
Lipeng Ke
Riky Zhou
Anita Nikolich
Siwei Lyu
120
3
0
19 Apr 2024
Long-form music generation with latent diffusion
Long-form music generation with latent diffusion
Zach Evans
Julian Parker
CJ Carr
Zack Zukowski
Josiah Taylor
Jordi Pons
MGenDiffM
124
45
0
16 Apr 2024
MAD Speech: Measures of Acoustic Diversity of Speech
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
145
1
0
16 Apr 2024
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through
  Direct Preference Optimization
Tango 2: Aligning Diffusion-based Text-to-Audio Generations through Direct Preference Optimization
Navonil Majumder
Chia-Yu Hung
Deepanway Ghosal
Wei-Ning Hsu
Rada Mihalcea
Soujanya Poria
148
61
0
15 Apr 2024
TransformerFAM: Feedback attention is working memory
TransformerFAM: Feedback attention is working memory
Dongseong Hwang
Weiran Wang
Zhuoyuan Huo
K. Sim
P. M. Mengibar
121
12
0
14 Apr 2024
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
Llama-VITS: Enhancing TTS Synthesis with Semantic Awareness
Xincan Feng
A. Yoshimoto
98
3
0
10 Apr 2024
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like
  Multi-talker Conversations
CoVoMix: Advancing Zero-Shot Speech Generation for Human-like Multi-talker Conversations
Leying Zhang
Yao Qian
Long Zhou
Shujie Liu
Dongmei Wang
...
Yanmin Qian
Jinyu Li
Lei He
Sheng Zhao
Michael Zeng
77
2
0
10 Apr 2024
The X-LANCE Technical Report for Interspeech 2024 Speech Processing
  Using Discrete Speech Unit Challenge
The X-LANCE Technical Report for Interspeech 2024 Speech Processing Using Discrete Speech Unit Challenge
Yiwei Guo
Chenrun Wang
Yifan Yang
Hankun Wang
Ziyang Ma
...
Hanzheng Li
Shuai Fan
Hui Zhang
Xie Chen
Kai Yu
88
1
0
09 Apr 2024
Gull: A Generative Multifunctional Audio Codec
Gull: A Generative Multifunctional Audio Codec
Yi Luo
Jianwei Yu
Hangting Chen
Rongzhi Gu
Chao Weng
AuLLM
89
3
0
07 Apr 2024
Training LLMs over Neurally Compressed Text
Training LLMs over Neurally Compressed Text
Brian Lester
Jaehoon Lee
A. Alemi
Jeffrey Pennington
Adam Roberts
Jascha Narain Sohl-Dickstein
Noah Constant
92
7
0
04 Apr 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot
  Text-to-Speech
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
125
44
0
03 Apr 2024
PromptCodec: High-Fidelity Neural Speech Codec using Disentangled
  Representation Learning based Adaptive Feature-aware Prompt Encoders
PromptCodec: High-Fidelity Neural Speech Codec using Disentangled Representation Learning based Adaptive Feature-aware Prompt Encoders
Yu Pan
Lei Ma
Jianjun Zhao
101
6
0
03 Apr 2024
The VoicePrivacy 2024 Challenge Evaluation Plan
The VoicePrivacy 2024 Challenge Evaluation Plan
N. Tomashenko
Xiaoxiao Miao
Pierre Champion
Sarina Meyer
Xin Wang
Emmanuel Vincent
Michele Panariello
Nicholas W. D. Evans
Junichi Yamagishi
Massimiliano Todisco
125
25
0
03 Apr 2024
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems
Transforming LLMs into Cross-modal and Cross-lingual Retrieval Systems
Frank Palma Gomez
Ramon Sanabria
Yun-hsuan Sung
Daniel Cer
Siddharth Dalmia
Gustavo Hernández Ábrego
VLM
108
4
0
02 Apr 2024
SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
Junghyun Koo
Gordon Wichern
François Germain
Sameer Khurana
Jonathan Le Roux
103
5
0
02 Apr 2024
UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing
  Using Discrete Speech Unit Challenge
UTDUSS: UTokyo-SaruLab System for Interspeech2024 Speech Processing Using Discrete Speech Unit Challenge
Wataru Nakata
Kazuki Yamauchi
Dong Yang
Hiroaki Hyodo
Yuki Saito
107
0
0
20 Mar 2024
An Empirical Study of Speech Language Models for Prompt-Conditioned
  Speech Synthesis
An Empirical Study of Speech Language Models for Prompt-Conditioned Speech Synthesis
Yifan Peng
Ilia Kulikov
Yilin Yang
Sravya Popuri
Hui Lu
Changhan Wang
Hongyu Gong
65
1
0
19 Mar 2024
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
Prompt-Singer: Controllable Singing-Voice-Synthesis with Natural Language Prompt
Yongqi Wang
Ruofan Hu
Rongjie Huang
Zhiqing Hong
Ruiqi Li
Wenrui Liu
Fuming You
Tao Jin
Zhou Zhao
116
13
0
18 Mar 2024
Knowledge Conflicts for LLMs: A Survey
Knowledge Conflicts for LLMs: A Survey
Rongwu Xu
Zehan Qi
Zhijiang Guo
Cunxiang Wang
Hongru Wang
Yue Zhang
Wei Xu
313
122
0
13 Mar 2024
Transparent AI Disclosure Obligations: Who, What, When, Where, Why, How
Transparent AI Disclosure Obligations: Who, What, When, Where, Why, How
Abdallah El Ali
Karthikeya Puttur Venkatraj
Sophie Morosoli
Laurens Naudts
Natali Helberger
Pablo César
96
20
0
11 Mar 2024
Previous
123456...8910
Next