AudioLM: a Language Modeling Approach to Audio Generation

7 September 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
Matthew Sharifi
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
    AuLLM

Papers citing "AudioLM: a Language Modeling Approach to Audio Generation"

50 / 452 papers shown
JAMMIN-GPT: Text-based Improvisation using LLMs in Ableton Live
Sven Hollowell
Tashi Namgyal
Paul Marshall
71
0
0
06 Dec 2023
MoMask: Generative Masked Modeling of 3D Human Motions
Chuan Guo
Yuxuan Mu
Muhammad Gohar Javed
Sen Wang
Li Cheng
VGen
105
145
0
29 Nov 2023
Visual cognition in multimodal large language models
Luca M. Schulze Buschoff
Elif Akata
Matthias Bethge
Eric Schulz
LRM
127
20
0
27 Nov 2023
Spoken Word2Vec: Learning Skipgram Embeddings from Speech
Mohammad Amaan Sayeed
Hanan Aldarmaki
53
0
0
15 Nov 2023
Generative De-Quantization for Neural Speech Codec via Latent Diffusion
Haici Yang
Inseon Jang
Minje Kim
DiffM
120
7
0
14 Nov 2023
Music ControlNet: Multiple Time-varying Controls for Music Generation
Shih-Lun Wu
Chris Donahue
Shinji Watanabe
Nicholas J. Bryan
DiffM MGen
111
61
0
13 Nov 2023
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Zhen Yang
Yingxue Zhang
Fandong Meng
Jie Zhou
VLM MLLM
83
3
0
08 Nov 2023
InstrumentGen: Generating Sample-Based Musical Instruments From Text
S. Nercessian
Johannes Imort
68
2
0
07 Nov 2023
Yet Another Generative Model For Room Impulse Response Estimation
Sungho Lee
Hyeong-Seok Choi
Kyogu Lee
72
10
0
05 Nov 2023
Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review
Mingze Yuan
Peng Bao
Jiajia Yuan
Yunhao Shen
Zi Chen
...
Jie Zhao
Yang Chen
Li Zhang
Lin Shen
Bin Dong
ELM LM&MA
106
16
0
03 Nov 2023
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Yuan Gao
Nobuyuki Morioka
Yu Zhang
Nanxin Chen
DiffM
90
33
0
02 Nov 2023
Sound of Story: Multi-modal Storytelling with Audio
Jaeyeon Bae
Seokhoon Jeong
Seokun Kang
Namgi Han
Jae-Yon Lee
Hyounghun Kim
Taehwan Kim
59
4
0
30 Oct 2023
JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation
Yao Yao
Peike Li
Boyu Chen
Alex Wang
DiffM
84
11
0
29 Oct 2023
Generative Pre-training for Speech with Flow Matching
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
106
36
0
25 Oct 2023
Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks
Ming-Hao Hsu
Kai-Wei Chang
Shang-Wen Li
Hung-yi Lee
96
7
0
19 Oct 2023
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
Cheol Jun Cho
Abdelrahman Mohamed
Shang-Wen Li
Alan W. Black
Gopala K. Anumanchipalli
112
9
0
16 Oct 2023
Low-latency Speech Enhancement via Speech Token Generation
Huaying Xue
Xiulian Peng
Yan Lu
66
3
0
13 Oct 2023
Vec-Tok Speech: speech vectorization and tokenization for neural speech generation
Xinfa Zhu
Yuanjun Lv
Yinjiao Lei
Tao Li
Wendi He
Hongbin Zhou
Heng Lu
Lei Xie
145
17
0
11 Oct 2023
Learning Interactive Real-World Simulators
Mengjiao Yang
Yilun Du
Kamyar Ghasemipour
Jonathan Tompson
Leslie Kaelbling
Dale Schuurmans
Pieter Abbeel
LM&Ro PINN
90
215
0
09 Oct 2023
Generative Spoken Language Model based on continuous word-sized audio tokens
Robin Algayres
Yossi Adi
Tu Nguyen
Jade Copet
Gabriel Synnaeve
Benoît Sagot
Emmanuel Dupoux
AuLLM
119
16
0
08 Oct 2023
Prompt-to-OS (P2OS): Revolutionizing Operating Systems and Human-Computer Interaction with Integrated AI Generative Models
Gabriele Tolomei
Cesare Campagnano
Fabrizio Silvestri
Giovanni Trappolini
81
4
0
07 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG VLM AuLLM LM&MA
131
87
0
07 Oct 2023
Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model
Kai-Wei Chang
Ming-Hsin Chen
Yun-Ping Lin
Jing Neng Hsu
Paul Kuo-Ming Huang
Chien-yu Huang
Shang-Wen Li
Hung-yi Lee
100
6
0
04 Oct 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Dongchao Yang
Jinchuan Tian
Xuejiao Tan
Rongjie Huang
Songxiang Liu
...
Jiang Bian
Xixin Wu
Zhou Zhao
Shinji Watanabe
Helen M. Meng
CVBM AuLLM
153
128
0
01 Oct 2023
SLM: Bridge the thin gap between speech and text foundation models
Mingqiu Wang
Wei Han
Izhak Shafran
Zelin Wu
Chung-Cheng Chiu
...
Zhong Meng
Golan Pundak
Nikhil Siddhartha
J. Schalkwyk
Yonghui Wu
AuLLM
119
58
0
30 Sep 2023
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Ari Seff
Brian Cera
Dian Chen
Mason Ng
Aurick Zhou
Nigamaa Nayakanti
Khaled S. Refaat
Rami Al-Rfou
Benjamin Sapp
78
103
0
28 Sep 2023
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Xuankai Chang
Brian Yan
Kwanghee Choi
Jee-weon Jung
Yichen Lu
...
Pengcheng Guo
Yao-Fei Cheng
Pavel Denisov
Kohei Saijo
Hsiu-Hsuan Wang
127
42
0
27 Sep 2023
High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models
Chunyu Qiang
Hao Li
Yixin Tian
Yi Zhao
Ying Zhang
Longbiao Wang
Jianwu Dang
DiffM
107
2
0
27 Sep 2023
User Experience Design Professionals' Perceptions of Generative Artificial Intelligence
Jie Li
Hancheng Cao
Laura Lin
Youyang Hou
Ruihao Zhu
Abdallah El Ali
83
62
0
26 Sep 2023
Towards General-Purpose Text-Instruction-Guided Voice Conversion
Chun-Yi Kuan
Chen-An Li
Tsung-Yuan Hsu
Tzu-Quan Lin
Ho-Lam Chung
Kai-Wei Chang
Shuo-yiin Chang
Hung-yi Lee
80
6
0
25 Sep 2023
Speaker anonymization using neural audio codec language models
Michele Panariello
Francesco Nespoli
Massimiliano Todisco
Nicholas W. D. Evans
62
20
0
25 Sep 2023
AutoPrep: An Automatic Preprocessing Framework for In-the-Wild Speech Data
Jianwei Yu
Hangting Chen
Yanyao Bian
Xiang Li
Yimin Luo
Jinchuan Tian
Mengyang Liu
Jiayi Jiang
Shuai Wang
VLM
72
16
0
25 Sep 2023
VoiceLDM: Text-to-Speech with Environmental Context
Yeong-Won Lee
In-won Yeon
Juhan Nam
Joon Son Chung
VLM DiffM
75
15
0
24 Sep 2023
Improving Language Model-Based Zero-Shot Text-to-Speech Synthesis with Multi-Scale Acoustic Prompts
Shunwei Lei
Yixuan Zhou
Liyang Chen
Dan Luo
Zhiyong Wu
...
Shiyin Kang
Tao Jiang
Yahui Zhou
Yuxing Han
Helen M. Meng
VLM
92
2
0
21 Sep 2023
Speak While You Think: Streaming Speech Synthesis During Text Generation
Avihu Dekel
Slava Shechtman
Raul Fernandez
David Haws
Zvi Kons
R. Hoory
69
9
0
20 Sep 2023
Towards Joint Modeling of Dialogue Response and Speech Synthesis based on Large Language Model
Xinyu Zhou
Delong Chen
Yudong Chen
AuLLM
56
0
0
20 Sep 2023
Discrete Audio Representation as an Alternative to Mel-Spectrograms for Speaker and Speech Recognition
Krishna C. Puvvada
Nithin Rao Koluguri
Kunal Dhawan
Jagadeesh Balam
Boris Ginsburg
78
17
0
19 Sep 2023
MelodyGLM: Multi-task Pre-training for Symbolic Melody Generation
Xinda Wu
Zhijie Huang
Kejun Zhang
Jiaxing Yu
Xu Tan
Tieyao Zhang
Zihao Wang
Lingyun Sun
82
5
0
19 Sep 2023
FoleyGen: Visually-Guided Audio Generation
Xinhao Mei
Varun K. Nagaraja
Gaël Le Lan
Zhaoheng Ni
Ernie Chang
Yangyang Shi
Vikas Chandra
VGen
88
24
0
19 Sep 2023
Do learned speech symbols follow Zipf's law?
Shinnosuke Takamichi
Hiroki Maeda
Joonyong Park
Daisuke Saito
Hiroshi Saruwatari
59
1
0
18 Sep 2023
Unifying Robustness and Fidelity: A Comprehensive Study of Pretrained Generative Methods for Speech Enhancement in Adverse Conditions
Heming Wang
Meng Yu
Huatian Zhang
Chunlei Zhang
Zhongweiyang Xu
Muqiao Yang
Yixuan Zhang
Dong Yu
90
3
0
16 Sep 2023
Enhance audio generation controllability through representation similarity regularization
Yangyang Shi
Gaël Le Lan
Varun K. Nagaraja
Zhaoheng Ni
Xinhao Mei
Ernie Chang
Forrest N. Iandola
Yang Liu
Vikas Chandra
68
1
0
15 Sep 2023
Speech-to-Speech Translation with Discrete-Unit-Based Style Transfer
Yongqiang Wang
Jionghao Bai
Rongjie Huang
Ruiqi Li
Zhiqing Hong
Zhou Zhao
49
3
0
14 Sep 2023
Masked Generative Modeling with Enhanced Sampling Scheme
Daesoo Lee
Erlend Aune
Sara Malacarne
DiffM
54
3
0
14 Sep 2023
Voxtlm: unified decoder-only models for consolidating speech recognition/synthesis and speech/text continuation tasks
Soumi Maiti
Yifan Peng
Shukjae Choi
Jee-weon Jung
Xuankai Chang
Shinji Watanabe
VLM AuLLM
127
69
0
14 Sep 2023
Towards Universal Speech Discrete Tokens: A Case Study for ASR and TTS
Yifan Yang
Feiyu Shen
Chenpeng Du
Ziyang Ma
K. Yu
Daniel Povey
Xie Chen
93
27
0
14 Sep 2023
UnifiedGesture: A Unified Gesture Synthesis Model for Multiple Skeletons
Sicheng Yang
Zehao Wang
Zhiyong Wu
Minglei Li
Zhensong Zhang
...
Lei Hao
Songcen Xu
Xiaofei Wu
Changpeng Yang
Zonghong Dai
DiffM
108
14
0
13 Sep 2023
MAGMA: Music Aligned Generative Motion Autodecoder
Sohan Anisetty
Amit Raj
James Hays
61
0
0
03 Sep 2023
RepCodec: A Speech Representation Codec for Speech Tokenization
Zhichao Huang
Chutong Meng
Tom Ko
92
28
0
31 Aug 2023
SpeechTokenizer: Unified Speech Tokenizer for Speech Large Language Models
Xin Zhang
Dong Zhang
Shimin Li
Yaqian Zhou
Xipeng Qiu
119
66
0
31 Aug 2023