AudioLM: a Language Modeling Approach to Audio Generation

7 September 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
Matthew Sharifi
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
    AuLLM

Papers citing "AudioLM: a Language Modeling Approach to Audio Generation"

50 / 428 papers shown
Title
Exploring Musical Roots: Applying Audio Embeddings to Empower Influence Attribution for a Generative Music Model
Julia Barnett
Hugo Flores Garcia
Bryan Pardo
40
7
0
25 Jan 2024
Intelli-Z: Toward Intelligible Zero-Shot TTS
Sunghee Jung
Won Jang
Jaesam Yoon
Bongwan Kim
38
0
0
25 Jan 2024
Contractive Diffusion Probabilistic Models
Wenpin Tang
Hanyang Zhao
DiffM
49
12
0
23 Jan 2024
DITTO: Diffusion Inference-Time T-Optimization for Music Generation
Zachary Novack
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
DiffM
34
33
0
22 Jan 2024
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
Zhichao Wang
Yuan-Jui Chen
Xinsheng Wang
Lei Xie
Yuping Wang
29
6
0
19 Jan 2024
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering
Ya-Zhen Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Xie Chen
AuLLM
21
36
0
14 Jan 2024
Generative Ghosts: Anticipating Benefits and Risks of AI Afterlives
Meredith Ringel Morris
Jed R. Brubaker
42
10
0
14 Jan 2024
Masked Audio Generation using a Single Non-Autoregressive Transformer
Alon Ziv
Itai Gat
Gaël Le Lan
Tal Remez
Felix Kreuk
Alexandre Défossez
Jade Copet
Gabriel Synnaeve
Yossi Adi
54
36
0
09 Jan 2024
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
Soumya Dutta
Sriram Ganapathy
26
1
0
09 Jan 2024
Pheme: Efficient and Conversational Speech Generation
Paweł Budzianowski
Taras Sereda
Tomasz Cichy
Ivan Vulić
32
7
0
05 Jan 2024
Efficient Parallel Audio Generation using Group Masked Language Modeling
Myeonghun Jeong
Minchan Kim
Joun Yeop Lee
Nam Soo Kim
30
5
0
02 Jan 2024
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Hong-ping Hao
Long Zhou
Shujie Liu
Jinyu Li
Shujie Hu
Rui Wang
Furu Wei
34
18
0
30 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM
MLLM
40
144
0
28 Dec 2023
Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue
Guan-Ting Lin
Prashanth Gurunath Shivakumar
Ankur Gandhe
Chao-Han Huck Yang
Yile Gu
Shalini Ghosh
A. Stolcke
Hung-yi Lee
I. Bulyko
27
12
0
23 Dec 2023
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
31
21
0
22 Dec 2023
EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models
Maureen de Seyssel
Antony D'Avirro
Adina Williams
Emmanuel Dupoux
32
3
0
21 Dec 2023
T2M-HiFiGPT: Generating High Quality Human Motion from Textual Descriptions with Residual Discrete Representations
Congyi Wang
22
4
0
17 Dec 2023
Efficient and Scalable Graph Generation through Iterative Local Expansion
Andreas Bergmeister
Karolis Martinkus
Nathanael Perraudin
Roger Wattenhofer
25
12
0
14 Dec 2023
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
Junjie Li
Yiwei Guo
Xie Chen
Kai Yu
40
13
0
14 Dec 2023
CAD: Photorealistic 3D Generation via Adversarial Distillation
Bo Liu
Despoina Paschalidou
Ian Huang
Hongyu Liu
Bokui Shen
Xiaoyu Xiang
Jing Liao
Leonidas J. Guibas
DiffM
78
11
0
11 Dec 2023
Transformer-Powered Surrogates Close the ICF Simulation-Experiment Gap with Extremely Limited Data
M. Olson
Shusen Liu
Jayaraman J. Thiagarajan
B. Kustowski
Weng-Keen Wong
Rushil Anirudh
AI4CE
36
1
0
06 Dec 2023
JAMMIN-GPT: Text-based Improvisation using LLMs in Ableton Live
Sven Hollowell
Tashi Namgyal
Paul Marshall
27
0
0
06 Dec 2023
MoMask: Generative Masked Modeling of 3D Human Motions
Chuan Guo
Yuxuan Mu
Muhammad Gohar Javed
Sen Wang
Li Cheng
VGen
37
121
0
29 Nov 2023
Visual cognition in multimodal large language models
Luca M. Schulze Buschoff
Elif Akata
Matthias Bethge
Eric Schulz
LRM
51
14
0
27 Nov 2023
Spoken Word2Vec: Learning Skipgram Embeddings from Speech
Mohammad Amaan Sayeed
Hanan Aldarmaki
22
0
0
15 Nov 2023
Generative De-Quantization for Neural Speech Codec via Latent Diffusion
Haici Yang
Inseon Jang
Minje Kim
DiffM
45
6
0
14 Nov 2023
Music ControlNet: Multiple Time-varying Controls for Music Generation
Shih-Lun Wu
Chris Donahue
Shinji Watanabe
Nicholas J. Bryan
DiffM
MGen
34
50
0
13 Nov 2023
TEAL: Tokenize and Embed ALL for Multi-modal Large Language Models
Zhen Yang
Yingxue Zhang
Fandong Meng
Jie Zhou
VLM
MLLM
47
3
0
08 Nov 2023
InstrumentGen: Generating Sample-Based Musical Instruments From Text
S. Nercessian
Johannes Imort
29
2
0
07 Nov 2023
Yet Another Generative Model For Room Impulse Response Estimation
Sungho Lee
Hyeong-Seok Choi
Kyogu Lee
31
10
0
05 Nov 2023
Large Language Models Illuminate a Progressive Pathway to Artificial Healthcare Assistant: A Review
Mingze Yuan
Peng Bao
Jiajia Yuan
Yunhao Shen
Zi Chen
...
Jie Zhao
Yang Chen
Li Zhang
Lin Shen
Bin Dong
ELM
LM&MA
49
13
0
03 Nov 2023
E3 TTS: Easy End-to-End Diffusion-based Text to Speech
Yuan Gao
Nobuyuki Morioka
Yu Zhang
Nanxin Chen
DiffM
31
27
0
02 Nov 2023
Sound of Story: Multi-modal Storytelling with Audio
Jaeyeon Bae
Seokhoon Jeong
Seokun Kang
Namgi Han
Jae-Yon Lee
Hyounghun Kim
Taehwan Kim
26
2
0
30 Oct 2023
JEN-1 Composer: A Unified Framework for High-Fidelity Multi-Track Music Generation
Yao Yao
Peike Li
Boyu Chen
Alex Wang
DiffM
32
9
0
29 Oct 2023
Generative Pre-training for Speech with Flow Matching
Alexander H. Liu
Matt Le
Apoorv Vyas
Bowen Shi
Andros Tjandra
Wei-Ning Hsu
27
31
0
25 Oct 2023
Exploring In-Context Learning of Textless Speech Language Model for Speech Classification Tasks
Ming-Hao Hsu
Kai-Wei Chang
Shang-Wen Li
Hung-yi Lee
34
8
0
19 Oct 2023
SD-HuBERT: Sentence-Level Self-Distillation Induces Syllabic Organization in HuBERT
Cheol Jun Cho
Abdelrahman Mohamed
Shang-Wen Li
Alan W. Black
Gopala K. Anumanchipalli
39
8
0
16 Oct 2023
Low-latency Speech Enhancement via Speech Token Generation
Huaying Xue
Xiulian Peng
Yan Lu
24
0
0
13 Oct 2023
Vec-Tok Speech: speech vectorization and tokenization for neural speech generation
Xinfa Zhu
Yuanjun Lv
Yinjiao Lei
Tao Li
Wendi He
Hongbin Zhou
Heng Lu
Lei Xie
40
16
0
11 Oct 2023
Learning Interactive Real-World Simulators
Mengjiao Yang
Yilun Du
Kamyar Ghasemipour
Jonathan Tompson
Leslie Kaelbling
Dale Schuurmans
Pieter Abbeel
LM&Ro
PINN
30
180
0
09 Oct 2023
Generative Spoken Language Model based on continuous word-sized audio tokens
Robin Algayres
Yossi Adi
Tu Nguyen
Jade Copet
Gabriel Synnaeve
Benoît Sagot
Emmanuel Dupoux
AuLLM
43
12
0
08 Oct 2023
Prompt-to-OS (P2OS): Revolutionizing Operating Systems and Human-Computer Interaction with Integrated AI Generative Models
Gabriele Tolomei
Cesare Campagnano
Fabrizio Silvestri
Giovanni Trappolini
24
4
0
07 Oct 2023
LauraGPT: Listen, Attend, Understand, and Regenerate Audio with GPT
Zhihao Du
Jiaming Wang
Qian Chen
Yunfei Chu
Zhifu Gao
...
Wen Wang
Siqi Zheng
Chang Zhou
Zhijie Yan
Shiliang Zhang
LLMAG
VLM
AuLLM
LM&MA
42
80
0
07 Oct 2023
Prompting and Adapter Tuning for Self-supervised Encoder-Decoder Speech Model
Kai-Wei Chang
Ming-Hsin Chen
Yun-Ping Lin
Jing Neng Hsu
Paul Kuo-Ming Huang
Chien-yu Huang
Shang-Wen Li
Hung-yi Lee
23
6
0
04 Oct 2023
UniAudio: An Audio Foundation Model Toward Universal Audio Generation
Dongchao Yang
Jinchuan Tian
Xuejiao Tan
Rongjie Huang
Songxiang Liu
...
Jiang Bian
Xixin Wu
Zhou Zhao
Shinji Watanabe
Helen M. Meng
CVBM
AuLLM
28
115
0
01 Oct 2023
SLM: Bridge the thin gap between speech and text foundation models
Mingqiu Wang
Wei Han
Izhak Shafran
Zelin Wu
Chung-Cheng Chiu
...
Zhong Meng
Golan Pundak
Nikhil Siddhartha
J. Schalkwyk
Yonghui Wu
AuLLM
39
56
0
30 Sep 2023
MotionLM: Multi-Agent Motion Forecasting as Language Modeling
Ari Seff
Brian Cera
Dian Chen
Mason Ng
Aurick Zhou
Nigamaa Nayakanti
Khaled S. Refaat
Rami Al-Rfou
Benjamin Sapp
35
92
0
28 Sep 2023
Exploring Speech Recognition, Translation, and Understanding with Discrete Speech Units: A Comparative Study
Xuankai Chang
Brian Yan
Kwanghee Choi
Jee-weon Jung
Yichen Lu
...
Pengcheng Guo
Yao-Fei Cheng
Pavel Denisov
Kohei Saijo
Hsiu-Hsuan Wang
31
36
0
27 Sep 2023
High-Fidelity Speech Synthesis with Minimal Supervision: All Using Diffusion Models
Chunyu Qiang
Hao Li
Yixin Tian
Yi Zhao
Ying Zhang
Longbiao Wang
Jianwu Dang
DiffM
41
2
0
27 Sep 2023
User Experience Design Professionals' Perceptions of Generative Artificial Intelligence
Jie Li
Hancheng Cao
Laura Lin
Youyang Hou
Ruihao Zhu
Abdallah El Ali
39
50
0
26 Sep 2023