ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

AudioLM: a Language Modeling Approach to Audio Generation

7 September 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
Matthew Sharifi
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
arXiv: 2209.03143 (v2, latest)

Papers citing "AudioLM: a Language Modeling Approach to Audio Generation"

50 / 452 papers shown
Title
NaturalSpeech 3: Zero-Shot Speech Synthesis with Factorized Codec and Diffusion Models
Zeqian Ju
Yuancheng Wang
Kai Shen
Xu Tan
Detai Xin
...
Shikun Zhang
Jiang Bian
Lei He
Jinyu Li
Sheng Zhao
DiffM
168
180
0
05 Mar 2024
Beyond Language Models: Byte Models are Digital World Simulators
Shangda Wu
Xu Tan
Zili Wang
Rui Wang
Xiaobing Li
Maosong Sun
67
13
0
29 Feb 2024
Exploring Advanced Methodologies in Security Evaluation for LLMs
Junming Huang
Jiawei Zhang
Qi Wang
Weihong Han
Yanchun Zhang
104
0
0
28 Feb 2024
PITCH: AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
Govind Mittal
Arthur Jakobsson
Kelly O. Marshall
Chinmay Hegde
Nasir Memon
154
0
0
28 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
301
22
0
28 Feb 2024
Music Style Transfer with Time-Varying Inversion of Diffusion Models
Sifei Li
Yuxin Zhang
Fan Tang
Chongyang Ma
Weiming Dong
Changsheng Xu
DiffM
72
11
0
21 Feb 2024
PQA: Zero-shot Protein Question Answering for Free-form Scientific Enquiry with Large Language Models
Eli M. Carrami
Sahand Sharifzadeh
63
2
0
21 Feb 2024
Towards audio language modeling -- an overview
Haibin Wu
Xuanjun Chen
Yi-Cheng Lin
Kai-Wei Chang
Ho-Lam Chung
Alexander H. Liu
Hung-yi Lee
AuLLM
112
35
0
20 Feb 2024
Codec-SUPERB: An In-Depth Analysis of Sound Codec Models
Haibin Wu
Ho-Lam Chung
Yi-Cheng Lin
Yuan-Kuei Wu
Xuanjun Chen
Yu-Chi Pai
Hsiu-Hsuan Wang
Kai-Wei Chang
Alexander H. Liu
Hung-yi Lee
123
29
0
20 Feb 2024
Advancing Large Language Models to Capture Varied Speaking Styles and Respond Properly in Spoken Conversations
Guan-Ting Lin
Cheng-Han Chiang
Hung-yi Lee
113
29
0
20 Feb 2024
VideoPrism: A Foundational Visual Encoder for Video Understanding
Long Zhao
N. B. Gundavarapu
Liangzhe Yuan
Hao Zhou
Shen Yan
...
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Ting Liu
Boqing Gong
VGen
131
36
0
20 Feb 2024
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Jun Zhan
Junqi Dai
Jiasheng Ye
Yunhua Zhou
Dong Zhang
...
Jie Fu
Tao Gui
Tianxiang Sun
Yugang Jiang
Xipeng Qiu
MLLM
97
136
0
19 Feb 2024
LVCHAT: Facilitating Long Video Comprehension
Yu Wang
Zeyuan Zhang
Julian McAuley
Zexue He
VLM
90
4
0
19 Feb 2024
APCodec: A Neural Audio Codec with Parallel Amplitude and Phase Spectrum Encoding and Decoding
Yang Ai
Xiao-Hang Jiang
Ye-Xin Lu
Hui-Peng Du
Zhenhua Ling
73
25
0
16 Feb 2024
MobileSpeech: A Fast and High-Fidelity Framework for Mobile Zero-Shot Text-to-Speech
Shengpeng Ji
Ziyue Jiang
Hanting Wang
Jia-li Zuo
Zhou Zhao
85
16
0
14 Feb 2024
BASE TTS: Lessons from building a billion-parameter Text-to-Speech model on 100K hours of data
Mateusz Lajszczak
Guillermo Cámbara
Yang Li
Fatih Beyhan
Arent van Korlaar
...
Bartosz Putrycz
Soledad López Gambino
Kayeon Yoo
Elena Sokolova
Thomas Drugman
LM&MA
113
88
0
12 Feb 2024
Unsupervised Sign Language Translation and Generation
Zhengsheng Guo
Zhiwei He
Wenxiang Jiao
Xing Wang
Rui Wang
Kehai Chen
Zhaopeng Tu
Yong-mei Xu
Min Zhang
134
0
0
12 Feb 2024
GenTranslate: Large Language Models are Generative Multilingual Speech and Machine Translators
Yuchen Hu
Chen Chen
Chao-Han Huck Yang
Ruizhe Li
Dong Zhang
Zhehuai Chen
Eng Siong Chng
91
21
0
10 Feb 2024
SpiRit-LM: Interleaved Spoken and Written Language Model
Tu Nguyen
Benjamin Muller
Bokai Yu
Marta R. Costa-jussà
Maha Elbayad
...
Itai Gat
Gabriel Synnaeve
Juan Pino
Benoît Sagot
Emmanuel Dupoux
AuLLM, VLM
103
53
0
08 Feb 2024
Multi-Patch Prediction: Adapting LLMs for Time Series Representation Learning
Hao Wang
Xu Ju
Jiangtong Li
Zhijian Xu
Dawei Cheng
Qiang Xu
AI4TS, KELM
92
19
0
07 Feb 2024
Fast Timing-Conditioned Latent Audio Diffusion
Zach Evans
CJ Carr
Josiah Taylor
Scott H. Hawley
Jordi Pons
DiffM
144
117
0
07 Feb 2024
MusicRL: Aligning Music Generation to Human Preferences
Geoffrey Cideron
Sertan Girgin
Mauro Verzetti
Damien Vincent
Matej Kastelic
...
Olivier Pietquin
Matthieu Geist
Léonard Hussenot
Neil Zeghidour
A. Agostinelli
91
22
0
06 Feb 2024
Focal Modulation Networks for Interpretable Sound Classification
Luca Della Libera
Cem Subakan
Mirco Ravanelli
82
2
0
05 Feb 2024
Retrieval Augmented End-to-End Spoken Dialog Models
Mingqiu Wang
Izhak Shafran
H. Soltau
Wei Han
Yuan Cao
Dian Yu
Laurent El Shafey
RALM, AuLLM
83
12
0
02 Feb 2024
Can MLLMs Perform Text-to-Image In-Context Learning?
Yuchen Zeng
Wonjun Kang
Yicong Chen
Hyung Il Koo
Kangwook Lee
MLLM
98
11
0
02 Feb 2024
Large Language Models for Time Series: A Survey
Xiyuan Zhang
Ranak Roy Chowdhury
Rajesh K. Gupta
Jingbo Shang
AI4TS
156
67
0
02 Feb 2024
BAT: Learning to Reason about Spatial Sounds with Large Language Models
Zhisheng Zheng
Puyuan Peng
Ziyang Ma
Xie Chen
Eunsol Choi
David Harwath
LRM
127
19
0
02 Feb 2024
SpeechComposer: Unifying Multiple Speech Tasks with Prompt Composition
Yihan Wu
Soumi Maiti
Yifan Peng
Wangyou Zhang
Chenda Li
Yuyue Wang
Xihua Wang
Shinji Watanabe
Ruihua Song
80
4
0
31 Jan 2024
Proactive Detection of Voice Cloning with Localized Watermarking
Robin San Roman
Pierre Fernandez
Alexandre Défossez
Teddy Furon
Tuan Tran
Hady ElSahar
159
54
0
30 Jan 2024
Exploring Musical Roots: Applying Audio Embeddings to Empower Influence Attribution for a Generative Music Model
Julia Barnett
Hugo Flores Garcia
Bryan Pardo
94
7
0
25 Jan 2024
Intelli-Z: Toward Intelligible Zero-Shot TTS
Sunghee Jung
Won Jang
Jaesam Yoon
Bongwan Kim
85
1
0
25 Jan 2024
Contractive Diffusion Probabilistic Models
Wenpin Tang
Hanyang Zhao
DiffM
109
14
0
23 Jan 2024
DITTO: Diffusion Inference-Time T-Optimization for Music Generation
Cheng-i Wang
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
DiffM
122
41
0
22 Jan 2024
StreamVoice: Streamable Context-Aware Language Modeling for Real-time Zero-Shot Voice Conversion
Zhichao Wang
Yuan-Jui Chen
Xinsheng Wang
Lei Xie
Yuping Wang
139
7
0
19 Jan 2024
ELLA-V: Stable Neural Codec Language Modeling with Alignment-guided Sequence Reordering
Ya-Zhen Song
Zhuo Chen
Xiaofei Wang
Ziyang Ma
Xie Chen
AuLLM
116
42
0
14 Jan 2024
Generative Ghosts: Anticipating Benefits and Risks of AI Afterlives
Meredith Ringel Morris
Jed R. Brubaker
61
13
0
14 Jan 2024
Masked Audio Generation using a Single Non-Autoregressive Transformer
Alon Ziv
Itai Gat
Gaël Le Lan
Tal Remez
Felix Kreuk
Alexandre Défossez
Jade Copet
Gabriel Synnaeve
Yossi Adi
112
40
0
09 Jan 2024
Zero Shot Audio to Audio Emotion Transfer With Speaker Disentanglement
Soumya Dutta
Sriram Ganapathy
72
3
0
09 Jan 2024
Pheme: Efficient and Conversational Speech Generation
Paweł Budzianowski
Taras Sereda
Tomasz Cichy
Ivan Vulić
78
7
0
05 Jan 2024
Efficient Parallel Audio Generation using Group Masked Language Modeling
Myeonghun Jeong
Minchan Kim
Joun Yeop Lee
Nam Soo Kim
58
7
0
02 Jan 2024
Boosting Large Language Model for Speech Synthesis: An Empirical Study
Hong-ping Hao
Long Zhou
Shujie Liu
Jinyu Li
Shujie Hu
Rui Wang
Furu Wei
125
19
0
30 Dec 2023
Unified-IO 2: Scaling Autoregressive Multimodal Models with Vision, Language, Audio, and Action
Jiasen Lu
Christopher Clark
Sangho Lee
Zichen Zhang
Savya Khosla
Ryan Marten
Derek Hoiem
Aniruddha Kembhavi
VLM, MLLM
102
175
0
28 Dec 2023
Paralinguistics-Enhanced Large Language Modeling of Spoken Dialogue
Guan-Ting Lin
Prashanth Gurunath Shivakumar
Ankur Gandhe
Chao-Han Huck Yang
Yile Gu
Shalini Ghosh
A. Stolcke
Hung-yi Lee
I. Bulyko
104
14
0
23 Dec 2023
ZMM-TTS: Zero-shot Multilingual and Multispeaker Speech Synthesis Conditioned on Self-supervised Discrete Speech Representations
Cheng Gong
Xin Wang
Erica Cooper
Dan Wells
Longbiao Wang
Jianwu Dang
Korin Richmond
Junichi Yamagishi
116
25
0
22 Dec 2023
EmphAssess : a Prosodic Benchmark on Assessing Emphasis Transfer in Speech-to-Speech Models
Maureen de Seyssel
Antony D'Avirro
Adina Williams
Emmanuel Dupoux
76
7
0
21 Dec 2023
T2M-HiFiGPT: Generating High Quality Human Motion from Textual Descriptions with Residual Discrete Representations
Congyi Wang
77
5
0
17 Dec 2023
Efficient and Scalable Graph Generation through Iterative Local Expansion
Andreas Bergmeister
Karolis Martinkus
Nathanael Perraudin
Roger Wattenhofer
96
16
0
14 Dec 2023
SEF-VC: Speaker Embedding Free Zero-Shot Voice Conversion with Cross Attention
Junjie Li
Yiwei Guo
Xie Chen
Kai Yu
113
18
0
14 Dec 2023
CAD: Photorealistic 3D Generation via Adversarial Distillation
Bo Liu
Despoina Paschalidou
Ian Huang
Hongyu Liu
Bokui Shen
Xiaoyu Xiang
Jing Liao
Leonidas Guibas
DiffM
148
11
0
11 Dec 2023
Transformer-Powered Surrogates Close the ICF Simulation-Experiment Gap with Extremely Limited Data
M. Olson
Shusen Liu
Jayaraman J. Thiagarajan
B. Kustowski
Weng-Keen Wong
Rushil Anirudh
AI4CE
95
1
0
06 Dec 2023