ResearchTrend.AI
AudioLM: a Language Modeling Approach to Audio Generation


7 September 2022
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
Matthew Sharifi
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
    AuLLM

Papers citing "AudioLM: a Language Modeling Approach to Audio Generation"

50 / 428 papers shown
Title
Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis
Zhiqi Huang
Dan Luo
Jun Wang
Huan Liao
Zhiheng Li
Zhiyong Wu
VGen
53
4
0
13 Sep 2024
VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos
Yan-Bo Lin
Yu Tian
L. Yang
Gedas Bertasius
Heng Wang
VGen
34
7
0
11 Sep 2024
Salmon: A Suite for Acoustic Language Model Evaluation
Gallil Maimon
Amit Roth
Yossi Adi
ELM
AuLLM
54
5
0
11 Sep 2024
An End-to-End Approach for Chord-Conditioned Song Generation
Shuochen Gao
Shun Lei
Fan Zhuo
Hangyu Liu
Feng Liu
Boshi Tang
Qiaochu Huang
Shiyin Kang
Zhiyong Wu
28
2
0
10 Sep 2024
DENSE: Dynamic Embedding Causal Target Speech Extraction
Yiwen Wang
Zeyu Yuan
Xihong Wu
46
0
0
10 Sep 2024
VC-ENHANCE: Speech Restoration with Integrated Noise Suppression and Voice Conversion
Kyungguen Byun
Jason Filos
Erik Visser
Sunkuk Moon
36
0
0
10 Sep 2024
LLaMA-Omni: Seamless Speech Interaction with Large Language Models
Qingkai Fang
Shoutao Guo
Yan Zhou
Zhengrui Ma
Shaolei Zhang
Yang Feng
AuLLM
33
30
0
10 Sep 2024
Estimating the Completeness of Discrete Speech Units
Sung-Lin Yeh
Hao Tang
36
1
0
09 Sep 2024
SongCreator: Lyrics-based Universal Song Generation
Shun Lei
Yixuan Zhou
Boshi Tang
Max W. Y. Lam
Feng Liu
Hangyu Liu
Jingcheng Wu
Shiyin Kang
Zhiyong Wu
Helen Meng
52
5
0
09 Sep 2024
PAIGE: Examining Learning Outcomes and Experiences with Personalized AI-Generated Educational Podcasts
Tiffany D. Do
Usama Bin Shafqat
Elsie Ling
Nikhil Sarda
48
2
0
06 Sep 2024
Investigating Neural Audio Codecs for Speech Language Model-Based Speech Generation
Jiaqi Li
Dongmei Wang
Xiaofei Wang
Yao Qian
Long Zhou
...
Junkun Chen
Sheng Zhao
Jinyu Li
Zhizheng Wu
Michael Zeng
AuLLM
30
3
0
06 Sep 2024
LAST: Language Model Aware Speech Tokenization
A. Turetzky
Yossi Adi
37
2
0
05 Sep 2024
STAB: Speech Tokenizer Assessment Benchmark
Shikhar Vashishth
Harman Singh
Shikhar Bharadwaj
Sriram Ganapathy
Chulayuth Asawaroengchai
Kartik Audhkhasi
Andrew Rosenberg
Ankur Bapna
Bhuvana Ramabhadran
57
0
0
04 Sep 2024
Wavelet GPT: Wavelet Inspired Large Language Models
Prateek Verma
AI4TS
23
0
0
04 Sep 2024
Sample-Efficient Diffusion for Text-To-Speech Synthesis
Justin Lovelace
Soham Ray
Kwangyoun Kim
Kilian Q. Weinberger
Felix Wu
36
2
0
01 Sep 2024
MaskGCT: Zero-Shot Text-to-Speech with Masked Generative Codec Transformer
Yuancheng Wang
Haoyue Zhan
Liwei Liu
Ruihong Zeng
Haotian Guo
Jiachen Zheng
Qiang Zhang
Xueyao Zhang
Shunsi Zhang
Zhizheng Wu
36
42
0
01 Sep 2024
Codec Does Matter: Exploring the Semantic Shortcoming of Codec for Audio Language Model
Zhen Ye
Peiwen Sun
Jiahe Lei
Hongzhan Lin
Xu Tan
...
Jianyi Chen
Jiahao Pan
Qifeng Liu
Yike Guo
Wei Xue
AuLLM
34
13
0
30 Aug 2024
FissionVAE: Federated Non-IID Image Generation with Latent Space and Decoder Decomposition
Chen Hu
Hanchi Ren
Jingjing Deng
Xianghua Xie
Xiaoke Ma
FedML
71
0
0
30 Aug 2024
Blending Low and High-Level Semantics of Time Series for Better Masked Time Series Generation
Johan Vik Mathisen
Erlend Lokna
Daesoo Lee
Erlend Aune
BDL
AI4TS
29
0
0
29 Aug 2024
Enabling Beam Search for Language Model-Based Text-to-Speech Synthesis
Zehai Tu
Guangyan Zhang
Yiting Lu
Adaeze Adigwe
Simon King
Yiwen Guo
43
0
0
29 Aug 2024
SSDM: Scalable Speech Dysfluency Modeling
Jiachen Lian
Xuanru Zhou
Z. Ezzes
Jet M J Vonk
Brittany Morin
D. Baquirin
Zachary Miller
M. G. Tempini
Gopala Anumanchipalli
AuLLM
32
1
0
29 Aug 2024
SpeechPrompt: Prompting Speech Language Models for Speech Processing Tasks
Kai-Wei Chang
Haibin Wu
Yu-Kai Wang
Yuan-Kuei Wu
Hua Shen
Wei-Cheng Tseng
Iu-thing Kang
Shang-Wen Li
Hung-yi Lee
53
3
0
23 Aug 2024
Does Current Deepfake Audio Detection Model Effectively Detect ALM-based Deepfake Audio?
Yuankun Xie
Chenxu Xiong
Xiaopeng Wang
Zhiyong Wang
Yi Lu
...
Yukun Liu
Zhengqi Wen
Jianhua Tao
Guanjun Li
Long Ye
AuLLM
34
1
0
20 Aug 2024
Adversarial training of Keyword Spotting to Minimize TTS Data Overfitting
Hyun Jin Park
Dhruuv Agarwal
Neng Chen
Rentao Sun
Kurt Partridge
...
Jacob Bartel
Kyle Kastner
Gary Wang
Andrew Rosenberg
Quan Wang
37
0
0
20 Aug 2024
Convert and Speak: Zero-shot Accent Conversion with Minimum Supervision
Zhijun Jia
Huaying Xue
Xiulian Peng
Yan Lu
18
2
0
19 Aug 2024
PRESENT: Zero-Shot Text-to-Prosody Control
Perry Lam
Huayun Zhang
Nancy F. Chen
Berrak Sisman
Dorien Herremans
43
0
0
13 Aug 2024
FLEURS-R: A Restored Multilingual Speech Corpus for Generation Tasks
Min Ma
Yuma Koizumi
Shigeki Karita
Heiga Zen
Jason Riesa
Haruko Ishikawa
M. Bacchiani
VLM
35
4
0
12 Aug 2024
VQ-CTAP: Cross-Modal Fine-Grained Sequence Representation Learning for Speech Processing
Chunyu Qiang
Wang Geng
Yi Zhao
Ruibo Fu
Tao Wang
...
Chen Zhang
Hao Che
Longbiao Wang
Jianwu Dang
Jianhua Tao
AI4TS
38
0
0
11 Aug 2024
Simulating Articulatory Trajectories with Phonological Feature Interpolation
Angelo Ortiz Tandazo
Thomas Schatz
Thomas Hueber
Emmanuel Dupoux
35
0
0
08 Aug 2024
Language Model Can Listen While Speaking
Ziyang Ma
Yakun Song
Chenpeng Du
Jian Cong
Zhuo Chen
Yuping Wang
Yansen Wang
Xie Chen
AuLLM
37
23
0
05 Aug 2024
SynesLM: A Unified Approach for Audio-visual Speech Recognition and Translation via Language Model and Synthetic Data
Yichen Lu
Álvaro Huertas-García
Xuankai Chang
Hengwei Bian
Soumi Maiti
Shinji Watanabe
46
2
0
01 Aug 2024
Bailing-TTS: Chinese Dialectal Speech Synthesis Towards Human-like Spontaneous Representation
Xinhan Di
Jiahao Lu
Yunming Liang
Junjie Zheng
Yihua Wang
Chaofan Ding
ALM
35
1
0
01 Aug 2024
Enhancing Anti-spoofing Countermeasures Robustness through Joint Optimization and Transfer Learning
Yikang Wang
Xingming Wang
Hiromitsu Nishizaki
Ming Li
AAML
37
0
0
29 Jul 2024
Utilizing TTS Synthesized Data for Efficient Development of Keyword Spotting Model
H. Park
Dhruuv Agarwal
Neng Chen
Rentao Sun
Kurt Partridge
...
Jacob Bartel
Kyle Kastner
Gary Wang
Andrew Rosenberg
Quan Wang
21
2
0
26 Jul 2024
Generating Sample-Based Musical Instruments Using Neural Audio Codec Language Models
S. Nercessian
Johannes Imort
Ninon Devis
Frederik Blang
38
1
0
22 Jul 2024
Chronologically Accurate Retrieval for Temporal Grounding of Motion-Language Models
Kent Fujiwara
Mikihiro Tanaka
Qing Yu
54
2
0
22 Jul 2024
Exploring the Effectiveness of Object-Centric Representations in Visual Question Answering: Comparative Insights with Foundation Models
Amir Mohammad Karimi Mamaghan
Samuele Papa
Karl Henrik Johansson
Stefan Bauer
Andrea Dittadi
OCL
48
5
0
22 Jul 2024
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
43
4
0
22 Jul 2024
On Pre-training of Multimodal Language Models Customized for Chart Understanding
Wan-Cyuan Fan
Yen-Chun Chen
Mengchen Liu
Lu Yuan
Leonid Sigal
48
5
0
19 Jul 2024
Spontaneous Style Text-to-Speech Synthesis with Controllable Spontaneous Behaviors Based on Language Models
Weiqin Li
Pei-Yin Yang
Yicheng Zhong
Yixuan Zhou
Zhisheng Wang
Zhiyong Wu
Xixin Wu
Helen M. Meng
41
3
0
18 Jul 2024
A Language Modeling Approach to Diacritic-Free Hebrew TTS
Amit Roth
A. Turetzky
Yossi Adi
37
2
0
16 Jul 2024
Masked Generative Video-to-Audio Transformers with Enhanced Synchronicity
Santiago Pascual
Chunghsin Yeh
Ioannis Tsiamas
Joan Serrà
DiffM
VGen
47
15
0
15 Jul 2024
Autoregressive Speech Synthesis without Vector Quantization
Lingwei Meng
Long Zhou
Shujie Liu
Sanyuan Chen
Bing Han
...
Jinyu Li
Sheng Zhao
Xixin Wu
Helen Meng
Furu Wei
54
31
0
11 Jul 2024
Emilia: An Extensive, Multilingual, and Diverse Speech Dataset for Large-Scale Speech Generation
Haorui He
Zengqiang Shang
Chaoren Wang
Xuyuan Li
Yicheng Gu
...
Peiyang Shi
Yuancheng Wang
Kai Chen
Pengyuan Zhang
Zhizheng Wu
38
37
0
07 Jul 2024
PAGURI: a user experience study of creative interaction with text-to-music models
Francesca Ronchini
Luca Comanducci
Gabriele Perego
Fabio Antonacci
35
3
0
05 Jul 2024
On the Effectiveness of Acoustic BPE in Decoder-Only TTS
Bohan Li
Feiyu Shen
Yiwei Guo
Shuai Wang
Xie Chen
Kai Yu
39
2
0
04 Jul 2024
Robust Zero-Shot Text-to-Speech Synthesis with Reverse Inference Optimization
Yuchen Hu
Chen Chen
Siyin Wang
Eng Siong Chng
C. Zhang
43
3
0
02 Jul 2024
Accompanied Singing Voice Synthesis with Fully Text-controlled Melody
Ruiqi Li
Zhiqing Hong
Yongqi Wang
Lichao Zhang
Rongjie Huang
Siqi Zheng
Zhou Zhao
39
6
0
02 Jul 2024
Towards the Next Frontier in Speech Representation Learning Using Disentanglement
Varun Krishna
Sriram Ganapathy
SSL
17
1
0
02 Jul 2024
Investigating the Effects of Large-Scale Pseudo-Stereo Data and Different Speech Foundation Model on Dialogue Generative Spoken Language Model
Yu-Kuan Fu
Cheng-Kuang Lee
Hsiu-Hsuan Wang
Hung-yi Lee
30
0
0
02 Jul 2024