ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2209.15352
  4. Cited By
AudioGen: Textually Guided Audio Generation

AudioGen: Textually Guided Audio Generation

30 September 2022
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
    DiffM
ArXivPDFHTML

Papers citing "AudioGen: Textually Guided Audio Generation"

50 / 78 papers shown
Title
Continual Multimodal Contrastive Learning
Continual Multimodal Contrastive Learning
Xiaohao Liu
Xiaobo Xia
See-Kiong Ng
Tat-Seng Chua
CLL
183
2
0
19 Mar 2025
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
DiffGAP: A Lightweight Diffusion Module in Contrastive Space for Bridging Cross-Model Gap
Shentong Mo
Zehua Chen
Fan Bao
Jun-Jie Zhu
DiffM
77
1
0
15 Mar 2025
AudioX: Diffusion Transformer for Anything-to-Audio Generation
AudioX: Diffusion Transformer for Anything-to-Audio Generation
Zeyue Tian
Yizhu Jin
Zhaoyang Liu
Ruibin Yuan
Xu Tan
Qifeng Chen
Wei Xue
Yu Guo
82
5
0
13 Mar 2025
MatchMaker: Automated Asset Generation for Robotic Assembly
MatchMaker: Automated Asset Generation for Robotic Assembly
Yian Wang
Bingjie Tang
Chuang Gan
Dieter Fox
Kaichun Mo
Yashraj S. Narang
Iretiayo Akinola
99
1
0
07 Mar 2025
High-Fidelity Simultaneous Speech-To-Speech Translation
High-Fidelity Simultaneous Speech-To-Speech Translation
Tom Labiausse
Laurent Mazaré
Edouard Grave
P. Pérez
Alexandre Défossez
Neil Zeghidour
410
1
0
05 Feb 2025
SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling
SoundSpring: Loss-Resilient Audio Transceiver with Dual-Functional Masked Language Modeling
Shengshi Yao
Jincheng Dai
Xiaoqi Qin
Sixian Wang
Siye Wang
K. Niu
Ping Zhang
104
0
0
22 Jan 2025
Audio-Language Datasets of Scenes and Events: A Survey
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
116
2
0
10 Jan 2025
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
SSR-Speech: Towards Stable, Safe and Robust Zero-shot Text-based Speech Editing and Synthesis
Helin Wang
Meng Yu
Jiarui Hai
Chen Chen
Yuchen Hu
Rilin Chen
Najim Dehak
Dong Yu
103
6
0
03 Jan 2025
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Zichun Liao
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VGen
134
7
0
02 Dec 2024
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
SAFREE: Training-Free and Adaptive Guard for Safe Text-to-Image And Video Generation
Jaehong Yoon
Shoubin Yu
Vaidehi Patil
Huaxiu Yao
Joey Tianyi Zhou
117
22
0
16 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent
  Approach
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
89
4
0
14 Oct 2024
Code Drift: Towards Idempotent Neural Audio Codecs
Code Drift: Towards Idempotent Neural Audio Codecs
P. O'Reilly
Prem Seetharaman
Jiaqi Su
Zeyu Jin
Bryan Pardo
357
0
0
14 Oct 2024
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Zixuan Wang
Chi-Keung Tang
Chi-Keung Tang
DiffM
VGen
LLMAG
86
4
0
04 Oct 2024
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Sreyan Ghosh
Sonal Kumar
Zhifeng Kong
Rafael Valle
Bryan Catanzaro
Dinesh Manocha
DiffM
100
3
0
02 Oct 2024
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions
AudioComposer: Towards Fine-grained Audio Generation with Natural Language Descriptions
Yun Wang
Hangting Chen
Dongchao Yang
Zhiyong Wu
Xixin Wu
DiffM
78
2
0
19 Sep 2024
Learning Source Disentanglement in Neural Audio Codec
Learning Source Disentanglement in Neural Audio Codec
Xiaoyu Bie
Xubo Liu
Gaël Richard
50
2
0
17 Sep 2024
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
WavTokenizer: an Efficient Acoustic Discrete Codec Tokenizer for Audio Language Modeling
Shengpeng Ji
Ziyue Jiang
Xize Cheng
Yifu Chen
Minghui Fang
...
Rongjie Huang
Yidi Jiang
Qian Chen
Zhou Zhao
Zhou Zhao
VLM
100
42
0
29 Aug 2024
ICSD: An Open-source Dataset for Infant Cry and Snoring Detection
ICSD: An Open-source Dataset for Infant Cry and Snoring Detection
Qingyu Liu
Longfei Song
Dongxing Xu
Yanhua Long
66
0
0
20 Aug 2024
Video-to-Audio Generation with Hidden Alignment
Video-to-Audio Generation with Hidden Alignment
Manjie Xu
Chenxing Li
Yong Ren
Rilin Chen
Yu Gu
Yu Gu
Dong Yu
Dong Yu
DiffM
VGen
68
12
0
10 Jul 2024
We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
We Have a Package for You! A Comprehensive Analysis of Package Hallucinations by Code Generating LLMs
Joseph Spracklen
Raveen Wijewickrama
A. H. M. N. Sakib
Anindya Maiti
Murtuza Jadliwala
Murtuza Jadliwala
97
12
0
12 Jun 2024
MAD Speech: Measures of Acoustic Diversity of Speech
MAD Speech: Measures of Acoustic Diversity of Speech
Matthieu Futeral
A. Agostinelli
Marco Tagliasacchi
Neil Zeghidour
Eugene Kharitonov
97
1
0
16 Apr 2024
PITCH: AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
PITCH: AI-assisted Tagging of Deepfake Audio Calls using Challenge-Response
Govind Mittal
Arthur Jakobsson
Kelly O. Marshall
Chinmay Hegde
Nasir Memon
75
0
0
28 Feb 2024
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
JEN-1: Text-Guided Universal Music Generation with Omnidirectional Diffusion Models
Peike Li
Bo-Yu Chen
Yao Yao
Yikai Wang
Allen Wang
Alex Jinpeng Wang
MGen
VLM
DiffM
104
39
0
09 Aug 2023
AudioLM: a Language Modeling Approach to Audio Generation
AudioLM: a Language Modeling Approach to Audio Generation
Zalan Borsos
Raphaël Marinier
Damien Vincent
Eugene Kharitonov
Olivier Pietquin
...
Dominik Roblek
O. Teboul
David Grangier
Marco Tagliasacchi
Neil Zeghidour
AuLLM
141
606
0
07 Sep 2022
Classifier-Free Diffusion Guidance
Classifier-Free Diffusion Guidance
Jonathan Ho
Tim Salimans
FaML
190
3,885
0
26 Jul 2022
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
Diffsound: Discrete Diffusion Model for Text-to-sound Generation
Dongchao Yang
Jianwei Yu
Helin Wang
Wen Wang
Chao Weng
Yuexian Zou
Dong Yu
DiffM
83
304
0
20 Jul 2022
Masked Autoencoders that Listen
Masked Autoencoders that Listen
Po-Yao (Bernie) Huang
Hu Xu
Juncheng Billy Li
Alexei Baevski
Michael Auli
Wojciech Galuba
Florian Metze
Christoph Feichtenhofer
74
280
0
13 Jul 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
178
1,117
0
22 Jun 2022
Photorealistic Text-to-Image Diffusion Models with Deep Language
  Understanding
Photorealistic Text-to-Image Diffusion Models with Deep Language Understanding
Chitwan Saharia
William Chan
Saurabh Saxena
Lala Li
Jay Whang
...
Raphael Gontijo-Lopes
Tim Salimans
Jonathan Ho
David J Fleet
Mohammad Norouzi
VLM
404
6,012
0
23 May 2022
OPT: Open Pre-trained Transformer Language Models
OPT: Open Pre-trained Transformer Language Models
Susan Zhang
Stephen Roller
Naman Goyal
Mikel Artetxe
Moya Chen
...
Daniel Simig
Punit Singh Koura
Anjali Sridhar
Tianlu Wang
Luke Zettlemoyer
VLM
OSLM
AI4CE
327
3,655
0
02 May 2022
Hierarchical Text-Conditional Image Generation with CLIP Latents
Hierarchical Text-Conditional Image Generation with CLIP Latents
Aditya A. Ramesh
Prafulla Dhariwal
Alex Nichol
Casey Chu
Mark Chen
VLM
DiffM
377
6,859
0
13 Apr 2022
Enhanced Direct Speech-to-Speech Translation Using Self-supervised
  Pre-training and Data Augmentation
Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation
Sravya Popuri
Peng-Jen Chen
Changhan Wang
J. Pino
Yossi Adi
Jiatao Gu
Wei-Ning Hsu
Ann Lee
72
58
0
06 Apr 2022
Generative Spoken Dialogue Language Modeling
Generative Spoken Dialogue Language Modeling
Tu Nguyen
Eugene Kharitonov
Jade Copet
Yossi Adi
Wei-Ning Hsu
...
Paden Tomasello
Robin Algayres
Benoît Sagot
Abdel-rahman Mohamed
Emmanuel Dupoux
AuLLM
90
88
0
30 Mar 2022
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Make-A-Scene: Scene-Based Text-to-Image Generation with Human Priors
Oran Gafni
Adam Polyak
Oron Ashual
Shelly Sheynin
Devi Parikh
Yaniv Taigman
DiffM
65
520
0
24 Mar 2022
textless-lib: a Library for Textless Spoken Language Processing
textless-lib: a Library for Textless Spoken Language Processing
Eugene Kharitonov
Jade Copet
Kushal Lakhotia
Tu Nguyen
Paden Tomasello
...
A. Elkahky
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
Yossi Adi
106
33
0
15 Feb 2022
High-Resolution Image Synthesis with Latent Diffusion Models
High-Resolution Image Synthesis with Latent Diffusion Models
Robin Rombach
A. Blattmann
Dominik Lorenz
Patrick Esser
Bjorn Ommer
3DV
410
15,486
0
20 Dec 2021
GLIDE: Towards Photorealistic Image Generation and Editing with
  Text-Guided Diffusion Models
GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models
Alex Nichol
Prafulla Dhariwal
Aditya A. Ramesh
Pranav Shyam
Pamela Mishkin
Bob McGrew
Ilya Sutskever
Mark Chen
334
3,600
0
20 Dec 2021
Textless Speech-to-Speech Translation on Real Data
Textless Speech-to-Speech Translation on Real Data
Ann Lee
Hongyu Gong
Paul-Ambroise Duquenne
Holger Schwenk
Peng-Jen Chen
...
Sravya Popuri
Yossi Adi
J. Pino
Jiatao Gu
Wei-Ning Hsu
64
147
0
15 Dec 2021
Textless Speech Emotion Conversion using Discrete and Decomposed
  Representations
Textless Speech Emotion Conversion using Discrete and Decomposed Representations
Felix Kreuk
Adam Polyak
Jade Copet
Eugene Kharitonov
Tu Nguyen
M. Rivière
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
Yossi Adi
63
33
0
14 Nov 2021
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech
  Processing
WavLM: Large-Scale Self-Supervised Pre-Training for Full Stack Speech Processing
Sanyuan Chen
Chengyi Wang
Zhengyang Chen
Yu-Huan Wu
Shujie Liu
...
Yao Qian
Jian Wu
Micheal Zeng
Xiangzhan Yu
Furu Wei
SSL
239
1,857
0
26 Oct 2021
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language
  Processing
SpeechT5: Unified-Modal Encoder-Decoder Pre-Training for Spoken Language Processing
Junyi Ao
Rui Wang
Long Zhou
Chengyi Wang
Shuo Ren
...
Yu Zhang
Zhihua Wei
Yao Qian
Jinyu Li
Furu Wei
138
202
0
14 Oct 2021
Efficient Training of Audio Transformers with Patchout
Efficient Training of Audio Transformers with Patchout
Khaled Koutini
Jan Schluter
Hamid Eghbalzadeh
Gerhard Widmer
ViT
126
257
0
11 Oct 2021
Text-Free Prosody-Aware Generative Spoken Language Modeling
Text-Free Prosody-Aware Generative Spoken Language Modeling
Eugene Kharitonov
Ann Lee
Adam Polyak
Yossi Adi
Jade Copet
...
Tu Nguyen
M. Rivière
Abdel-rahman Mohamed
Emmanuel Dupoux
Wei-Ning Hsu
60
121
0
07 Sep 2021
Direct speech-to-speech translation with discrete units
Direct speech-to-speech translation with discrete units
Ann Lee
Peng-Jen Chen
Changhan Wang
Jiatao Gu
Sravya Popuri
...
Yossi Adi
Qing He
Yun Tang
J. Pino
Wei-Ning Hsu
62
188
0
12 Jul 2021
SoundStream: An End-to-End Neural Audio Codec
SoundStream: An End-to-End Neural Audio Codec
Neil Zeghidour
Alejandro Luebs
Ahmed Omran
Jan Skoglund
Marco Tagliasacchi
AI4TS
105
792
0
07 Jul 2021
HuBERT: Self-Supervised Speech Representation Learning by Masked
  Prediction of Hidden Units
HuBERT: Self-Supervised Speech Representation Learning by Masked Prediction of Hidden Units
Wei-Ning Hsu
Benjamin Bolte
Yao-Hung Hubert Tsai
Kushal Lakhotia
Ruslan Salakhutdinov
Abdel-rahman Mohamed
SSL
163
2,949
0
14 Jun 2021
Speech Resynthesis from Discrete Disentangled Self-Supervised
  Representations
Speech Resynthesis from Discrete Disentangled Self-Supervised Representations
Adam Polyak
Yossi Adi
Jade Copet
Eugene Kharitonov
Kushal Lakhotia
Wei-Ning Hsu
Abdel-rahman Mohamed
Emmanuel Dupoux
80
318
0
01 Apr 2021
GAN Vocoder: Multi-Resolution Discriminator Is All You Need
GAN Vocoder: Multi-Resolution Discriminator Is All You Need
J. You
Dalhyun Kim
Gyuhyeon Nam
Geumbyeol Hwang
Gyeongsu Chae
46
27
0
09 Mar 2021
Generative Spoken Language Modeling from Raw Audio
Generative Spoken Language Modeling from Raw Audio
Kushal Lakhotia
Evgeny Kharitonov
Wei-Ning Hsu
Yossi Adi
Adam Polyak
...
Tu Nguyen
Jade Copet
Alexei Baevski
A. Mohamed
Emmanuel Dupoux
AuLLM
246
358
0
01 Feb 2021
UniSpeech: Unified Speech Representation Learning with Labeled and
  Unlabeled Data
UniSpeech: Unified Speech Representation Learning with Labeled and Unlabeled Data
Chengyi Wang
Yu-Huan Wu
Yao Qian
K. Kumatani
Shujie Liu
Furu Wei
Michael Zeng
Xuedong Huang
OT
SSL
73
115
0
19 Jan 2021
12
Next