ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2306.05284
  4. Cited By
Simple and Controllable Music Generation

Simple and Controllable Music Generation

8 June 2023
Jade Copet
Felix Kreuk
Itai Gat
Tal Remez
David Kant
Gabriel Synnaeve
Yossi Adi
Alexandre Défossez
    MGen
ArXivPDFHTML

Papers citing "Simple and Controllable Music Generation"

50 / 257 papers shown
Title
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech
  Representation from Self-supervised Learning Model
MMM: Multi-Layer Multi-Residual Multi-Stream Discrete Speech Representation from Self-supervised Learning Model
Jiatong Shi
Xutai Ma
Hirofumi Inaguma
Anna Y. Sun
Shinji Watanabe
60
7
0
14 Jun 2024
Can Synthetic Audio From Generative Foundation Models Assist Audio
  Recognition and Speech Modeling?
Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling?
Tiantian Feng
Dimitrios Dimitriadis
Shrikanth Narayanan
45
4
0
13 Jun 2024
Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion
  Models
Diff-A-Riff: Musical Accompaniment Co-creation via Latent Diffusion Models
J. Nistal
Marco Pasini
Cyran Aouameur
M. Grachten
Stefan Lattner
DiffM
53
16
0
12 Jun 2024
Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
Codecfake: An Initial Dataset for Detecting LLM-based Deepfake Audio
Yi Lu
Yuankun Xie
Ruibo Fu
Zhengqi Wen
Jianhua Tao
...
Xuefei Liu
Yongwei Li
Yukun Liu
Xiaopeng Wang
Shuchen Shi
56
1
0
12 Jun 2024
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via
  Monotonic Alignment
VALL-E R: Robust and Efficient Zero-Shot Text-to-Speech Synthesis via Monotonic Alignment
Bing Han
Long Zhou
Shujie Liu
Sanyuan Chen
Lingwei Meng
Yanming Qian
Yanqing Liu
Sheng Zhao
Jinyu Li
Furu Wei
49
15
0
12 Jun 2024
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
The Interspeech 2024 Challenge on Speech Processing Using Discrete Units
Xuankai Chang
Jiatong Shi
Jinchuan Tian
Yuning Wu
Yuxun Tang
Yihan Wu
Shinji Watanabe
Yossi Adi
Xie Chen
Qin Jin
49
16
0
11 Jun 2024
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from
  Codec-Based Speech Synthesis Systems
CodecFake: Enhancing Anti-Spoofing Models Against Deepfake Audios from Codec-Based Speech Synthesis Systems
Haibin Wu
Yuan Tseng
Hung-yi Lee
AuLLM
35
7
0
11 Jun 2024
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Autoregressive Diffusion Transformer for Text-to-Speech Synthesis
Zhijun Liu
Shuai Wang
Sho Inoue
Qibing Bai
Haizhou Li
DiffM
50
15
0
08 Jun 2024
MeLFusion: Synthesizing Music from Image and Language Cues using
  Diffusion Models
MeLFusion: Synthesizing Music from Image and Language Cues using Diffusion Models
Sanjoy Chowdhury
Sayan Nag
K. J. Joseph
Balaji Vasan Srinivasan
Dinesh Manocha
DiffM
48
7
0
07 Jun 2024
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
SEE-2-SOUND: Zero-Shot Spatial Environment-to-Spatial Sound
Rishit Dagli
Shivesh Prakash
Robert Wu
H. Khosravani
39
3
0
06 Jun 2024
Small-E: Small Language Model with Linear Attention for Efficient Speech
  Synthesis
Small-E: Small Language Model with Linear Attention for Efficient Speech Synthesis
Théodor Lemerle
Nicolas Obin
Axel Roebel
42
6
0
06 Jun 2024
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling
Zeyue Tian
Zhaoyang Liu
Ruibin Yuan
Jiahao Pan
Xiaoqiang Huang
Xu Tan
Xu Tan
Qifeng Chen
Yu Guo
VGen
104
16
0
06 Jun 2024
Text-to-Events: Synthetic Event Camera Streams from Conditional Text
  Input
Text-to-Events: Synthetic Event Camera Streams from Conditional Text Input
Joachim Ott
Zuowen Wang
Shih-Chii Liu
DiffM
VGen
60
0
0
05 Jun 2024
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive
  Modeling of Audio Discrete Codes
LiveSpeech: Low-Latency Zero-shot Text-to-Speech via Autoregressive Modeling of Audio Discrete Codes
Trung D. Q. Dang
David Aponte
Dung Tran
K. Koishida
40
4
0
05 Jun 2024
An Independence-promoting Loss for Music Generation with Language Models
An Independence-promoting Loss for Music Generation with Language Models
Jean-Marie Lemercier
Simon Rouard
Jade Copet
Yossi Adi
Alexandre Défossez
30
1
0
04 Jun 2024
MidiCaps: A large-scale MIDI dataset with text captions
MidiCaps: A large-scale MIDI dataset with text captions
J. Melechovský
Abhinaba Roy
Dorien Herremans
32
10
0
04 Jun 2024
MaskSR: Masked Language Model for Full-band Speech Restoration
MaskSR: Masked Language Model for Full-band Speech Restoration
Xu Li
Qirui Wang
Xiaoyu Liu
53
8
0
04 Jun 2024
A Survey of Deep Learning Audio Generation Methods
A Survey of Deep Learning Audio Generation Methods
Matej Bozic
Marko Horvat
VLM
MedIm
66
0
0
31 May 2024
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music
  Generation
DITTO-2: Distilled Diffusion Inference-Time T-Optimization for Music Generation
Zachary Novack
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
35
8
0
30 May 2024
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language
  Models via Instruction Tuning
Instruct-MusicGen: Unlocking Text-to-Music Editing for Music Language Models via Instruction Tuning
Yixiao Zhang
Yukara Ikemiya
Woosung Choi
Naoki Murata
Marco A. Martínez-Ramírez
Liwei Lin
Gus Xia
Wei-Hsiang Liao
Yuki Mitsufuji
Simon Dixon
57
10
0
28 May 2024
QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
QA-MDT: Quality-aware Masked Diffusion Transformer for Enhanced Music Generation
Chang Li
Ruoyu Wang
Lijuan Liu
Jun Du
Yixuan Sun
Zilu Guo
Zhenrong Zhang
Yuan Jiang
J. Gao
Feng Ma
43
0
0
24 May 2024
DAC-JAX: A JAX Implementation of the Descript Audio Codec
DAC-JAX: A JAX Implementation of the Descript Audio Codec
David Braun
34
0
0
19 May 2024
Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded
  Diffusion Models
Whole-Song Hierarchical Generation of Symbolic Music Using Cascaded Diffusion Models
Ziyu Wang
Lejun Min
Gus Xia
DiffM
31
10
0
16 May 2024
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment
  Generation
FastSAG: Towards Fast Non-Autoregressive Singing Accompaniment Generation
Jianyi Chen
Wei Xue
Xu Tan
Zhen Ye
Qi-fei Liu
Yi-Ting Guo
55
2
0
13 May 2024
Correlation Dimension of Natural Language in a Statistical Manifold
Correlation Dimension of Natural Language in a Statistical Manifold
Xin Du
Kumiko Tanaka-Ishii
26
1
0
10 May 2024
The Codecfake Dataset and Countermeasures for the Universally Detection
  of Deepfake Audio
The Codecfake Dataset and Countermeasures for the Universally Detection of Deepfake Audio
Yuankun Xie
Yi Lu
Ruibo Fu
Zhengqi Wen
Zhiyong Wang
...
Xiaopeng Wang
Yukun Liu
Haonan Cheng
Long Ye
Yi Sun
52
16
0
08 May 2024
Detecting music deepfakes is easy but actually hard
Detecting music deepfakes is easy but actually hard
Darius Afchar
Gabriel Meseguer-Brocal
Romain Hennequin
68
6
0
07 May 2024
Creative Problem Solving in Large Language and Vision Models -- What
  Would it Take?
Creative Problem Solving in Large Language and Vision Models -- What Would it Take?
Lakshmi Nair
Evana Gizzi
Jivko Sinapov
MLLM
57
3
0
02 May 2024
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General
  Sound
SemantiCodec: An Ultra Low Bitrate Semantic Audio Codec for General Sound
Haohe Liu
Xuenan Xu
Yiitan Yuan
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
48
19
0
30 Apr 2024
ComposerX: Multi-Agent Symbolic Music Composition with LLMs
ComposerX: Multi-Agent Symbolic Music Composition with LLMs
Qixin Deng
Qikai Yang
Ruibin Yuan
Yipeng Huang
Yi Wang
...
Emmanouil Benetos
Wenwu Wang
Guangyu Xia
Wei Xue
Yi-Ting Guo
LLMAG
46
30
0
28 Apr 2024
Semantically consistent Video-to-Audio Generation using Multimodal
  Language Large Model
Semantically consistent Video-to-Audio Generation using Multimodal Language Large Model
Gehui Chen
Guan’an Wang
Xiaowen Huang
Jitao Sang
VGen
35
8
0
25 Apr 2024
Long-form music generation with latent diffusion
Long-form music generation with latent diffusion
Zach Evans
Julian Parker
CJ Carr
Zack Zukowski
Josiah Taylor
Jordi Pons
MGen
DiffM
44
40
0
16 Apr 2024
MuPT: A Generative Symbolic Music Pretrained Transformer
MuPT: A Generative Symbolic Music Pretrained Transformer
Xingwei Qu
Yuelin Bai
Yi Ma
Ziya Zhou
Ka Man Lo
...
Xu Tan
Stephen W. Huang
Wenhu Chen
Jie Fu
Ge Zhang
57
11
0
09 Apr 2024
Gull: A Generative Multifunctional Audio Codec
Gull: A Generative Multifunctional Audio Codec
Yi Luo
Jianwei Yu
Hangting Chen
Rongzhi Gu
Chao Weng
AuLLM
46
3
0
07 Apr 2024
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot
  Text-to-Speech
CLaM-TTS: Improving Neural Codec Language Model for Zero-Shot Text-to-Speech
Jaehyeon Kim
Keon Lee
Seungjun Chung
Jaewoong Cho
74
42
0
03 Apr 2024
Leveraging YOLO-World and GPT-4V LMMs for Zero-Shot Person Detection and
  Action Recognition in Drone Imagery
Leveraging YOLO-World and GPT-4V LMMs for Zero-Shot Person Detection and Action Recognition in Drone Imagery
Christian Limberg
Artur Gonçalves
Bastien Rigault
Helmut Prendinger
40
6
0
02 Apr 2024
SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
SMITIN: Self-Monitored Inference-Time INtervention for Generative Music Transformers
Junghyun Koo
Gordon Wichern
François Germain
Sameer Khurana
Jonathan Le Roux
39
3
0
02 Apr 2024
Synthetic training set generation using text-to-audio models for
  environmental sound classification
Synthetic training set generation using text-to-audio models for environmental sound classification
Francesca Ronchini
Luca Comanducci
Fabio Antonacci
39
2
0
26 Mar 2024
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
VoiceCraft: Zero-Shot Speech Editing and Text-to-Speech in the Wild
Puyuan Peng
Po-Yao (Bernie) Huang
Daniel Li
Abdelrahman Mohamed
David Harwath
74
64
0
25 Mar 2024
Generalized Multi-Source Inference for Text Conditioned Music Diffusion
  Models
Generalized Multi-Source Inference for Text Conditioned Music Diffusion Models
Emilian Postolache
Giorgio Mariani
Luca Cosmo
Emmanouil Benetos
Emanuele Rodolà
DiffM
48
9
0
18 Mar 2024
Symbiotic Game and Foundation Models for Cyber Deception Operations in
  Strategic Cyber Warfare
Symbiotic Game and Foundation Models for Cyber Deception Operations in Strategic Cyber Warfare
Tao Li
Quanyan Zhu
AAML
42
5
0
14 Mar 2024
AesopAgent: Agent-driven Evolutionary System on Story-to-Video
  Production
AesopAgent: Agent-driven Evolutionary System on Story-to-Video Production
Jiuniu Wang
Zehua Du
Yuyuan Zhao
Bo Yuan
Kexiang Wang
...
Yihen Lu
Gengliang Li
Junlong Gao
Xin Tu
Zhenyu Guo
LLMAG
VGen
41
7
0
12 Mar 2024
MAP-Elites with Transverse Assessment for Multimodal Problems in
  Creative Domains
MAP-Elites with Transverse Assessment for Multimodal Problems in Creative Domains
Marvin Zammit
Antonios Liapis
Georgios N. Yannakakis
34
1
0
11 Mar 2024
Beyond Language Models: Byte Models are Digital World Simulators
Beyond Language Models: Byte Models are Digital World Simulators
Shangda Wu
Xu Tan
Zili Wang
Rui Wang
Xiaobing Li
Maosong Sun
35
12
0
29 Feb 2024
SongComposer: A Large Language Model for Lyric and Melody Composition in
  Song Generation
SongComposer: A Large Language Model for Lyric and Melody Composition in Song Generation
Shuangrui Ding
Zihan Liu
Xiao-wen Dong
Pan Zhang
Rui Qian
Conghui He
Dahua Lin
Jiaqi Wang
30
23
0
27 Feb 2024
D-Flow: Differentiating through Flows for Controlled Generation
D-Flow: Differentiating through Flows for Controlled Generation
Heli Ben-Hamu
Omri Puny
Itai Gat
Brian Karrer
Uriel Singer
Y. Lipman
46
26
0
21 Feb 2024
Towards audio language modeling -- an overview
Towards audio language modeling -- an overview
Haibin Wu
Xuanjun Chen
Yi-Cheng Lin
Kai-Wei Chang
Ho-Lam Chung
Alexander H. Liu
Hung-yi Lee
AuLLM
40
28
0
20 Feb 2024
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
AnyGPT: Unified Multimodal LLM with Discrete Sequence Modeling
Jun Zhan
Junqi Dai
Jiasheng Ye
Yunhua Zhou
Dong Zhang
...
Jie Fu
Tao Gui
Tianxiang Sun
Yugang Jiang
Xipeng Qiu
MLLM
35
121
0
19 Feb 2024
MuChin: A Chinese Colloquial Description Benchmark for Evaluating
  Language Models in the Field of Music
MuChin: A Chinese Colloquial Description Benchmark for Evaluating Language Models in the Field of Music
Zihao Wang
Shuyu Li
Tao Zhang
Qi Wang
Pengfei Yu
Jinyang Luo
Yan Liu
Ming Xi
Kejun Zhang
47
4
0
15 Feb 2024
Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation
  and Editing via Content-based Controls
Arrange, Inpaint, and Refine: Steerable Long-term Music Audio Generation and Editing via Content-based Controls
Liwei Lin
Gus Xia
Yixiao Zhang
Junyan Jiang
29
12
0
14 Feb 2024
Previous
123456
Next