Can Large Language Models Predict Audio Effects Parameters from Natural Language?

In music production, manipulating audio effects (Fx) parameters through natural language has the potential to reduce technical barriers for non-experts. We present LLM2Fx, a framework that leverages Large Language Models (LLMs) to predict Fx parameters directly from textual descriptions, without task-specific training or fine-tuning. Our approach addresses the text-to-effect parameter prediction (Text2Fx) task by mapping natural language descriptions to the corresponding Fx parameters for equalization and reverberation. We demonstrate that LLMs can generate Fx parameters in a zero-shot manner, elucidating the relationship between timbre semantics and audio effects in music production. To enhance performance, we introduce three types of in-context examples: audio Digital Signal Processing (DSP) features, DSP function code, and few-shot examples. Our results show that LLM-based Fx parameter generation outperforms previous optimization approaches, offering competitive performance in translating natural language descriptions to appropriate Fx settings. Furthermore, LLMs can serve as text-driven interfaces for audio production, paving the way for more intuitive and accessible music production tools.
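To make the Text2Fx setup concrete, below is a minimal Python sketch of how such a pipeline could be assembled: a prompt that embeds few-shot description-to-parameter examples and a parameter schema, an injected LLM call, and JSON parsing of the predicted equalizer settings. The schema, the example values, and the names `EQ_SCHEMA`, `FEW_SHOT`, `build_prompt`, and `call_llm` are illustrative assumptions, not the paper's actual prompt format or parameter set.

```python
import json

# Hypothetical parametric EQ schema; the paper's actual parameter
# set and value ranges are not reproduced here.
EQ_SCHEMA = {
    "bands": [{"center_hz": "float, 20-20000",
               "gain_db": "float, -12 to 12",
               "q": "float, 0.1-10"}]
}

# Illustrative few-shot examples pairing timbre words with EQ bands.
FEW_SHOT = [
    ("bright", [{"center_hz": 8000.0, "gain_db": 4.5, "q": 0.7}]),
    ("muddy",  [{"center_hz": 300.0,  "gain_db": 5.0, "q": 1.0}]),
]


def build_prompt(description: str) -> str:
    """Assemble a few-shot prompt asking the LLM for EQ parameters."""
    examples = "\n".join(
        f'Description: "{d}" -> {json.dumps(bands)}' for d, bands in FEW_SHOT
    )
    return (
        "You are an audio engineer. Given a timbre description, return "
        "parametric EQ bands as a JSON list matching this schema:\n"
        f"{json.dumps(EQ_SCHEMA)}\n\n"
        f"Examples:\n{examples}\n\n"
        f'Description: "{description}" ->'
    )


def predict_eq(description: str, call_llm) -> list[dict]:
    """call_llm: any text-in/text-out LLM interface (placeholder here)."""
    reply = call_llm(build_prompt(description))
    return json.loads(reply)  # expects a JSON list of band dicts
```

The same pattern extends to the paper's other in-context example types: audio DSP features (e.g., spectral statistics of the input) can be serialized into the prompt alongside the few-shot pairs, and DSP function code for the target effect can be included so the model sees how each parameter is used.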
@article{doh2025_2505.20770,
  title={Can Large Language Models Predict Audio Effects Parameters from Natural Language?},
  author={Seungheon Doh and Junghyun Koo and Marco A. Martínez-Ramírez and Wei-Hsiang Liao and Juhan Nam and Yuki Mitsufuji},
  journal={arXiv preprint arXiv:2505.20770},
  year={2025}
}