26
2

EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance

Jaeyeon Kim
Minjeon Jeon
Jaeyoon Jung
Sang Hoon Woo
Jinjoo Lee
Abstract

In this work, we aim to analyze and optimize the EnCLAP framework, a state-of-the-art model in automated audio captioning. We investigate the impact of modifying the acoustic encoder components, explore pretraining with different dataset scales, and study the effectiveness of a reranking scheme. Through extensive experimentation and quantitative analysis of generated captions, we develop EnCLAP++, an enhanced version that significantly surpasses the original.

View on arXiv
Comments on this paper