A Whisper transformer for audio captioning trained with synthetic captions and transfer learning

15 May 2023

Papers citing "A Whisper transformer for audio captioning trained with synthetic captions and transfer learning"

8 / 8 papers shown

Title | Authors | Tag | Counts | Date
Audio-Language Datasets of Scenes and Events: A Survey | Gijs Wijngaard, Elia Formisano, Michele Esposito, M. Dumontier | | 81 2 0 | 10 Jan 2025
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding | Jizhong Liu, Gang Li, Junbo Zhang, Heinrich Dinkel, Yongqing Wang, Zhiyong Yan, Yujun Wang, Bin Wang | AuLLM | 62 2 0 | 19 Jun 2024
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models | Chun-Yi Kuan, Wei-Ping Huang, Hung-yi Lee | AuLLM | 31 7 0 | 12 Jun 2024
LLark: A Multimodal Instruction-Following Language Model for Music | Josh Gardner, Simon Durand, Daniel Stoller, Rachel M. Bittner | AuLLM | 31 14 0 | 11 Oct 2023
RECAP: Retrieval-Augmented Audio Captioning | Sreyan Ghosh, Sonal Kumar, Chandra Kiran Reddy Evuru, R. Duraiswami, Tianyi Zhou | VLM | 70 17 0 | 18 Sep 2023
Diffusion models for audio semantic communication | Eleonora Grassucci, Christian Marinoni, Andrea Rodriguez, Danilo Comminiello | DiffM | 19 23 0 | 13 Sep 2023
Zero-Shot Audio Captioning via Audibility Guidance | Tal Shaharabany, Ariel Shaulov, Lior Wolf | | 28 4 0 | 07 Sep 2023
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding | Etienne Labbé, Thomas Pellegrini, J. Pinquier | | 30 12 0 | 01 Sep 2023