ResearchTrend.AI
A Whisper transformer for audio captioning trained with synthetic captions and transfer learning

15 May 2023
Marek Kadlcík
Adam Hájek
Jürgen Kieslich
Radoslaw Winiecki
    VLM

Papers citing "A Whisper transformer for audio captioning trained with synthetic captions and transfer learning"

8 / 8 papers shown
Title
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
81
2
0
10 Jan 2025
Enhancing Automated Audio Captioning via Large Language Models with Optimized Audio Encoding
Jizhong Liu
Gang Li
Junbo Zhang
Heinrich Dinkel
Yongqing Wang
Zhiyong Yan
Yujun Wang
Bin Wang
AuLLM
62
2
0
19 Jun 2024
Understanding Sounds, Missing the Questions: The Challenge of Object Hallucination in Large Audio-Language Models
Chun-Yi Kuan
Wei-Ping Huang
Hung-yi Lee
AuLLM
31
7
0
12 Jun 2024
LLark: A Multimodal Instruction-Following Language Model for Music
Josh Gardner
Simon Durand
Daniel Stoller
Rachel M. Bittner
AuLLM
31
14
0
11 Oct 2023
RECAP: Retrieval-Augmented Audio Captioning
Sreyan Ghosh
Sonal Kumar
Chandra Kiran Reddy Evuru
R. Duraiswami
Tianyi Zhou
VLM
70
17
0
18 Sep 2023
Diffusion models for audio semantic communication
Eleonora Grassucci
Christian Marinoni
Andrea Rodriguez
Danilo Comminiello
DiffM
19
23
0
13 Sep 2023
Zero-Shot Audio Captioning via Audibility Guidance
Tal Shaharabany
Ariel Shaulov
Lior Wolf
28
4
0
07 Sep 2023
CoNeTTE: An efficient Audio Captioning system leveraging multiple datasets with Task Embedding
Etienne Labbé
Thomas Pellegrini
J. Pinquier
30
12
0
01 Sep 2023