2409.01160
Cited By
Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning
2 September 2024
Jaeyeon Kim
Jaeyoon Jung
Minjeong Jeon
Sang Hoon Woo
Jinjoo Lee
Papers citing
"Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning"
13 / 13 papers shown
Title
Improving Audio Captioning Models with Fine-grained Audio Features, Text Embedding Supervision, and LLM Mix-up Augmentation
Shih-Lun Wu
Xuankai Chang
Gordon Wichern
Jee-weon Jung
François G. Germain
Jonathan Le Roux
Shinji Watanabe
59
20
0
29 Sep 2023
High-Fidelity Audio Compression with Improved RVQGAN
Rithesh Kumar
Prem Seetharaman
Alejandro Luebs
I. Kumar
Kundan Kumar
94
327
0
11 Jun 2023
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
Etienne Labbé
J. Pinquier
Thomas Pellegrini
71
5
0
02 May 2023
Prefix tuning for automated audio captioning
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
66
45
0
30 Mar 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
117
211
0
30 Mar 2023
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Yusong Wu
Kai Chen
Tianyu Zhang
Yuchen Hui
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
CLIP
122
531
0
12 Nov 2022
AudioGen: Textually Guided Audio Generation
Felix Kreuk
Gabriel Synnaeve
Adam Polyak
Uriel Singer
Alexandre Défossez
Jade Copet
Devi Parikh
Yaniv Taigman
Yossi Adi
DiffM
82
308
0
30 Sep 2022
Can Audio Captions Be Evaluated with Image Caption Metrics?
Zelin Zhou
Zhiling Zhang
Xuenan Xu
Zeyu Xie
Mengyue Wu
Kenny Q. Zhu
55
46
0
10 Oct 2021
A Transformer-based Audio Captioning Model with Keyword Estimation
Yuma Koizumi
Ryo Masumura
Kyosuke Nishida
Masahiro Yasuda
Shoichiro Saito
58
54
0
01 Jul 2020
Clotho: An Audio Captioning Dataset
Konstantinos Drossos
Samuel Lipping
Tuomas Virtanen
98
389
0
21 Oct 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
662
24,464
0
26 Jul 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
94,891
0
11 Oct 2018
Microsoft COCO Captions: Data Collection and Evaluation Server
Xinlei Chen
Hao Fang
Nayeon Lee
Ramakrishna Vedantam
Saurabh Gupta
Piotr Dollar
C. L. Zitnick
215
2,478
0
01 Apr 2015