2007.00222
A Transformer-based Audio Captioning Model with Keyword Estimation
1 July 2020
Yuma Koizumi
Ryo Masumura
Kyosuke Nishida
Masahiro Yasuda
Shoichiro Saito
Papers citing
"A Transformer-based Audio Captioning Model with Keyword Estimation"
40 / 40 papers shown
Title
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance
Jaeyeon Kim
Minjeon Jeon
Jaeyoon Jung
Sang Hoon Woo
Jinjoo Lee
80
3
0
02 Sep 2024
Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning
Jaeyeon Kim
Jaeyoon Jung
Minjeong Jeon
Sang Hoon Woo
Jinjoo Lee
88
1
0
02 Sep 2024
AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning
Jongsuk Kim
Jiwon Shin
Junmo Kim
128
3
0
10 Jul 2024
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
Jaeyeon Kim
Jaeyoon Jung
Jinjoo Lee
Sang Hoon Woo
CLIP
VLM
72
25
0
31 Jan 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
125
7
0
08 Jan 2024
Separate Anything You Describe
Xubo Liu
Qiuqiang Kong
Yan Zhao
Haohe Liu
Yi Yuan
Yuzhuo Liu
Rui Xia
Yuxuan Wang
Mark D. Plumbley
Wenwu Wang
VLM
111
52
0
09 Aug 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe
MLLM
126
46
0
30 Jul 2023
Improving Audio Caption Fluency with Automatic Error Correction
Hanxue Zhang
Zeyu Xie
Xuenan Xu
Mengyue Wu
K. Yu
50
0
0
16 Jun 2023
Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning
Jianyuan Sun
Xubo Liu
Xinhao Mei
V. Kılıç
Mark D. Plumbley
Wenwu Wang
65
3
0
30 May 2023
Graph Attention for Automated Audio Captioning
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
64
8
0
07 Apr 2023
Prefix tuning for automated audio captioning
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
102
46
0
30 Mar 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
181
220
0
30 Mar 2023
Towards Generating Diverse Audio Captions via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
88
2
0
05 Dec 2022
Investigations in Audio Captioning: Addressing Vocabulary Imbalance and Evaluating Suitability of Language-Centric Performance Metrics
Sandeep Reddy Kothinti
Dimitra Emmanouilidou
50
3
0
12 Nov 2022
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Xubo Liu
Qiushi Huang
Xinhao Mei
Haohe Liu
Qiuqiang Kong
...
Yu Zhang
Lilian H. Y. Tang
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
162
20
0
28 Oct 2022
GCT: Gated Contextual Transformer for Sequential Audio Tagging
Yuanbo Hou
Yun Wang
Wenwu Wang
Dick Botteldooren
62
0
0
22 Oct 2022
Automated Audio Captioning via Fusion of Low- and High- Dimensional Features
Jianyuan Sun
Xubo Liu
Xinhao Mei
Mark D. Plumbley
V. Kılıç
Wenwu Wang
80
3
0
10 Oct 2022
Event-related data conditioning for acoustic event classification
Yuanbo Hou
Dick Botteldooren
59
3
0
16 Jun 2022
Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning
Andrew Koh
Soham Dinesh Tiwari
Chng Eng Siong
53
1
0
04 Jun 2022
Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
69
6
0
17 May 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
113
44
0
12 May 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
84
16
0
11 May 2022
Automated Audio Captioning using Audio Event Clues
Ayşegül Özkaya Eren
M. Sert
56
0
0
18 Apr 2022
Caption Feature Space Regularization for Audio Captioning
Yiming Zhang
Hong Yu
Ruoyi Du
Zhanyu Ma
Yuan Dong
122
1
0
18 Apr 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
100
59
0
15 Apr 2022
Leveraging Pre-trained BERT for Audio Captioning
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
115
30
0
06 Mar 2022
Automatic Audio Captioning using Attention weighted Event based Embeddings
Swapnil Bhosale
Rupayan Chakraborty
Sunil Kumar Kopparapu
75
0
0
28 Jan 2022
Local Information Assisted Attention-free Decoder for Audio Captioning
Feiyang Xiao
Jian Guan
Haiyan Lan
Qiaoxi Zhu
Wenwu Wang
98
11
0
10 Jan 2022
Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning
Benno Weck
Xavier Favory
Konstantinos Drossos
Xavier Serra
67
8
0
14 Oct 2021
Diverse Audio Captioning via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
GAN
110
28
0
13 Oct 2021
Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information
Zhongjie Ye
Helin Wang
Dongchao Yang
Yuexian Zou
101
28
0
12 Oct 2021
An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning
Xinhao Mei
Qiushi Huang
Xubo Liu
Gengyun Chen
Jingqian Wu
...
Tom Ko
H. Tang
Xingkun Shao
Mark D. Plumbley
Wenwu Wang
93
54
0
05 Aug 2021
Audio Captioning Transformer
Xinhao Mei
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
ViT
94
78
0
21 Jul 2021
Continual Learning for Automated Audio Captioning Using The Learning Without Forgetting Approach
Jan van den Berg
Konstantinos Drossos
CLL
73
11
0
16 Jul 2021
MusCaps: Generating Captions for Music Audio
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
116
37
0
24 Apr 2021
Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning
Xuenan Xu
Heinrich Dinkel
Mengyue Wu
Zeyu Xie
Kai Yu
77
60
0
23 Feb 2021
Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval
Yuma Koizumi
Yasunori Ohishi
Daisuke Niizumi
Daiki Takeuchi
Masahiro Yasuda
74
41
0
14 Dec 2020
WaveTransformer: A Novel Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information
An Tran
Konstantinos Drossos
Tuomas Virtanen
106
19
0
21 Oct 2020
Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning
Daiki Takeuchi
Yuma Koizumi
Yasunori Ohishi
Noboru Harada
K. Kashino
77
27
0
24 Sep 2020
The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation
Yuma Koizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
115
23
0
01 Jul 2020