A Transformer-based Audio Captioning Model with Keyword Estimation

1 July 2020
Yuma Koizumi
Ryo Masumura
Kyosuke Nishida
Masahiro Yasuda
Shoichiro Saito

Papers citing "A Transformer-based Audio Captioning Model with Keyword Estimation"

40 / 40 papers shown
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance
Jaeyeon Kim
Minjeong Jeon
Jaeyoon Jung
Sang Hoon Woo
Jinjoo Lee
80
3
0
02 Sep 2024
Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning
Jaeyeon Kim
Jaeyoon Jung
Minjeong Jeon
Sang Hoon Woo
Jinjoo Lee
88
1
0
02 Sep 2024
AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning
Jongsuk Kim
Jiwon Shin
Junmo Kim
128
3
0
10 Jul 2024
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
Jaeyeon Kim
Jaeyoon Jung
Jinjoo Lee
Sang Hoon Woo
CLIP VLM
72
25
0
31 Jan 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
125
7
0
08 Jan 2024
Separate Anything You Describe
Xubo Liu
Qiuqiang Kong
Yan Zhao
Haohe Liu
Yi Yuan
Yuzhuo Liu
Rui Xia
Yuxuan Wang
Mark D. Plumbley
Wenwu Wang
VLM
111
52
0
09 Aug 2023
UnIVAL: Unified Model for Image, Video, Audio and Language Tasks
Mustafa Shukor
Corentin Dancette
Alexandre Ramé
Matthieu Cord
MoMe MLLM
126
46
0
30 Jul 2023
Improving Audio Caption Fluency with Automatic Error Correction
Hanxue Zhang
Zeyu Xie
Xuenan Xu
Mengyue Wu
K. Yu
50
0
0
16 Jun 2023
Dual Transformer Decoder based Features Fusion Network for Automated Audio Captioning
Jianyuan Sun
Xubo Liu
Xinhao Mei
V. Kılıç
Mark D. Plumbley
Wenwu Wang
65
3
0
30 May 2023
Graph Attention for Automated Audio Captioning
Feiyang Xiao
Jian Guan
Qiaoxi Zhu
Wenwu Wang
64
8
0
07 Apr 2023
Prefix tuning for automated audio captioning
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
102
46
0
30 Mar 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
181
220
0
30 Mar 2023
Towards Generating Diverse Audio Captions via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM
88
2
0
05 Dec 2022
Investigations in Audio Captioning: Addressing Vocabulary Imbalance and Evaluating Suitability of Language-Centric Performance Metrics
Sandeep Reddy Kothinti
Dimitra Emmanouilidou
50
3
0
12 Nov 2022
Visually-Aware Audio Captioning With Adaptive Audio-Visual Attention
Xubo Liu
Qiushi Huang
Xinhao Mei
Haohe Liu
Qiuqiang Kong
...
Yu Zhang
Lilian H. Y. Tang
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
162
20
0
28 Oct 2022
GCT: Gated Contextual Transformer for Sequential Audio Tagging
Yuanbo Hou
Yun Wang
Wenwu Wang
Dick Botteldooren
62
0
0
22 Oct 2022
Automated Audio Captioning via Fusion of Low- and High- Dimensional Features
Jianyuan Sun
Xubo Liu
Xinhao Mei
Mark D. Plumbley
V. Kılıç
Wenwu Wang
80
3
0
10 Oct 2022
Event-related data conditioning for acoustic event classification
Yuanbo Hou
Dick Botteldooren
59
3
0
16 Jun 2022
Automated Audio Captioning with Epochal Difficult Captions for Curriculum Learning
Andrew Koh
Soham Dinesh Tiwari
Chng Eng Siong
53
1
0
04 Jun 2022
Composing General Audio Representation by Fusing Multilayer Features of a Pre-trained Model
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
69
6
0
17 May 2022
Automated Audio Captioning: An Overview of Recent Progress and New Challenges
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
113
44
0
12 May 2022
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
84
16
0
11 May 2022
Automated Audio Captioning using Audio Event Clues
Ayşegül Özkaya Eren
M. Sert
56
0
0
18 Apr 2022
Caption Feature Space Regularization for Audio Captioning
Yiming Zhang
Hong Yu
Ruoyi Du
Zhanyu Ma
Yuan Dong
122
1
0
18 Apr 2022
BYOL for Audio: Exploring Pre-trained General-purpose Audio Representations
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
SSL
100
59
0
15 Apr 2022
Leveraging Pre-trained BERT for Audio Captioning
Xubo Liu
Xinhao Mei
Qiushi Huang
Jianyuan Sun
Jinzheng Zhao
Haohe Liu
Mark D. Plumbley
Volkan Kılıç
Wenwu Wang
115
30
0
06 Mar 2022
Automatic Audio Captioning using Attention weighted Event based Embeddings
Swapnil Bhosale
Rupayan Chakraborty
Sunil Kumar Kopparapu
75
0
0
28 Jan 2022
Local Information Assisted Attention-free Decoder for Audio Captioning
Feiyang Xiao
Jian Guan
Haiyan Lan
Qiaoxi Zhu
Wenwu Wang
98
11
0
10 Jan 2022
Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning
Benno Weck
Xavier Favory
Konstantinos Drossos
Xavier Serra
67
8
0
14 Oct 2021
Diverse Audio Captioning via Adversarial Training
Xinhao Mei
Xubo Liu
Jianyuan Sun
Mark D. Plumbley
Wenwu Wang
DiffM GAN
110
28
0
13 Oct 2021
Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information
Zhongjie Ye
Helin Wang
Dongchao Yang
Yuexian Zou
101
28
0
12 Oct 2021
An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning
Xinhao Mei
Qiushi Huang
Xubo Liu
Gengyun Chen
Jingqian Wu
...
Tom Ko
H. Tang
Xingkun Shao
Mark D. Plumbley
Wenwu Wang
93
54
0
05 Aug 2021
Audio Captioning Transformer
Xinhao Mei
Xubo Liu
Qiushi Huang
Mark D. Plumbley
Wenwu Wang
ViT
94
78
0
21 Jul 2021
Continual Learning for Automated Audio Captioning Using The Learning Without Forgetting Approach
Jan van den Berg
Konstantinos Drossos
CLL
73
11
0
16 Jul 2021
MusCaps: Generating Captions for Music Audio
Ilaria Manco
Emmanouil Benetos
Elio Quinton
Gyorgy Fazekas
116
37
0
24 Apr 2021
Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning
Xuenan Xu
Heinrich Dinkel
Mengyue Wu
Zeyu Xie
Kai Yu
77
60
0
23 Feb 2021
Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval
Yuma Koizumi
Yasunori Ohishi
Daisuke Niizumi
Daiki Takeuchi
Masahiro Yasuda
74
41
0
14 Dec 2020
WaveTransformer: A Novel Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information
An Tran
Konstantinos Drossos
Tuomas Virtanen
106
19
0
21 Oct 2020
Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning
Daiki Takeuchi
Yuma Koizumi
Yasunori Ohishi
Noboru Harada
K. Kashino
77
27
0
24 Sep 2020
The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation
Yuma Koizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
115
23
0
01 Jul 2020