ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2205.05949
  4. Cited By
Automated Audio Captioning: An Overview of Recent Progress and New
  Challenges
v1v2 (latest)

Automated Audio Captioning: An Overview of Recent Progress and New Challenges

12 May 2022
Xinhao Mei
Xubo Liu
Mark D. Plumbley
Wenwu Wang
ArXiv (abs)PDFHTML

Papers citing "Automated Audio Captioning: An Overview of Recent Progress and New Challenges"

39 / 39 papers shown
Title
Audio-Language Datasets of Scenes and Events: A Survey
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
166
3
0
10 Jan 2025
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
Improving Audio-Text Retrieval via Hierarchical Cross-Modal Interaction and Auxiliary Captions
Yifei Xin
Yuexian Zou
102
9
0
28 Jul 2023
Interactive Audio-text Representation for Automated Audio Captioning
  with Contrastive Learning
Interactive Audio-text Representation for Automated Audio Captioning with Contrastive Learning
Chen Chen
Nana Hou
Yuchen Hu
Heqing Zou
Xiaofeng Qi
Chng Eng Siong
VLM
64
21
0
29 Mar 2022
Joint Speech Recognition and Audio Captioning
Joint Speech Recognition and Audio Captioning
Chaitanya Narisetty
E. Tsunoo
Xuankai Chang
Yosuke Kashiwagi
Michael Hentschel
Shinji Watanabe
42
10
0
03 Feb 2022
Audio Retrieval with Natural Language Queries: A Benchmark Study
Audio Retrieval with Natural Language Queries: A Benchmark Study
A. Sophia Koepke
Andreea-Maria Oncescu
João F. Henriques
Zeynep Akata
Samuel Albanie
69
102
0
17 Dec 2021
Wav2CLIP: Learning Robust Audio Representations From CLIP
Wav2CLIP: Learning Robust Audio Representations From CLIP
Ho-Hsiang Wu
Prem Seetharaman
Kundan Kumar
J. P. Bello
CLIPVLM
145
273
0
21 Oct 2021
Evaluating Off-the-Shelf Machine Listening and Natural Language Models
  for Automated Audio Captioning
Evaluating Off-the-Shelf Machine Listening and Natural Language Models for Automated Audio Captioning
Benno Weck
Xavier Favory
Konstantinos Drossos
Xavier Serra
54
8
0
14 Oct 2021
Improving the Performance of Automated Audio Captioning via Integrating
  the Acoustic and Semantic Information
Improving the Performance of Automated Audio Captioning via Integrating the Acoustic and Semantic Information
Zhongjie Ye
Helin Wang
Dongchao Yang
Yuexian Zou
83
28
0
12 Oct 2021
Can Audio Captions Be Evaluated with Image Caption Metrics?
Can Audio Captions Be Evaluated with Image Caption Metrics?
Zelin Zhou
Zhiling Zhang
Xuenan Xu
Zeyu Xie
Mengyue Wu
Kenny Q. Zhu
64
46
0
10 Oct 2021
Automated Audio Captioning using Transfer Learning and Reconstruction
  Latent Space Similarity Regularization
Automated Audio Captioning using Transfer Learning and Reconstruction Latent Space Similarity Regularization
Andrew Koh
Fuzhao Xue
Chng Eng Siong
51
20
0
10 Aug 2021
An Encoder-Decoder Based Audio Captioning System With Transfer and
  Reinforcement Learning
An Encoder-Decoder Based Audio Captioning System With Transfer and Reinforcement Learning
Xinhao Mei
Qiushi Huang
Xubo Liu
Gengyun Chen
Jingqian Wu
...
Tom Ko
H. Tang
Xingkun Shao
Mark D. Plumbley
Wenwu Wang
77
54
0
05 Aug 2021
SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic
  Scene Classification
SpecAugment++: A Hidden Space Data Augmentation Method for Acoustic Scene Classification
Helin Wang
Yuexian Zou
Wenwu Wang
56
44
0
31 Mar 2021
Effects of Word-frequency based Pre- and Post- Processings for Audio
  Captioning
Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning
Daiki Takeuchi
Yuma Koizumi
Yasunori Ohishi
Noboru Harada
K. Kashino
59
27
0
24 Sep 2020
Multi-task Regularization Based on Infrequent Classes for Audio
  Captioning
Multi-task Regularization Based on Infrequent Classes for Audio Captioning
Emre Çakir
Konstantinos Drossos
Tuomas Virtanen
54
17
0
09 Jul 2020
A Transformer-based Audio Captioning Model with Keyword Estimation
A Transformer-based Audio Captioning Model with Keyword Estimation
Yuma Koizumi
Ryo Masumura
Kyosuke Nishida
Masahiro Yasuda
Shoichiro Saito
76
54
0
01 Jul 2020
A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal
  Transformer
A Better Use of Audio-Visual Cues: Dense Video Captioning with Bi-modal Transformer
Vladimir E. Iashin
Esa Rahtu
76
130
0
17 May 2020
Conformer: Convolution-augmented Transformer for Speech Recognition
Conformer: Convolution-augmented Transformer for Speech Recognition
Anmol Gulati
James Qin
Chung-Cheng Chiu
Niki Parmar
Yu Zhang
...
Wei Han
Shibo Wang
Zhengdong Zhang
Yonghui Wu
Ruoming Pang
229
3,160
0
16 May 2020
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Oscar: Object-Semantics Aligned Pre-training for Vision-Language Tasks
Xiujun Li
Xi Yin
Chunyuan Li
Pengchuan Zhang
Xiaowei Hu
...
Houdong Hu
Li Dong
Furu Wei
Yejin Choi
Jianfeng Gao
VLM
144
1,947
0
13 Apr 2020
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern
  Recognition
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLMSSL
199
1,084
0
21 Dec 2019
Momentum Contrast for Unsupervised Visual Representation Learning
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
216
12,136
0
13 Nov 2019
Clotho: An Audio Captioning Dataset
Clotho: An Audio Captioning Dataset
Konstantinos Drossos
Samuel Lipping
Tuomas Virtanen
109
395
0
21 Oct 2019
Unified Vision-Language Pre-Training for Image Captioning and VQA
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLMVLM
357
945
0
24 Sep 2019
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers
Iryna Gurevych
1.3K
12,316
0
27 Aug 2019
Masked Non-Autoregressive Image Captioning
Masked Non-Autoregressive Image Captioning
Junlong Gao
Xi Meng
Shiqi Wang
Xia Li
Shanshe Wang
Siwei Ma
Wen Gao
75
37
0
03 Jun 2019
BERTScore: Evaluating Text Generation with BERT
BERTScore: Evaluating Text Generation with BERT
Tianyi Zhang
Varsha Kishore
Felix Wu
Kilian Q. Weinberger
Yoav Artzi
363
5,872
0
21 Apr 2019
Audio Caption: Listen and Tell
Audio Caption: Listen and Tell
Mengyue Wu
Heinrich Dinkel
Kai Yu
66
61
0
25 Feb 2019
An Attempt towards Interpretable Audio-Visual Video Captioning
An Attempt towards Interpretable Audio-Visual Video Captioning
Yapeng Tian
Chenxiao Guan
Justin Goodman
Marc Moore
Chenliang Xu
75
20
0
07 Dec 2018
A multi-device dataset for urban acoustic scene classification
A multi-device dataset for urban acoustic scene classification
A. Mesaros
Toni Heittola
Tuomas Virtanen
45
381
0
25 Jul 2018
Towards Diverse and Natural Image Descriptions via a Conditional GAN
Towards Diverse and Natural Image Descriptions via a Conditional GAN
Bo Dai
Sanja Fidler
R. Urtasun
Dahua Lin
GAN
94
454
0
17 Mar 2017
Self-critical Sequence Training for Image Captioning
Self-critical Sequence Training for Image Captioning
Steven J. Rennie
E. Marcheret
Youssef Mroueh
Jerret Ross
Vaibhava Goel
109
1,892
0
02 Dec 2016
Improved Image Captioning via Policy Gradient optimization of SPIDEr
Improved Image Captioning via Policy Gradient optimization of SPIDEr
Siqi Liu
Zhenhai Zhu
Ning Ye
S. Guadarrama
Kevin Patrick Murphy
168
446
0
01 Dec 2016
CNN Architectures for Large-Scale Audio Classification
CNN Architectures for Large-Scale Audio Classification
Shawn Hershey
Sourish Chaudhuri
D. Ellis
J. Gemmeke
A. Jansen
...
Rif A. Saurous
Bryan Seybold
M. Slaney
Ron J. Weiss
K. Wilson
130
2,510
0
29 Sep 2016
SPICE: Semantic Propositional Image Caption Evaluation
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
108
1,919
0
29 Jul 2016
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence
  Modeling
Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling
Junyoung Chung
Çağlar Gülçehre
Kyunghyun Cho
Yoshua Bengio
607
12,745
0
11 Dec 2014
CIDEr: Consensus-based Image Description Evaluation
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
306
4,511
0
20 Nov 2014
Acoustic Scene Classification
Acoustic Scene Classification
D. Barchiesi
D. Giannoulis
D. Stowell
Mark D. Plumbley
165
406
0
13 Nov 2014
Sequence to Sequence Learning with Neural Networks
Sequence to Sequence Learning with Neural Networks
Ilya Sutskever
Oriol Vinyals
Quoc V. Le
AIMat
450
20,606
0
10 Sep 2014
Distributed Representations of Words and Phrases and their
  Compositionality
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov
Ilya Sutskever
Kai Chen
G. Corrado
J. Dean
NAIOCL
404
33,573
0
16 Oct 2013
Sequence Transduction with Recurrent Neural Networks
Sequence Transduction with Recurrent Neural Networks
Alex Graves
195
1,872
0
14 Nov 2012
1