ResearchTrend.AI
Clotho: An Audio Captioning Dataset

21 October 2019
Konstantinos Drossos
Samuel Lipping
Tuomas Virtanen
arXiv: 1910.09387

Papers citing "Clotho: An Audio Captioning Dataset"

19 / 269 papers shown
Continual Learning for Automated Audio Captioning Using The Learning Without Forgetting Approach
Jan van den Berg
Konstantinos Drossos
CLL
73
11
0
16 Jul 2021
MultiBench: Multiscale Benchmarks for Multimodal Representation Learning
Paul Pu Liang
Yiwei Lyu
Xiang Fan
Zetian Wu
Yun Cheng
...
Peter Wu
Michelle A. Lee
Yuke Zhu
Ruslan Salakhutdinov
Louis-Philippe Morency
VLM
111
172
0
15 Jul 2021
Audio Captioning with Composition of Acoustic and Semantic Information
Aysegül Özkaya Eren
M. Sert
63
3
0
13 May 2021
Audio Retrieval with Natural Language Queries
Andreea-Maria Oncescu
A. Sophia Koepke
João F. Henriques
Zeynep Akata
Samuel Albanie
63
79
0
05 May 2021
Quantifying and Maximizing the Benefits of Back-End Noise Adaption on Attention-Based Speech Recognition Models
Coleman Hooper
Thierry Tambe
Gu-Yeon Wei
39
0
0
03 May 2021
AMSS-Net: Audio Manipulation on User-Specified Sources with Textual Queries
Woosung Choi
Minseok Kim
Marco A. Martínez-Ramírez
Jaehwa Chung
Soonyoung Jung
61
6
0
28 Apr 2021
Towards Citizen Science for Smart Cities: A Framework for a Collaborative Game of Bird Call Recognition Based on Internet of Sound Practices
Emmanuel Rovithis
N. Moustakas
Konstantinos Vogklis
Konstantinos Drossos
Andreas Floros
26
4
0
31 Mar 2021
Investigating Local and Global Information for Automated Audio Captioning with Transfer Learning
Xuenan Xu
Heinrich Dinkel
Mengyue Wu
Zeyu Xie
Kai Yu
77
60
0
23 Feb 2021
Audio Captioning using Pre-Trained Large-Scale Language Model Guided by Audio-based Similar Caption Retrieval
Yuma Koizumi
Yasunori Ohishi
Daisuke Niizumi
Daiki Takeuchi
Masahiro Yasuda
74
41
0
14 Dec 2020
WaveTransformer: A Novel Architecture for Audio Captioning Based on Learning Temporal and Time-Frequency Information
An Tran
Konstantinos Drossos
Tuomas Virtanen
106
19
0
21 Oct 2020
Effects of Word-frequency based Pre- and Post- Processings for Audio Captioning
Daiki Takeuchi
Yuma Koizumi
Yasunori Ohishi
Noboru Harada
K. Kashino
77
27
0
24 Sep 2020
RWCP-SSD-Onomatopoeia: Onomatopoeic Word Dataset for Environmental Sound Synthesis
Yuki Okamoto
Keisuke Imoto
Shinnosuke Takamichi
Ryosuke Yamanishi
Takahiro Fukumori
Y. Yamashita
56
5
0
09 Jul 2020
Multi-task Regularization Based on Infrequent Classes for Audio Captioning
Emre Çakir
Konstantinos Drossos
Tuomas Virtanen
62
17
0
09 Jul 2020
Dynamic Graph Representation Learning for Video Dialog via Multi-Modal Shuffled Transformers
Shijie Geng
Peng Gao
Moitreya Chatterjee
Chiori Hori
Jonathan Le Roux
Yongfeng Zhang
Hongsheng Li
A. Cherian
101
11
0
08 Jul 2020
Temporal Sub-sampling of Audio Feature Sequences for Automated Audio Captioning
K. Nguyen
Konstantinos Drossos
Tuomas Virtanen
57
12
0
06 Jul 2020
The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation
Yuma Koizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
115
23
0
01 Jul 2020
A Transformer-based Audio Captioning Model with Keyword Estimation
Yuma Koizumi
Ryo Masumura
Kyosuke Nishida
Masahiro Yasuda
Shoichiro Saito
114
54
0
01 Jul 2020
Listen carefully and tell: an audio captioning system based on residual learning and gammatone audio representation
Sergi Perez-Castanos
Javier Naranjo-Alcazar
P. Zuccarello
M. Cobos
70
11
0
27 Jun 2020
Audio Captioning using Gated Recurrent Units
Aysegül Özkaya Eren
M. Sert
74
10
0
05 Jun 2020