Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.09503
Cited By
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
12 October 2024
Wenxi Chen
Ziyang Ma
Xiquan Li
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Kai Yu
Xie Chen
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs"
17 / 17 papers shown
Title
OmniFlow: Any-to-Any Generation with Multi-Modal Rectified Flows
Shufan Li
Konstantinos Kallidromitis
Akash Gokul
Zichun Liao
Yusuke Kato
Kazuki Kozuka
Aditya Grover
VGen
166
9
0
02 Dec 2024
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
143
220
0
30 Mar 2023
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation
Yusong Wu
Kai Chen
Tianyu Zhang
Yuchen Hui
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
CLIP
131
540
0
12 Nov 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
231
3,158
0
20 Oct 2022
Can Audio Captions Be Evaluated with Image Caption Metrics?
Zelin Zhou
Zhiling Zhang
Xuenan Xu
Zeyu Xie
Mengyue Wu
Kenny Q. Zhu
64
46
0
10 Oct 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
502
10,526
0
17 Jun 2021
GigaSpeech: An Evolving, Multi-domain ASR Corpus with 10,000 Hours of Transcribed Audio
Guoguo Chen
Shuzhou Chai
Guan-Bo Wang
Jiayu Du
Weiqiang Zhang
...
Xuchen Yao
Yongqing Wang
Yujun Wang
Zhao You
Zhiyong Yan
116
385
0
13 Jun 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
682
41,483
0
22 Oct 2020
The NTT DCASE2020 Challenge Task 6 system: Automated Audio Captioning with Keywords and Sentence Length Estimation
Yuma Koizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
K. Kashino
92
23
0
01 Jul 2020
PANNs: Large-Scale Pretrained Audio Neural Networks for Audio Pattern Recognition
Qiuqiang Kong
Yin Cao
Turab Iqbal
Yuxuan Wang
Wenwu Wang
Mark D. Plumbley
VLM
SSL
199
1,084
0
21 Dec 2019
Clotho: An Audio Captioning Dataset
Konstantinos Drossos
Samuel Lipping
Tuomas Virtanen
109
395
0
21 Oct 2019
RoBERTa: A Robustly Optimized BERT Pretraining Approach
Yinhan Liu
Myle Ott
Naman Goyal
Jingfei Du
Mandar Joshi
Danqi Chen
Omer Levy
M. Lewis
Luke Zettlemoyer
Veselin Stoyanov
AIMat
689
24,557
0
26 Jul 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.8K
95,229
0
11 Oct 2018
Improved Image Captioning via Policy Gradient optimization of SPIDEr
Siqi Liu
Zhenhai Zhu
Ning Ye
S. Guadarrama
Kevin Patrick Murphy
168
446
0
01 Dec 2016
SPICE: Semantic Propositional Image Caption Evaluation
Peter Anderson
Basura Fernando
Mark Johnson
Stephen Gould
EGVM
108
1,919
0
29 Jul 2016
Improving Neural Machine Translation Models with Monolingual Data
Rico Sennrich
Barry Haddow
Alexandra Birch
261
2,724
0
20 Nov 2015
CIDEr: Consensus-based Image Description Evaluation
Ramakrishna Vedantam
C. L. Zitnick
Devi Parikh
306
4,511
0
20 Nov 2014
1