Prefix tuning for automated audio captioning

30 March 2023
Minkyu Kim
Kim Sung-Bin
Tae-Hyun Oh
arXiv: 2303.17489

Papers citing "Prefix tuning for automated audio captioning"

33 / 33 papers shown
Unbiased Sliced Wasserstein Kernels for High-Quality Audio Captioning
Manh Luong
Khai Nguyen
Dinh Q. Phung
Gholamreza Haffari
Lizhen Qu
47
0
0
08 Feb 2025
Audio-Language Datasets of Scenes and Events: A Survey
Gijs Wijngaard
Elia Formisano
Michele Esposito
M. Dumontier
81
2
0
10 Jan 2025
Classifier-Guided Captioning Across Modalities
Ariel Shaulov
Tal Shaharabany
E. Shaar
Gal Chechik
Lior Wolf
33
0
0
03 Jan 2025
Sound2Vision: Generating Diverse Visuals from Audio through Cross-Modal Latent Alignment
Kim Sung-Bin
Arda Senocak
Hyunwoo Ha
Tae-Hyun Oh
DiffM
80
0
0
09 Dec 2024
Construction and Analysis of Impression Caption Dataset for Environmental Sounds
Yuki Okamoto
Ryotaro Nagase
Minami Okamoto
Yuki Saito
Keisuke Imoto
Takahiro Fukumori
Y. Yamashita
26
0
0
20 Oct 2024
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning
Xiquan Li
Wenxi Chen
Ziyang Ma
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Qiuqiang Kong
Xie Chen
VLM
28
2
0
12 Oct 2024
Towards Diverse and Efficient Audio Captioning via Diffusion Models
Manjie Xu
Chenxing Li
Xinyi Tu
Yong Ren
Ruibo Fu
Wei Liang
Dong Yu
DiffM
49
1
0
14 Sep 2024
EnCLAP++: Analyzing the EnCLAP Framework for Optimizing Automated Audio Captioning Performance
Jaeyeon Kim
Minjeon Jeon
Jaeyoon Jung
Sang Hoon Woo
Jinjoo Lee
28
2
0
02 Sep 2024
Expanding on EnCLAP with Auxiliary Retrieval Model for Automated Audio Captioning
Jaeyeon Kim
Jaeyoon Jung
Minjeong Jeon
Sang Hoon Woo
Jinjoo Lee
24
1
0
02 Sep 2024
Efficient Audio Captioning with Encoder-Level Knowledge Distillation
Xuenan Xu
Haohe Liu
Mengyue Wu
Wenwu Wang
Mark D. Plumbley
45
1
0
19 Jul 2024
AVCap: Leveraging Audio-Visual Features as Text Tokens for Captioning
Jongsuk Kim
Jiwon Shin
Junmo Kim
41
1
0
10 Jul 2024
Speech Prefix-Tuning with RNNT Loss for Improving LLM Predictions
M. Baskar
Andrew Rosenberg
Bhuvana Ramabhadran
Neeraj Gaur
Zhong Meng
28
3
0
20 Jun 2024
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction
Haoqiu Yan
Yongxin Zhu
Kai Zheng
Bing Liu
Haoyu Cao
Deqiang Jiang
Linli Xu
AuLLM
38
4
0
18 Jun 2024
Zero-Shot Audio Captioning Using Soft and Hard Prompts
Yiming Zhang
Xuenan Xu
Ruoyi Du
Haohe Liu
Yuan Dong
Zheng-Hua Tan
Wenwu Wang
Zhanyu Ma
VLM
35
4
0
10 Jun 2024
Uni-MoE: Scaling Unified Multimodal LLMs with Mixture of Experts
Yunxin Li
Shenyuan Jiang
Baotian Hu
Longyue Wang
Wanqi Zhong
Wenhan Luo
Lin Ma
Min-Ling Zhang
MoE
46
28
0
18 May 2024
A Multimodal Approach to Device-Directed Speech Detection with Large Language Models
Dominik Wagner
Alexander W. Churchill
Siddharth Sigtia
Panayiotis Georgiou
Matt Mirsamadi
Aarshee Mishra
Erik Marchi
49
6
0
21 Mar 2024
EDTC: enhance depth of text comprehension in automated audio captioning
Liwen Tan
Yin Cao
Yi Zhou
39
0
0
27 Feb 2024
MINT: Boosting Audio-Language Model via Multi-Target Pre-Training and Instruction Tuning
Hang Zhao
Yifei Xin
Zhesong Yu
Bilei Zhu
Lu Lu
Zejun Ma
AuLLM
28
4
0
12 Feb 2024
EnCLAP: Combining Neural Audio Codec and Audio-Text Joint Embedding for Automated Audio Captioning
Jaeyeon Kim
Jaeyoon Jung
Jinjoo Lee
Sang Hoon Woo
CLIP
VLM
25
21
0
31 Jan 2024
FunnyNet-W: Multimodal Learning of Funny Moments in Videos in the Wild
Zhi-Song Liu
Robin Courant
Vicky Kalogeiton
42
6
0
08 Jan 2024
Multimodal Data and Resource Efficient Device-Directed Speech Detection with Large Foundation Models
Dominik Wagner
Alexander W. Churchill
Siddharth Sigtia
Panayiotis Georgiou
Matt Mirsamadi
Aarshee Mishra
Erik Marchi
17
3
0
06 Dec 2023
X-InstructBLIP: A Framework for aligning X-Modal instruction-aware representations to LLMs and Emergent Cross-modal Reasoning
Artemis Panagopoulou
Le Xue
Ning Yu
Junnan Li
Dongxu Li
Shafiq R. Joty
Ran Xu
Silvio Savarese
Caiming Xiong
Juan Carlos Niebles
VLM
MLLM
41
45
0
30 Nov 2023
Weakly-supervised Automated Audio Captioning via text only training
Theodoros Kouzelis
V. Katsouros
CLIP
32
6
0
21 Sep 2023
RECAP: Retrieval-Augmented Audio Captioning
Sreyan Ghosh
Sonal Kumar
Chandra Kiran Reddy Evuru
R. Duraiswami
Tianyi Zhou
VLM
70
17
0
18 Sep 2023
Training Audio Captioning Models without Audio
Soham Deshmukh
Benjamin Elizalde
Dimitra Emmanouilidou
Bhiksha Raj
Rita Singh
Huaming Wang
24
18
0
14 Sep 2023
Natural Language Supervision for General-Purpose Audio Representations
Benjamin Elizalde
Soham Deshmukh
Huaming Wang
AuLLM
AI4TS
24
53
0
11 Sep 2023
Separate Anything You Describe
Xubo Liu
Qiuqiang Kong
Yan Zhao
Haohe Liu
Yi Yuan
Yuzhuo Liu
Rui Xia
Yuxuan Wang
Mark D. Plumbley
Wenwu Wang
VLM
30
43
0
09 Aug 2023
Pengi: An Audio Language Model for Audio Tasks
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM
AuLLM
34
158
0
19 May 2023
Sound to Visual Scene Generation by Audio-to-Visual Latent Alignment
Kim Sung-Bin
Arda Senocak
H. Ha
Andrew Owens
Tae-Hyun Oh
DiffM
VGen
38
36
0
30 Mar 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
43
193
0
30 Mar 2023
Retrieving Multimodal Information for Augmented Generation: A Survey
Ruochen Zhao
Hailin Chen
Weishi Wang
Fangkai Jiao
Do Xuan Long
...
Bosheng Ding
Xiaobao Guo
Minzhi Li
Xingxuan Li
Shafiq R. Joty
28
80
0
20 Mar 2023
Beyond the Status Quo: A Contemporary Survey of Advances and Challenges in Audio Captioning
Xuenan Xu
Zeyu Xie
Mengyue Wu
K. Yu
34
13
0
11 May 2022
Bayesian Transformer Language Models for Speech Recognition
Boyang Xue
Jianwei Yu
Junhao Xu
Shansong Liu
Shoukang Hu
Zi Ye
Mengzhe Geng
Xunying Liu
Helen Meng
BDL
76
26
0
09 Feb 2021