arXiv:2407.04416
Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions
3 January 2025
Yi Yuan
Dongya Jia
Xiaobin Zhuang
Yuanzhe Chen
Zhengxi Liu
Zhuo Chen
Yuping Wang
Yansen Wang
Xubo Liu
Xiyuan Kang
Mark D. Plumbley
Wenwu Wang
VLM
Papers citing "Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions"
3 / 3 papers shown
Title
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Sreyan Ghosh
Sonal Kumar
Zhifeng Kong
Rafael Valle
Bryan Catanzaro
Dinesh Manocha
DiffM
49
2
0
02 Oct 2024
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Deepanway Ghosal
Navonil Majumder
Ambuj Mehrish
Soujanya Poria
152
144
0
24 Apr 2023
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,796
0
24 Feb 2021