ResearchTrend.AI
arXiv: 2407.04416
Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions

3 January 2025
Yi Yuan
Dongya Jia
Xiaobin Zhuang
Yuanzhe Chen
Zhengxi Liu
Zhuo Chen
Yuping Wang
Yansen Wang
Xubo Liu
Xiyuan Kang
Mark D. Plumbley
Wenwu Wang
    VLM

Papers citing "Sound-VECaps: Improving Audio Generation with Visual Enhanced Captions"

3 / 3 papers shown
Synthio: Augmenting Small-Scale Audio Classification Datasets with Synthetic Data
Sreyan Ghosh
Sonal Kumar
Zhifeng Kong
Rafael Valle
Bryan Catanzaro
Dinesh Manocha
DiffM
49
2
0
02 Oct 2024
Text-to-Audio Generation using Instruction-Tuned LLM and Latent Diffusion Model
Deepanway Ghosal
Navonil Majumder
Ambuj Mehrish
Soujanya Poria
152
144
0
24 Apr 2023
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,796
0
24 Feb 2021