
Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

12 November 2022
Yusong Wu
Kai Chen
Tianyu Zhang
Yuchen Hui
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
    CLIP

Papers citing "Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation"

33 / 383 papers shown
Bridging High-Quality Audio and Video via Language for Sound Effects Retrieval from Visual Queries
J. Wilkins
Justin Salamon
Magdalena Fuentes
J. P. Bello
Oriol Nieto
CLIP
17 Aug 2023
AudioLDM 2: Learning Holistic Audio Generation with Self-supervised Pretraining
Haohe Liu
Yiitan Yuan
Xubo Liu
Xinhao Mei
Qiuqiang Kong
Qiao Tian
Yuping Wang
Wenwu Wang
Yuxuan Wang
Mark D. Plumbley
DiffM
10 Aug 2023
Separate Anything You Describe
Xubo Liu
Qiuqiang Kong
Yan Zhao
Haohe Liu
Yiitan Yuan
Yuzhuo Liu
Rui Xia
Yuxuan Wang
Mark D. Plumbley
Wenwu Wang
VLM
09 Aug 2023
Transferable Models for Bioacoustics with Human Language Supervision
David Robinson
Adelaide Robinson
Lily Akrapongpisak
09 Aug 2023
MusicLDM: Enhancing Novelty in Text-to-Music Generation Using Beat-Synchronous Mixup Strategies
Kai Chen
Yusong Wu
Haohe Liu
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
DiffM
03 Aug 2023
LP-MusicCaps: LLM-Based Pseudo Music Captioning
Seungheon Doh
Keunwoo Choi
Jongpil Lee
Juhan Nam
31 Jul 2023
A Demand-Driven Perspective on Generative Audio AI
Sangshin Oh
Minsung Kang
Hyeongi Moon
Keunwoo Choi
Ben Sangbae Chon
10 Jul 2023
DISCO-10M: A Large-Scale Music Dataset
Luca A. Lanzendörfer
Florian Grötschla
Emil Funke
Roger Wattenhofer
23 Jun 2023
A Multimodal Prototypical Approach for Unsupervised Sound Classification
Saksham Singh Kushwaha
Magdalena Fuentes
21 Jun 2023
Text-Driven Foley Sound Generation With Latent Diffusion Model
Yiitan Yuan
Haohe Liu
Xubo Liu
Xiyuan Kang
Peipei Wu
Mark D. Plumbley
Wenwu Wang
DiffM
17 Jun 2023
FALL-E: A Foley Sound Synthesis Model and Strategies
Minsung Kang
Sangshin Oh
Hyeongi Moon
Kyungyun Lee
Ben Sangbae Chon
16 Jun 2023
CLIPSonic: Text-to-Audio Synthesis with Unlabeled Videos and Pretrained Language-Vision Models
Hao-Wen Dong
Xiaoyu Liu
Jordi Pons
Gautam Bhattacharya
Santiago Pascual
Joan Serrà
Taylor Berg-Kirkpatrick
Julian McAuley
DiffM
16 Jun 2023
GEmo-CLAP: Gender-Attribute-Enhanced Contrastive Language-Audio Pretraining for Accurate Speech Emotion Recognition
Yu Pan
Yanni Hu
Yuguang Yang
Wen Fei
Jixun Yao
Heng Lu
Lei Ma
Jianjun Zhao
VLM
13 Jun 2023
Simple and Controllable Music Generation
Jade Copet
Felix Kreuk
Itai Gat
Tal Remez
David Kant
Gabriel Synnaeve
Yossi Adi
Alexandre Défossez
MGen
08 Jun 2023
Revisit Weakly-Supervised Audio-Visual Video Parsing from the Language Perspective
Yingying Fan
Yu Wu
Bo Du
Yutian Lin
01 Jun 2023
VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset
Sihan Chen
Handong Li
Qunbo Wang
Zijia Zhao
Ming-Ting Sun
Xinxin Zhu
Qingbin Liu
29 May 2023
Adapting Language-Audio Models as Few-Shot Audio Learners
Jinhua Liang
Xubo Liu
Haohe Liu
Huy P Phan
Emmanouil Benetos
Mark D. Plumbley
Wenwu Wang
VLM
28 May 2023
Modality-Independent Teachers Meet Weakly-Supervised Audio-Visual Event Parser
Yun-hsuan Lai
Yen-Chun Chen
Y. Wang
27 May 2023
Latent Diffusion Model Based Foley Sound Generation System For DCASE Challenge 2023 Task 7
Yiitan Yuan
Haohe Liu
Xubo Liu
Xiyuan Kang
Mark D. Plumbley
Wenwu Wang
25 May 2023
DiffAVA: Personalized Text-to-Audio Generation with Visual Alignment
Shentong Mo
Jing Shi
Yapeng Tian
22 May 2023
Connecting Multi-modal Contrastive Representations
Zehan Wang
Yang Zhao
Xize Cheng
Haifeng Huang
Jiageng Liu
...
Lin Li
Yongqiang Wang
Aoxiong Yin
Ziang Zhang
Zhou Zhao
22 May 2023
Pengi: An Audio Language Model for Audio Tasks
Soham Deshmukh
Benjamin Elizalde
Rita Singh
Huaming Wang
MLLM AuLLM
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM MLLM ObjD
18 May 2023
Listen, Think, and Understand
Yuan Gong
Hongyin Luo
Alexander H. Liu
Leonid Karlinsky
James R. Glass
ELM MLLM LRM
18 May 2023
Unsupervised Improvement of Audio-Text Cross-Modal Representations
Zhepei Wang
Cem Subakan
Krishna Subramani
Junkai Wu
Tiago Tavares
Fabio Ayres
Paris Smaragdis
SSL
03 May 2023
CLaMP: Contrastive Language-Music Pre-training for Cross-Modal Symbolic Music Information Retrieval
Shangda Wu
Dingyao Yu
Xu Tan
Maosong Sun
CLIP VLM
21 Apr 2023
Looking Similar, Sounding Different: Leveraging Counterfactual Cross-Modal Pairs for Audiovisual Representation Learning
Nikhil Singh
Chih-Wei Wu
Iroro Orife
Mahdi M. Kalayeh
12 Apr 2023
On Robustness in Multimodal Learning
Brandon McKinzie
Joseph Cheng
Vaishaal Shankar
Yinfei Yang
Jonathon Shlens
Alexander Toshev
10 Apr 2023
WavCaps: A ChatGPT-Assisted Weakly-Labelled Audio Captioning Dataset for Audio-Language Multimodal Research
Xinhao Mei
Chutong Meng
Haohe Liu
Qiuqiang Kong
Tom Ko
Chengqi Zhao
Mark D. Plumbley
Yuexian Zou
Wenwu Wang
30 Mar 2023
AudioLDM: Text-to-Audio Generation with Latent Diffusion Models
Haohe Liu
Zehua Chen
Yiitan Yuan
Xinhao Mei
Xubo Liu
Danilo Mandic
Wenwu Wang
Mark D. Plumbley
DiffM
29 Jan 2023
Moûsai: Text-to-Music Generation with Long-Context Latent Diffusion
Flavio Schneider
Ojasv Kamal
Zhijing Jin
Bernhard Schölkopf
MGen
27 Jan 2023
MAViL: Masked Audio-Video Learners
Po-Yao (Bernie) Huang
Vasu Sharma
Hu Xu
Chaitanya K. Ryali
Haoqi Fan
Yanghao Li
Shang-Wen Li
Gargi Ghosh
Jitendra Malik
Christoph Feichtenhofer
15 Dec 2022
TimbreCLIP: Connecting Timbre to Text and Images
Nicolas Jonason
Bob L. T. Sturm
CLIP
21 Nov 2022