Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation

12 November 2022
Yusong Wu
Kai Chen
Tianyu Zhang
Yuchen Hui
Marianna Nezhurina
Taylor Berg-Kirkpatrick
Shlomo Dubnov
    CLIP

Papers citing "Large-scale Contrastive Language-Audio Pretraining with Feature Fusion and Keyword-to-Caption Augmentation"

50 / 383 papers shown
Title
Continuous Autoregressive Models with Noise Augmentation Avoid Error Accumulation
Marco Pasini
J. Nistal
Stefan Lattner
George Fazekas
112
3
0
27 Nov 2024
State-Space Large Audio Language Models
Saurabhchand Bhati
Yuan Gong
Leonid Karlinsky
Hilde Kuehne
Rogerio Feris
James Glass
151
1
0
24 Nov 2024
MUFM: A Mamba-Enhanced Feedback Model for Micro Video Popularity Prediction
Jiacheng Lu
Mingyuan Xiao
Weijian Wang
Yuxin Du
Yi Cui
Jingnan Zhao
Cheng Hua
Mamba
99
1
0
23 Nov 2024
Gotta Hear Them All: Sound Source Aware Vision to Audio Generation
Wei Guo
Heng Wang
Jianbo Ma
Weidong Cai
DiffM
173
5
0
23 Nov 2024
Towards Open-Vocabulary Audio-Visual Event Localization
Jinxing Zhou
Dan Guo
Ruohao Guo
Yuxin Mao
Jingjing Hu
Yiran Zhong
Xiaojun Chang
Ming Wang
VLM
127
5
0
18 Nov 2024
Any2Any: Incomplete Multimodal Retrieval with Conformal Prediction
Po-han Li
Yunhao Yang
Mohammad Omama
Sandeep Chinchali
Ufuk Topcu
61
2
0
15 Nov 2024
Past, Present, and Future of Sensor-Based Human Activity Recognition Using Wearables: A Surveying Tutorial on a Still Challenging Task
H. Haresamudram
Chi Ian Tang
Sungho Suh
P. Lukowicz
Thomas Ploetz
180
3
0
11 Nov 2024
Long-Form Text-to-Music Generation with Adaptive Prompts: A Case Study in Tabletop Role-Playing Games Soundtracks
Felipe Marra
Lucas N. Ferreira
98
0
0
06 Nov 2024
Sing-On-Your-Beat: Simple Text-Controllable Accompaniment Generations
Quoc-Huy Trinh
Minh-Van Nguyen
Trong-Hieu Nguyen-Mau
Khoa Tran
Thanh Do
56
0
0
03 Nov 2024
Leveraging LLM and Text-Queried Separation for Noise-Robust Sound Event Detection
Han Yin
Yang Xiao
Jisheng Bai
Rohan Kumar Das
124
0
0
02 Nov 2024
Music Foundation Model as Generic Booster for Music Downstream Tasks
Weihsiang Liao
Yuhta Takida
Yukara Ikemiya
Zhi-Wei Zhong
Chieh-Hsin Lai
...
Stefan Uhlich
Taketo Akama
Woosung Choi
Yuichiro Koyama
Yuki Mitsufuji
215
1
0
02 Nov 2024
Human-inspired Perspectives: A Survey on AI Long-term Memory
Zihong He
Weizhe Lin
Hao Zheng
Fan Zhang
Matt Jones
Laurence Aitchison
X. Xu
Miao Liu
Per Ola Kristensson
Junxiao Shen
240
3
0
01 Nov 2024
MACE: Leveraging Audio for Evaluating Audio Captioning Systems
Satvik Dixit
Soham Deshmukh
Bhiksha Raj
59
1
0
01 Nov 2024
Retrieval-Augmented Approach for Unsupervised Anomalous Sound Detection and Captioning without Model Training
Ryoya Ogura
Tomoya Nishida
Yohei Kawaguchi
19
1
0
29 Oct 2024
ST-ITO: Controlling Audio Effects for Style Transfer with Inference-Time Optimization
C. Steinmetz
Shubhr Singh
Marco Comunità
Ilias Ibnyahya
Shanxin Yuan
Emmanouil Benetos
Joshua Reiss
76
9
0
28 Oct 2024
MMAU: A Massive Multi-Task Audio Understanding and Reasoning Benchmark
S. Sakshi
Utkarsh Tyagi
Sonal Kumar
Ashish Seth
Ramaneswaran Selvakumar
Oriol Nieto
R. Duraiswami
Sreyan Ghosh
Dinesh Manocha
AuLLM, ELM
126
46
0
24 Oct 2024
Gibberish is All You Need for Membership Inference Detection in Contrastive Language-Audio Pretraining
Ruoxi Cheng
Yizhong Ding
Shuirong Cao
Shitong Shao
Zhiqiang Wang
67
0
0
24 Oct 2024
Annotation-Free MIDI-to-Audio Synthesis via Concatenative Synthesis and Generative Refinement
Osamu Take
Taketo Akama
64
0
0
22 Oct 2024
Do Audio-Language Models Understand Linguistic Variations?
Ramaneswaran Selvakumar
Sonal Kumar
Hemant Kumar Giri
Nishit Anand
Ashish Seth
Sreyan Ghosh
Dinesh Manocha
AuLLM, VLM
140
1
0
21 Oct 2024
Construction and Analysis of Impression Caption Dataset for Environmental Sounds
Yuki Okamoto
Ryotaro Nagase
Minami Okamoto
Yuki Saito
Keisuke Imoto
Takahiro Fukumori
Y. Yamashita
42
0
0
20 Oct 2024
PAT: Parameter-Free Audio-Text Aligner to Boost Zero-Shot Audio Classification
Ashish Seth
Ramaneswaran Selvakumar
Sonal Kumar
Sreyan Ghosh
Dinesh Manocha
VLM
56
0
0
19 Oct 2024
Multi-Source Spatial Knowledge Understanding for Immersive Visual Text-to-Speech
Shuwei He
Rui Liu
Hong Li
61
5
0
18 Oct 2024
CLaMP 2: Multimodal Music Information Retrieval Across 101 Languages Using Large Language Models
Shangda Wu
Yashan Wang
Ruibin Yuan
Zhancheng Guo
Xu Tan
...
Yuanliang Dong
Jiafeng Liu
Xiaobing Li
Feng Yu
Maosong Sun
213
5
0
17 Oct 2024
EmotionCaps: Enhancing Audio Captioning Through Emotion-Augmented Data Generation
Mithun Manivannan
Vignesh Nethrapalli
Mark Cartwright
62
1
0
15 Oct 2024
Enhancing Robustness in Deep Reinforcement Learning: A Lyapunov Exponent Approach
Rory Young
Nicolas Pugeault
AAML
134
5
0
14 Oct 2024
SLAM-AAC: Enhancing Audio Captioning with Paraphrasing Augmentation and CLAP-Refine through LLMs
Wenxi Chen
Ziyang Ma
Xiquan Li
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Kai Yu
Xie Chen
100
7
0
12 Oct 2024
DRCap: Decoding CLAP Latents with Retrieval-Augmented Generation for Zero-shot Audio Captioning
Xiquan Li
Wenxi Chen
Ziyang Ma
Xuenan Xu
Yuzhe Liang
Zhisheng Zheng
Qiuqiang Kong
Xie Chen
VLM
118
6
0
12 Oct 2024
CSA: Data-efficient Mapping of Unimodal Features to Multimodal Features
Po-han Li
Sandeep Chinchali
Ufuk Topcu
96
2
0
10 Oct 2024
SRC-gAudio: Sampling-Rate-Controlled Audio Generation
Chenxing Li
Manjie Xu
Dong Yu
DiffM
48
0
0
09 Oct 2024
An Eye for an Ear: Zero-shot Audio Description Leveraging an Image Captioner using Audiovisual Distribution Alignment
Hugo Malard
Michel Olvera
Stéphane Lathuilière
S. Essid
VLM
61
0
0
08 Oct 2024
Presto! Distilling Steps and Layers for Accelerating Music Generation
Cheng-i Wang
Ge Zhu
Jonah Casebeer
Julian McAuley
Taylor Berg-Kirkpatrick
Nicholas J. Bryan
126
7
0
07 Oct 2024
Audio-Agent: Leveraging LLMs For Audio Generation, Editing and Composition
Zixuan Wang
Chi-Keung Tang
DiffM, VGen, LLMAG
109
4
0
04 Oct 2024
Enriching Music Descriptions with a Finetuned-LLM and Metadata for Text-to-Music Retrieval
Seungheon Doh
Minhee Lee
Dasaem Jeong
Juhan Nam
113
12
0
04 Oct 2024
CoLLAP: Contrastive Long-form Language-Audio Pretraining with Musical Temporal Structure Augmentation
Junda Wu
Warren Li
Cheng-i Wang
Amit Namburi
Carol Chen
Julian McAuley
VLM
52
1
0
03 Oct 2024
Generating Symbolic Music from Natural Language Prompts using an LLM-Enhanced Dataset
Weihan Xu
Julian McAuley
Taylor Berg-Kirkpatrick
Shlomo Dubnov
Hao-Wen Dong
121
1
0
02 Oct 2024
Heterogeneous sound classification with the Broad Sound Taxonomy and Dataset
Panagiota Anastasopoulou
Jessica Torrey
Xavier Serra
F. Font
38
1
0
01 Oct 2024
Language-based Audio Moment Retrieval
Hokuto Munakata
Taichi Nishimura
Shota Nakada
Tatsuya Komatsu
126
2
0
24 Sep 2024
A Comprehensive Survey with Critical Analysis for Deepfake Speech Detection
Lam Pham
Phat Lam
Dat Tran
Hieu Tang
Tin Nguyen
Alexander Schindler
Canh Vu
Alexander Polonsky
127
5
0
23 Sep 2024
OmniBench: Towards The Future of Universal Omni-Language Models
Yizhi Li
Ge Zhang
Yinghao Ma
Ruibin Yuan
Kang Zhu
...
Zhaoxiang Zhang
Zachary Liu
Emmanouil Benetos
Wenhao Huang
Chenghua Lin
LRM
162
19
0
23 Sep 2024
Exploring Text-Queried Sound Event Detection with Audio Source Separation
Han Yin
Jisheng Bai
Yang Xiao
Hui Wang
Siqi Zheng
Yafeng Chen
Rohan Kumar Das
Chong Deng
Jianfeng Chen
104
4
0
20 Sep 2024
Leveraging Audio-Only Data for Text-Queried Target Sound Extraction
Kohei Saijo
Janek Ebbers
François Germain
Sameer Khurana
Gordon Wichern
Jonathan Le Roux
99
1
0
20 Sep 2024
A sound description: Exploring prompt templates and class descriptions to enhance zero-shot audio classification
Michel Olvera
Paraskevas Stamatiadis
S. Essid
VLM
72
1
0
19 Sep 2024
SoundBeam meets M2D: Target Sound Extraction with Audio Foundation Model
Carlos Hernandez-Olivan
Marc Delcroix
Tsubasa Ochiai
Daisuke Niizumi
Naohiro Tawara
Tomohiro Nakatani
Shoko Araki
54
2
0
19 Sep 2024
ViolinDiff: Enhancing Expressive Violin Synthesis with Pitch Bend Conditioning
Daewoong Kim
Hao-Wen Dong
Dasaem Jeong
54
0
0
19 Sep 2024
DETECLAP: Enhancing Audio-Visual Representation Learning with Object Information
Shota Nakada
Taichi Nishimura
Hokuto Munakata
Masayoshi Kondo
Tatsuya Komatsu
CLIP, VLM
63
0
0
18 Sep 2024
Augment, Drop & Swap: Improving Diversity in LLM Captions for Efficient Music-Text Representation Learning
Ilaria Manco
Justin Salamon
Oriol Nieto
55
2
0
17 Sep 2024
Evaluation of pretrained language models on music understanding
Yannis Vasilakis
Rachel M. Bittner
Johan Pauwels
94
1
0
17 Sep 2024
FakeMusicCaps: a Dataset for Detection and Attribution of Synthetic Music Generated via Text-to-Music Models
Luca Comanducci
Paolo Bestagini
Stefano Tubaro
62
7
0
16 Sep 2024
Efficient Video to Audio Mapper with Visual Scene Detection
Mingjing Yi
Ming Li
VGen
96
3
0
15 Sep 2024
Prevailing Research Areas for Music AI in the Era of Foundation Models
Megan Wei
M. Modrzejewski
Aswin Sivaraman
Dorien Herremans
MedIm
94
2
0
14 Sep 2024