2303.06591
Cited By
Accommodating Audio Modality in CLIP for Multimodal Processing
12 March 2023
Ludan Ruan
Anwen Hu
Yuqing Song
Liang Zhang
S. Zheng
Qin Jin
VLM
Papers citing "Accommodating Audio Modality in CLIP for Multimodal Processing"
13 / 13 papers shown
Title
Synergy-CLIP: Extending CLIP with Multi-modal Integration for Robust Representation Learning
Sangyeon Cho
Jangyeong Jeon
Mingi Kim
Junyeong Kim
CLIP
VLM
76
0
0
30 Apr 2025
Gramian Multimodal Representation Learning and Alignment
Giordano Cicchetti
Eleonora Grassucci
Luigi Sigillo
Danilo Comminiello
91
0
0
16 Dec 2024
STEVE-Audio: Expanding the Goal Conditioning Modalities of Embodied Agents in Minecraft
Nicholas Lenzen
Amogh Raut
Andrew Melnik
VGen
69
0
0
01 Dec 2024
Contrasting with Symile: Simple Model-Agnostic Representation Learning for Unlimited Modalities
A. Saporta
A. Puli
Mark Goldstein
Rajesh Ranganath
SSL
36
0
0
01 Nov 2024
Refining Knowledge Transfer on Audio-Image Temporal Agreement for Audio-Text Cross Retrieval
Shunsuke Tsubaki
Daisuke Niizumi
Daiki Takeuchi
Yasunori Ohishi
Noboru Harada
Keisuke Imoto
19
1
0
16 Mar 2024
CoAVT: A Cognition-Inspired Unified Audio-Visual-Text Pre-Training Model for Multimodal Processing
Xianghu Yue
Xiaohai Tian
Lu Lu
Malu Zhang
Zhizheng Wu
Haizhou Li
34
0
0
22 Jan 2024
Audio Generation with Multiple Conditional Diffusion Model
Zhifang Guo
Jianguo Mao
Ruijie Tao
Long Yan
Kazushige Ouchi
Hong Liu
Xiangdong Wang
DiffM
21
11
0
23 Aug 2023
Exploring the Role of Audio in Video Captioning
Yuhan Shen
Linjie Yang
Longyin Wen
Haichao Yu
Ehsan Elhamifar
Heng Wang
18
2
0
21 Jun 2023
MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint Audio and Video Generation
Ludan Ruan
Y. Ma
Huan Yang
Huiguo He
Bei Liu
Jianlong Fu
Nicholas Jing Yuan
Qin Jin
B. Guo
DiffM
VGen
28
170
0
19 Dec 2022
VATT: Transformers for Multimodal Self-Supervised Learning from Raw Video, Audio and Text
Hassan Akbari
Liangzhe Yuan
Rui Qian
Wei-Hong Chuang
Shih-Fu Chang
Yin Cui
Boqing Gong
ViT
248
577
0
22 Apr 2021
CLIP4Clip: An Empirical Study of CLIP for End to End Video Clip Retrieval
Huaishao Luo
Lei Ji
Ming Zhong
Yang Chen
Wen Lei
Nan Duan
Tianrui Li
CLIP
VLM
317
780
0
18 Apr 2021
A Straightforward Framework For Video Retrieval Using CLIP
Jesús Andrés Portillo-Quintero
J. C. Ortíz-Bayliss
Hugo Terashima-Marín
CLIP
318
116
0
24 Feb 2021
Multi-modal Transformer for Video Retrieval
Valentin Gabeur
Chen Sun
Alahari Karteek
Cordelia Schmid
ViT
415
595
0
21 Jul 2020