Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.01448
Cited By
OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models
30 April 2025
Shengkai Chen
Yifang Yin
Jinming Cao
Shili Xiang
Zhenguang Liu
Roger Zimmermann
VOS
VLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"OpenAVS: Training-Free Open-Vocabulary Audio Visual Segmentation with Foundational Models"
13 / 13 papers shown
Title
Dual Mean-Teacher: An Unbiased Semi-Supervised Framework for Audio-Visual Source Localization
Yuxin Guo
Shijie Ma
Hu Su
Zhiqing Wang
Yuhao Zhao
Wei Zou
Siyang Sun
Yun Zheng
SSL
76
12
0
05 Mar 2024
Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling
Shentong Mo
Pedro Morgado
64
14
0
02 Dec 2023
Grounding DINO: Marrying DINO with Grounded Pre-Training for Open-Set Object Detection
Shilong Liu
Zhaoyang Zeng
Tianhe Ren
Feng Li
Hao Zhang
...
Chun-yue Li
Jianwei Yang
Hang Su
Jun Zhu
Lei Zhang
ObjD
191
2,015
0
09 Mar 2023
Extract Free Dense Labels from CLIP
Chong Zhou
Chen Change Loy
Bo Dai
VLM
CLIP
155
481
0
02 Dec 2021
AudioCLIP: Extending CLIP to Image, Text and Audio
A. Guzhov
Federico Raue
Jörn Hees
Andreas Dengel
CLIP
VLM
122
370
0
24 Jun 2021
AdaVQA: Overcoming Language Priors with Adapted Margin Cosine Loss
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Feng Ji
Ji Zhang
A. Bimbo
55
35
0
05 May 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
708
6,127
0
29 Apr 2021
Generative Transformer for Accurate and Reliable Salient Object Detection
Yuxin Mao
Jing Zhang
Zhexiong Wan
Yuchao Dai
Aixuan Li
Yun-Qiu Lv
Xinyu Tian
Deng-Ping Fan
Nick Barnes
ViT
98
33
0
20 Apr 2021
Localizing Visual Sounds the Hard Way
Honglie Chen
Weidi Xie
Triantafyllos Afouras
Arsha Nagrani
Andrea Vedaldi
Andrew Zisserman
ObjD
85
190
0
06 Apr 2021
AST: Audio Spectrogram Transformer
Yuan Gong
Yu-An Chung
James R. Glass
ViT
134
882
0
05 Apr 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
967
29,810
0
26 Feb 2021
SSTVOS: Sparse Spatiotemporal Transformers for Video Object Segmentation
Brendan Duke
Abdalla Ahmed
Christian Wolf
P. Aarabi
Graham W. Taylor
VOS
61
166
0
21 Jan 2021
Multiple Sound Sources Localization from Coarse to Fine
Rui Qian
Di Hu
Heinrich Dinkel
Mengyue Wu
N. Xu
Weiyao Lin
64
157
0
13 Jul 2020
1