2412.16334
Cited By
DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment
20 December 2024
Cijo Jose, Théo Moutakanni, Dahyun Kang, Federico Baldassarre, Timothée Darcet, Hu Xu, Daniel Li, Marc Szafraniec, Michael Ramamonjisoa, Maxime Oquab, Oriane Siméoni, Huy V. Vo, Patrick Labatut, Piotr Bojanowski
CLIP
VLM
Papers citing "DINOv2 Meets Text: A Unified Framework for Image- and Pixel-Level Vision-Language Alignment"
4 / 4 papers shown
RadZero: Similarity-Based Cross-Attention for Explainable Vision-Language Alignment in Radiology with Zero-Shot Multi-Task Capability
Jonggwon Park, Soobum Kim, Byungmu Yoon, Kyoyun Choi
MedIm
38
0
0
10 Apr 2025
Memory-Modular Classification: Learning to Generalize with Memory Replacement
Dahyun Kang, Ahmet Iscen, Eunchan Jo, Sua Choi, Minsu Cho, Cordelia Schmid
VLM
KELM
OffRL
45
0
0
08 Apr 2025
Scaling Language-Free Visual Representation Learning
David Fan, Shengbang Tong, Jiachen Zhu, Koustuv Sinha, Zhuang Liu, ..., Michael G. Rabbat, Nicolas Ballas, Yann LeCun, Amir Bar, Saining Xie
CLIP
VLM
69
2
0
01 Apr 2025
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Didolkar, Andrii Zadaianchuk, Rabiul Awal, Maximilian Seitzer, E. Gavves, Aishwarya Agrawal
OCL
VLM
92
2
0
27 Mar 2025