2304.07193
Cited By
DINOv2: Learning Robust Visual Features without Supervision
14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
Papers citing
"DINOv2: Learning Robust Visual Features without Supervision"
50 / 2,220 papers shown
Title
Depth-PC: A Visual Servo Framework Integrated with Cross-Modality Fusion for Sim2Real Transfer
Haoyu Zhang
Weiyang Lin
Yimu Jiang
Chao Ye
80
0
0
26 Nov 2024
Symmetry Strikes Back: From Single-Image Symmetry Detection to 3D Generation
Xiang Li
Zixuan Huang
Anh Thai
James M. Rehg
3DGS
87
0
0
26 Nov 2024
CityWalker: Learning Embodied Urban Navigation from Web-Scale Videos
Xinhao Liu
Jiajian Li
Yichen Jiang
Niranjan Sujay
Zheng Yang
Juexiao Zhang
John Abanes
Jing Zhang
Chen Feng
116
2
0
26 Nov 2024
Distilling Spectral Graph for Object-Context Aware Open-Vocabulary Semantic Segmentation
Chanyoung Kim
Dayun Ju
Woojung Han
Ming-Hsuan Yang
Seong Jae Hwang
VLM
VOS
89
0
0
26 Nov 2024
Online Episodic Memory Visual Query Localization with Egocentric Streaming Object Memory
Zaira Manigrasso
Matteo Dunnhofer
Antonino Furnari
Moritz Nottebaum
Antonio Finocchiaro
Davide Marana
G. Farinella
C. Micheloni
88
1
0
25 Nov 2024
Open Vocabulary Monocular 3D Object Detection
Jin Yao
Hao Gu
Xuweiyi Chen
Jiayun Wang
Zezhou Cheng
ObjD
VLM
76
3
0
25 Nov 2024
Diffusion Features for Zero-Shot 6DoF Object Pose Estimation
Bernd Von Gimborn
P. Ausserlechner
Markus Vincze
S. Thalhammer
DiffM
76
0
0
25 Nov 2024
Edge Weight Prediction For Category-Agnostic Pose Estimation
Or Hirschorn
S. Avidan
98
0
0
25 Nov 2024
Efficient Video Face Enhancement with Enhanced Spatial-Temporal Consistency
Y. Wang
Jiajie Teng
Jiajiong Cao
Yuming Li
Chenguang Ma
Hongteng Xu
Dixin Luo
VGen
DiffM
86
0
0
25 Nov 2024
A Study on Unsupervised Domain Adaptation for Semantic Segmentation in the Era of Vision-Language Models
Manuel Schwonberg
Claus Werner
Hanno Gottschalk
Carsten Meyer
VLM
97
0
0
25 Nov 2024
Image Generation Diversity Issues and How to Tame Them
Mischa Dombrowski
Weitong Zhang
Sarah Cechnicka
Hadrien Reynaud
Bernhard Kainz
82
0
0
25 Nov 2024
Med-PerSAM: One-Shot Visual Prompt Tuning for Personalized Segment Anything Model in Medical Domain
Hangyul Yoon
Doohyuk Jang
JungEun Kim
Eunho Yang
VLM
MedIm
74
1
0
25 Nov 2024
VideoOrion: Tokenizing Object Dynamics in Videos
Yicheng Feng
Yijiang Li
Wanpeng Zhang
Sipeng Zheng
Zongqing Lu
109
1
0
25 Nov 2024
SAR3D: Autoregressive 3D Object Generation and Understanding via Multi-scale 3D VQVAE
Yongwei Chen
Yushi Lan
Shangchen Zhou
Tengfei Wang
Xingang Pan
112
5
0
25 Nov 2024
UNOPose: Unseen Object Pose Estimation with an Unposed RGB-D Reference Image
Xingyu Liu
Gu Wang
Ruida Zhang
Chenyangguang Zhang
F. Tombari
Xiangyang Ji
287
2
0
25 Nov 2024
ResCLIP: Residual Attention for Training-free Dense Vision-language Inference
Yuhang Yang
Jinhong Deng
Wen Li
Lixin Duan
VLM
83
0
0
24 Nov 2024
Medical Slice Transformer: Improved Diagnosis and Explainability on 3D Medical Images with DINOv2
Gustav Muller-Franzes
Firas Khader
R. Siepmann
T. Han
Jakob Nikolas Kather
S. Nebelung
Daniel Truhn
MedIm
79
0
0
24 Nov 2024
PriorDiffusion: Leverage Language Prior in Diffusion Models for Monocular Depth Estimation
Ziyao Zeng
Jingcheng Ni
Daniel Wang
Patrick Rim
Younjoon Chung
Fengyu Yang
Byung-Woo Hong
A. Wong
DiffM
MDE
113
2
0
24 Nov 2024
Training an Open-Vocabulary Monocular 3D Object Detection Model without 3D Data
Rui Huang
Henry Zheng
Yan Wang
Zhuofan Xia
Marco Pavone
Gao Huang
3DPC
VLM
96
1
0
23 Nov 2024
Revelio: Interpreting and leveraging semantic information in diffusion models
Dahye Kim
Xavier Thomas
Deepti Ghadiyaram
91
4
0
23 Nov 2024
Twin Trigger Generative Networks for Backdoor Attacks against Object Detection
Zhiying Li
Zhi Liu
Guanggang Geng
Shreyank N. Gowda
Shuyuan Lin
Jian Weng
Xiaobo Jin
AAML
82
0
0
23 Nov 2024
Zero-Shot Coreset Selection: Efficient Pruning for Unlabeled Data
Brent A. Griffin
Jacob Marks
Jason J. Corso
VLM
84
2
0
22 Nov 2024
There is no SAMantics! Exploring SAM as a Backbone for Visual Understanding Tasks
Miguel Espinosa
Chenhongyi Yang
Linus Ericsson
Jingyu Sun
Elliot J. Crowley
VLM
80
0
0
22 Nov 2024
Design-o-meter: Towards Evaluating and Refining Graphic Designs
Sahil Goyal
Abhinav Mahajan
Swasti Mishra
Prateksha Udhayanan
Tripti Shukla
K. J. Joseph
Balaji Vasan Srinivasan
85
1
0
22 Nov 2024
RankByGene: Gene-Guided Histopathology Representation Learning Through Cross-Modal Ranking Consistency
Wentao Huang
Meilong Xu
Xiaoling Hu
Shahira Abousamra
Aniruddha Ganguly
...
Prateek Prasanna
Tahsin M. Kurc
Joel H. Saltz
Michael L. Miller
Chong Chen
86
0
0
22 Nov 2024
Segment Any Class (SAC): Multi-Class Few-Shot Semantic Segmentation via Class Region Proposals
Hussni Mohd Zakir
Eric Tatt Wei Ho
VLM
84
0
0
21 Nov 2024
NexusSplats: Efficient 3D Gaussian Splatting in the Wild
Yuzhou Tang
Dejun Xu
Yongjie Hou
Zhenzhong Wang
Min Jiang
3DGS
96
1
0
21 Nov 2024
HF-Diff: High-Frequency Perceptual Loss and Distribution Matching for One-Step Diffusion-Based Image Super-Resolution
S. Sami
Md Golam Moula Mehedi Hasan
J. Dawson
Nasser M. Nasrabadi
DiffM
80
0
0
20 Nov 2024
XMask3D: Cross-modal Mask Reasoning for Open Vocabulary 3D Semantic Segmentation
Ziyi Wang
Yufei Wang
Xumin Yu
Jie Zhou
Jiwen Lu
74
0
0
20 Nov 2024
RAW-Diffusion: RGB-Guided Diffusion Models for High-Fidelity RAW Image Generation
Christoph Reinders
Radu Berdan
Beril Besbinar
Junji Otsuka
Daisuke Iso
91
2
0
20 Nov 2024
Adapting Vision Foundation Models for Robust Cloud Segmentation in Remote Sensing Images
Xuechao Zou
Shun Zhang
Kai Li
Shiying Wang
Junliang Xing
Lei Jin
Congyan Lang
Pin Tao
66
1
0
20 Nov 2024
CV-Cities: Advancing Cross-View Geo-Localization in Global Cities
Gaoshuang Huang
Yang Zhou
Luying Zhao
Wenjian Gan
75
2
0
19 Nov 2024
Visual Cue Enhancement and Dual Low-Rank Adaptation for Efficient Visual Instruction Fine-Tuning
Pengkun Jiao
Bin Zhu
Jingjing Chen
Chong-Wah Ngo
Yu-Gang Jiang
VLM
OffRL
81
0
0
19 Nov 2024
KDC-MAE: Knowledge Distilled Contrastive Mask Auto-Encoder
Maheswar Bora
Saurabh Atreya
Aritra Mukherjee
Abhijit Das
92
0
0
19 Nov 2024
Relational Contrastive Learning and Masked Image Modeling for Scene Text Recognition
T. Lin
Jinglei Zhang
Yi Xu
Kai Chen
Rui Zhang
Chong Chen
40
0
0
18 Nov 2024
Towards a Comprehensive Benchmark for Pathological Lymph Node Metastasis in Breast Cancer Sections
Xitong Ling
Yuanyuan Lei
Jiawen Li
Junru Cheng
Wenting Huang
Tian Guan
Jian Guan
Yonghong He
26
4
0
16 Nov 2024
CorrCLIP: Reconstructing Correlations in CLIP with Off-the-Shelf Foundation Models for Open-Vocabulary Semantic Segmentation
Dengke Zhang
Fagui Liu
Quan Tang
VLM
62
1
0
15 Nov 2024
Free Lunch in Pathology Foundation Model: Task-specific Model Adaptation with Concept-Guided Feature Enhancement
Yanyan Huang
Weiqin Zhao
Yihang Chen
Yu Fu
Lequan Yu
MedIm
42
2
0
15 Nov 2024
Advancing Fine-Grained Visual Understanding with Multi-Scale Alignment in Multi-Modal Models
Wei Wang
Zechao Li
Qi Xu
Linfeng Li
Yiqing Cai
Botian Jiang
Hang Song
Xingcan Hu
Pengyu Wang
Li Xiao
36
2
0
14 Nov 2024
Assessing the Performance of the DINOv2 Self-supervised Learning Vision Transformer Model for the Segmentation of the Left Atrium from MRI Images
Bipasha Kundu
Bidur Khanal
R. Simon
Cristian A. Linte
MedIm
28
2
0
14 Nov 2024
MFTIQ: Multi-Flow Tracker with Independent Matching Quality Estimation
Jonas Serych
Michal Neoral
Jiří Matas
36
3
0
14 Nov 2024
Harnessing Vision Foundation Models for High-Performance, Training-Free Open Vocabulary Segmentation
Yuheng Shi
Minjing Dong
Chang Xu
VLM
48
1
0
14 Nov 2024
Inconsistencies In Consistency Models: Better ODE Solving Does Not Imply Better Samples
Noël Vouitsis
Rasa Hosseinzadeh
Brendan Leigh Ross
Valentin Villecroze
S. Gorti
Jesse C. Cresswell
Gabriel Loaiza-Ganem
DiffM
53
0
0
13 Nov 2024
Wavelet Latent Diffusion (Wala): Billion-Parameter 3D Generative Model with Compact Wavelet Encodings
Aditya Sanghi
Aliasghar Khani
Pradyumna Reddy
Arianna Rampini
Derek Cheung
Kamal Rahimi Malekshan
Kanika Madan
Hooman Shayani
57
3
0
12 Nov 2024
Automatic dataset shift identification to support root cause analysis of AI performance drift
Mélanie Roschewitz
Raghav Mehta
Charles Jones
Ben Glocker
OOD
43
2
0
12 Nov 2024
SAMPart3D: Segment Any Part in 3D Objects
Yanting Yang
Yukun Huang
Yu Guo
Liangjun Lu
Xiaoyang Wu
Edmund Y. Lam
Yan-Pei Cao
Xihui Liu
VLM
44
7
0
11 Nov 2024
CapeLLM: Support-Free Category-Agnostic Pose Estimation with Multimodal Large Language Models
Junho Kim
Hyungjin Chung
Byung-Hoon Kim
VLM
39
0
0
11 Nov 2024
Pattern Integration and Enhancement Vision Transformer for Self-Supervised Learning in Remote Sensing
Kaixuan Lu
Ruiqian Zhang
Xiao Huang
Yuxing Xie
Xiaogang Ning
Hanchao Zhang
Mengke Yuan
Pan Zhang
Tao Wang
Tongkui Liao
42
0
0
09 Nov 2024
Moving Off-the-Grid: Scene-Grounded Video Representations
Sjoerd van Steenkiste
Daniel Zoran
Yi Yang
Yulia Rubanova
Rishabh Kabra
...
Thomas Keck
João Carreira
Alexey Dosovitskiy
Mehdi S. M. Sajjadi
Thomas Kipf
44
3
0
08 Nov 2024
CAD-MLLM: Unifying Multimodality-Conditioned CAD Generation With MLLM
Jingwei Xu
Chenyu Wang
Zibo Zhao
Wen Liu
Yi Ma
Shenghua Gao
58
13
0
07 Nov 2024