2304.07193
DINOv2: Learning Robust Visual Features without Supervision
14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
Papers citing "DINOv2: Learning Robust Visual Features without Supervision"
50 / 826 papers shown
Title
Human2Robot: Learning Robot Actions from Paired Human-Robot Videos
Sicheng Xie
Haidong Cao
Zejia Weng
Zhen Xing
Shiwei Shen
Jiaqi Leng
Xipeng Qiu
Yanwei Fu
Zuxuan Wu
Yu Jiang
148
0
0
23 Feb 2025
Understanding the Emergence of Multimodal Representation Alignment
Megan Tjandrasuwita
Chanakya Ekbote
Liu Ziyin
Paul Pu Liang
108
2
0
22 Feb 2025
DynamicGSG: Dynamic 3D Gaussian Scene Graphs for Environment Adaptation
Luzhou Ge
Xiangyu Zhu
Zhuo Yang
Xuesong Li
3DGS
124
0
0
21 Feb 2025
Robin3D: Improving 3D Large Language Model via Robust Instruction Tuning
Weitai Kang
Haifeng Huang
Yuzhang Shang
Mubarak Shah
Yan Yan
102
9
0
21 Feb 2025
Data Attribution for Text-to-Image Models by Unlearning Synthesized Images
Sheng-Yu Wang
Aaron Hertzmann
Alexei A. Efros
Jun-Yan Zhu
Richard Zhang
TDI
209
3
0
21 Feb 2025
Contrastive Localized Language-Image Pre-Training
Hong-You Chen
Zhengfeng Lai
Hao Zhang
Xiang Wang
Marcin Eichner
Keen You
Meng Cao
Bowen Zhang
Yue Yang
Zhe Gan
CLIP
VLM
124
10
0
20 Feb 2025
Feedforward Few-shot Species Range Estimation
Christian Lange
Max Hamilton
Elijah Cole
Alexander Shepard
Samuel Heinrich
Angela Zhu
Subhransu Maji
Grant Van Horn
Oisin Mac Aodha
148
0
0
20 Feb 2025
UNION: Unsupervised 3D Object Detection using Object Appearance-based Pseudo-Classes
T. Lentsch
Holger Caesar
D. Gavrila
3DPC
156
8
0
20 Feb 2025
Continually Learning Structured Visual Representations via Network Refinement with Rerelation
Zeki Doruk Erden
Boi Faltings
CLL
127
2
0
20 Feb 2025
Personalized Instance-based Navigation Toward User-Specific Objects in Realistic Environments
Luca Barsellotti
Roberto Bigazzi
Marcella Cornia
Lorenzo Baraldi
Rita Cucchiara
231
1
0
20 Feb 2025
Archetypal SAE: Adaptive and Stable Dictionary Learning for Concept Extraction in Large Vision Models
Thomas Fel
Ekdeep Singh Lubana
Jacob S. Prince
M. Kowal
Victor Boutin
Isabel Papadimitriou
Binxu Wang
Martin Wattenberg
Demba Ba
Talia Konkle
76
8
0
18 Feb 2025
CAST: Component-Aligned 3D Scene Reconstruction from an RGB Image
Kaixin Yao
Longwen Zhang
Xinhao Yan
Yan Zeng
Qixuan Zhang
Wei Yang
Lan Xu
Jiayuan Gu
Jingyi Yu
142
8
0
18 Feb 2025
L4P: Low-Level 4D Vision Perception Unified
Abhishek Badki
Hang Su
Bowen Wen
Orazio Gallo
VLM
175
1
0
18 Feb 2025
Object-Centric Image to Video Generation with Language Guidance
Angel Villar-Corrales
Gjergj Plepi
Sven Behnke
DiffM
VGen
OCL
252
1
0
17 Feb 2025
Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization
Yunzhe Hu
Difan Zou
Dong Xu
157
1
0
17 Feb 2025
Dynamic Scene Understanding through Object-Centric Voxelization and Neural Rendering
Yanpeng Zhao
Yiwei Hao
Siyu Gao
Yunbo Wang
Xiaokang Yang
OCL
266
1
0
17 Feb 2025
Adversarially Robust CLIP Models Can Induce Better (Robust) Perceptual Metrics
Francesco Croce
Christian Schlarmann
Naman D. Singh
Matthias Hein
158
7
0
17 Feb 2025
SAM-LAD: Segment Anything Model Meets Zero-Shot Logic Anomaly Detection
Yun Peng
Xiao Lin
Nachuan Ma
Jiayuan Du
Chuangwei Liu
Chengju Liu
Qi Chen
182
3
0
17 Feb 2025
TinyEmo: Scaling down Emotional Reasoning via Metric Projection
Cristian Gutierrez
LRM
265
0
0
17 Feb 2025
GeoDANO: Geometric VLM with Domain Agnostic Vision Encoder
Seunghyuk Cho
Zhenyue Qin
Yang Liu
Youngbin Choi
Seungbeom Lee
Dongwoo Kim
83
0
0
17 Feb 2025
Masked Latent Prediction and Classification for Self-Supervised Audio Representation Learning
Aurian Quélennec
Pierre Chouteau
Geoffroy Peeters
S. Essid
SSL
153
0
0
17 Feb 2025
Simplifying DINO via Coding Rate Regularization
Ziyang Wu
Jingyuan Zhang
Druv Pai
Xinze Wang
Chandan Singh
Jianwei Yang
Jianfeng Gao
Yi-An Ma
548
1
0
17 Feb 2025
Why Vision Language Models Struggle with Visual Arithmetic? Towards Enhanced Chart and Geometry Understanding
Kung-Hsiang Huang
Can Qin
Haoyi Qiu
Philippe Laban
Shafiq Joty
Caiming Xiong
Chien-Sheng Wu
VLM
353
5
0
17 Feb 2025
Phantom: Subject-consistent video generation via cross-modal alignment
Lijie Liu
Tianxiang Ma
Bingchuan Li
Zhuowei Chen
Jiawei Liu
Qian He
Xinglong Wu
DiffM
VGen
180
14
0
16 Feb 2025
Adaptive Neural Networks for Intelligent Data-Driven Development
Youssef Shoeb
Azarm Nowzad
Hanno Gottschalk
246
2
0
14 Feb 2025
Ask in Any Modality: A Comprehensive Survey on Multimodal Retrieval-Augmented Generation
Mohammad Mahdi Abootorabi
Amirhosein Zobeiri
Mahdi Dehghani
Mohammadali Mohammadkhani
Bardia Mohammadi
Omid Ghahroodi
M. Baghshah
Ehsaneddin Asgari
RALM
351
7
0
12 Feb 2025
Learning Human Skill Generators at Key-Step Levels
Yilu Wu
Chenhui Zhu
Shuai Wang
Hanlin Wang
Jing Wang
Zhaoxiang Zhang
Limin Wang
VGen
212
0
0
12 Feb 2025
Matrix3D: Large Photogrammetry Model All-in-One
Yuanxun Lu
Jingyang Zhang
Tian Fang
Jean-Daniel Nahmias
Yanghai Tsin
Long Quan
Xun Cao
Yao Yao
Shiwei Li
207
6
0
11 Feb 2025
SIREN: Semantic, Initialization-Free Registration of Multi-Robot Gaussian Splatting Maps
Ola Shorinwa
Jiankai Sun
Mac Schwager
Anirudha Majumdar
3DGS
143
4
0
10 Feb 2025
Fully Exploiting Vision Foundation Model's Profound Prior Knowledge for Generalizable RGB-Depth Driving Scene Parsing
Sicen Guo
Tianyou Wen
Chuang-Wei Liu
Qijun Chen
Rui Fan
105
0
0
10 Feb 2025
Stochastic Forward-Backward Deconvolution: Training Diffusion Models with Finite Noisy Datasets
Haoye Lu
Qifan Wu
Yaoliang Yu
DiffM
116
2
0
08 Feb 2025
No Free Lunch in Annotation either: An objective evaluation of foundation models for streamlining annotation in animal tracking
Emil Mededovic
Valdy Laurentius
Yuli Wu
Marcin Kopaczka
Zhu Chen
Mareike Schulz
René Tolba
Johannes Stegmaier
167
1
0
06 Feb 2025
ZISVFM: Zero-Shot Object Instance Segmentation in Indoor Robotic Environments with Vision Foundation Models
Ying Zhang
Maoliang Yin
Wenfu Bi
Haibao Yan
Shaohan Bian
Cui-Hua Zhang
C. Hua
127
2
0
05 Feb 2025
Efficient Domain Adaptation of Multimodal Embeddings using Constrastive Learning
Georgios Margaritis
Periklis Petridis
Dimitris Bertsimas
142
0
0
04 Feb 2025
UNIP: Rethinking Pre-trained Attention Patterns for Infrared Semantic Segmentation
Tao Zhang
Jinyong Wen
Zhen Chen
Kun Ding
Di Zhang
Chunhong Pan
259
1
0
04 Feb 2025
Exploring Few-Shot Defect Segmentation in General Industrial Scenarios with Metric Learning and Vision Foundation Models
Tongkun Liu
Bing Li
Xiao Jin
Yupeng Shi
Qiuying Li
Xiang Wei
133
0
0
03 Feb 2025
AquaticCLIP: A Vision-Language Foundation Model for Underwater Scene Analysis
B. Alawode
I. I. Ganapathi
S. Javed
Naoufel Werghi
Mohammed Bennamoun
Arif Mahmood
CLIP
VLM
110
1
0
03 Feb 2025
Label Correction for Road Segmentation Using Road-side Cameras
Henrik Toikka
Eerik Alamikkotervo
Risto Ojala
109
0
0
03 Feb 2025
Lifting by Gaussians: A Simple, Fast and Flexible Method for 3D Instance Segmentation
Rohan Chacko
Nicolai Haeni
Eldar Khaliullin
Lin Sun
Douglas Lee
3DGS
149
1
0
31 Jan 2025
MADation: Face Morphing Attack Detection with Foundation Models
Eduarda Caldeira
Guray Ozgur
Tahar Chettaoui
Marija Ivanovska
Peter Peer
Fadi Boutros
Vitomir Štruc
Naser Damer
CVBM
90
2
1
28 Jan 2025
Controllable Forgetting Mechanism for Few-Shot Class-Incremental Learning
Kirill Paramonov
Mete Ozay
Eunju Yang
J. Moon
Umberto Michieli
109
0
0
28 Jan 2025
MATCHA:Towards Matching Anything
Fei Xue
Sven Elflein
Laura Leal-Taixe
Qunjie Zhou
145
1
0
28 Jan 2025
Slot-Guided Adaptation of Pre-trained Diffusion Models for Object-Centric Learning and Compositional Generation
Adil Kaan Akan
Yucel Yemez
DiffM
OCL
86
0
0
27 Jan 2025
BiFold: Bimanual Cloth Folding with Language Guidance
Oriol Barbany
Adrià Colomé
Carme Torras
44
1
0
27 Jan 2025
Rethinking Encoder-Decoder Flow Through Shared Structures
Frederik Laboyrie
M. K. Yucel
Albert Saà-Garriga
AI4CE
82
0
0
24 Jan 2025
Towards Scalable Topological Regularizers
Hiu-Tung Wong
Darrick Lee
Hong Yan
BDL
137
0
0
24 Jan 2025
CuriousBot: Interactive Mobile Exploration via Actionable 3D Relational Object Graph
Yixuan Wang
Leonor Fermoselle
Tarik Kelestemur
Jiuguang Wang
Yunzhu Li
84
1
0
23 Jan 2025
DynamicEarth: How Far are We from Open-Vocabulary Change Detection?
Kaiyu Li
Xiangyong Cao
Yupeng Deng
Chao Pang
Zepeng Xin
Deyu Meng
Zhi Wang
ObjD
151
1
0
22 Jan 2025
Enhancing Monocular Depth Estimation with Multi-Source Auxiliary Tasks
Alessio Quercia
Erenus Yildiz
Zhuo Cao
Kai Krajsek
Abigail Morrison
Ira Assent
Hanno Scharr
112
0
0
22 Jan 2025
Video Depth Anything: Consistent Depth Estimation for Super-Long Videos
Sili Chen
Hengkai Guo
Shengnan Zhu
Feihu Zhang
Zilong Huang
Jiashi Feng
Bingyi Kang
MDE
VLM
AuLLM
171
20
0
21 Jan 2025