2304.07193
DINOv2: Learning Robust Visual Features without Supervision
14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
Papers citing
"DINOv2: Learning Robust Visual Features without Supervision"
50 / 826 papers shown
Title
Object-Aware DINO (Oh-A-Dino): Enhancing Self-Supervised Representations for Multi-Object Instance Retrieval
Stefan Sylvius Wagner
Stefan Harmeling
OCL
137
1
0
12 Mar 2025
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang
Yifei Liu
Yingdong Shi
Chong Li
Anqi Pang
Sibei Yang
Jingyi Yu
Kan Ren
ViT
249
0
0
12 Mar 2025
Exploring Position Encoding in Diffusion U-Net for Training-free High-resolution Image Generation
Feng Zhou
Pu Cao
Yiyang Ma
Lu Yang
Jianqin Yin
DiffM
107
0
0
12 Mar 2025
Efficient Alignment of Unconditioned Action Prior for Language-conditioned Pick and Place in Clutter
Kechun Xu
Xunlong Xia
Kaixuan Wang
Yifei Yang
Yunxuan Mao
Bing Deng
R. Xiong
Yansen Wang
OffRL
193
0
0
12 Mar 2025
Quality Over Quantity? LLM-Based Curation for a Data-Efficient Audio-Video Foundation Model
Ali Vosoughi
Dimitra Emmanouilidou
H. Gamper
131
1
0
12 Mar 2025
Seal Your Backdoor with Variational Defense
Ivan Sabolić
Matej Grcić
Sinisa Segvic
AAML
460
0
0
11 Mar 2025
WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
Yansong Guo
Jie Hu
Yansong Qu
Liujuan Cao
3DGS
477
0
0
11 Mar 2025
Endo-FASt3r: Endoscopic Foundation model Adaptation for Structure from motion
Mona Sheikh Zeinoddin
Mobarakol Islam
Zafer Tandogdu
Greg Shaw
Mathew J. Clarkson
E. Mazomenos
Danail Stoyanov
472
0
0
10 Mar 2025
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
LM&Ro
222
0
0
10 Mar 2025
What's in a Latent? Leveraging Diffusion Latent Space for Domain Generalization
Xavier Thomas
Deepti Ghadiyaram
DiffM
196
0
0
09 Mar 2025
TextInPlace: Indoor Visual Place Recognition in Repetitive Structures with Scene Text Spotting and Verification
Huaqi Tao
Bingxi Liu
Calvin Chen
Tingjun Huang
He Li
Jinqiang Cui
Hong Zhang
121
0
0
09 Mar 2025
OpenRSD: Towards Open-prompts for Object Detection in Remote Sensing Images
Ziyue Huang
Yongchao Feng
Shuai Yang
Ziqiang Liu
Qingjie Liu
Yansen Wang
ObjD
449
1
0
08 Mar 2025
ForestSplats: Deformable transient field for Gaussian Splatting in the Wild
Wongi Park
Myeongseok Nam
Siwon Kim
Sangwoo Jo
Soomok Lee
3DGS
132
0
0
08 Mar 2025
Stereo Any Video: Temporally Consistent Stereo Matching
Junpeng Jing
Weixun Luo
Ye Mao
K. Mikolajczyk
98
0
0
07 Mar 2025
Novel Object 6D Pose Estimation with a Single Reference View
Jian Liu
Wei Sun
Kai Zeng
Jin Zheng
Hui Yang
Lin Wang
Hossein Rahmani
Ajmal Mian
117
3
0
07 Mar 2025
EDM: Efficient Deep Feature Matching
Xi Li
Tong Rao
Cihui Pan
96
0
0
07 Mar 2025
Persistent Object Gaussian Splat (POGS) for Tracking Human and Robot Manipulation of Irregularly Shaped Objects
Justin Yu
Kush Hari
Karim El-Refai
Arnav Dalal
Justin Kerr
Chung Min Kim
Richard Cheng
Muhammad Zubair Irshad
Ken Goldberg
91
3
0
07 Mar 2025
VQEL: Enabling Self-Developed Symbolic Language in Agents through Vector Quantization in Emergent Language Games
Mohammad Mahdi Samiei Paqaleh
Mahdieh Soleymani Baghshah
98
0
0
06 Mar 2025
Semantic Alignment of Unimodal Medical Text and Vision Representations
Maxime Di Folco
E. Chan
Marta Hasny
Cosmin I. Bercea
Julia A. Schnabel
98
0
0
06 Mar 2025
Generative Artificial Intelligence in Robotic Manipulation: A Survey
Kun Zhang
Peng Yun
Jun Cen
Junhao Cai
DiDi Zhu
...
Qifeng Chen
Jia Pan
Wei Zhang
Bo Yang
Hua Chen
177
1
0
05 Mar 2025
AirExo-2: Scaling up Generalizable Robotic Imitation Learning with Low-Cost Exoskeletons
Hongjie Fang
Chenxi Wang
Yiming Wang
J. Chen
Shangning Xia
...
Xinyu Zhan
Lixin Yang
Weiming Wang
Cewu Lu
Hao-Shu Fang
185
2
0
05 Mar 2025
COARSE: Collaborative Pseudo-Labeling with Coarse Real Labels for Off-Road Semantic Segmentation
Aurelio Noca
Xianmei Lei
Jonathan Becktor
J. Edlund
Anna Sabel
Patrick Spieler
Curtis Padgett
Alexandre Alahi
Deegan Atha
151
0
0
05 Mar 2025
CREStE: Scalable Mapless Navigation with Internet Scale Priors and Counterfactual Guidance
Arthur Zhang
Harshit S. Sikchi
Amy Zhang
Joydeep Biswas
121
1
0
05 Mar 2025
Is Pre-training Applicable to the Decoder for Dense Prediction?
Chao Ning
Wanshui Gan
Weihao Xuan
Naoto Yokoya
275
0
0
05 Mar 2025
A dataset-free approach for self-supervised learning of 3D reflectional symmetries
Issac Aguirre
Ivan Sipiran
Gabriel Montañana
78
1
0
04 Mar 2025
Out-of-Distribution Segmentation in Autonomous Driving: Problems and State of the Art
Youssef Shoeb
Azarm Nowzad
Hanno Gottschalk
UQCV
264
2
0
04 Mar 2025
Personalized Generation In Large Model Era: A Survey
Yiyan Xu
Jinghao Zhang
Alireza Salemi
Xinting Hu
Wenjie Wang
Fuli Feng
Hamed Zamani
Xiangnan He
Tat-Seng Chua
3DV
188
8
0
04 Mar 2025
One-shot In-context Part Segmentation
Zhenqi Dai
Ting Liu
Xinyu Zhang
Y. X. Wei
Yanning Zhang
VLM
176
1
0
03 Mar 2025
Enhancing Retinal Vessel Segmentation Generalization via Layout-Aware Generative Modelling
Jonathan Fhima
Jan Van Eijgen
Lennert Beeckmans
Thomas Jacobs
Moti Freiman
Luis Filipe Nakayama
Ingeborg Stalmans
Chaim Baskin
Joachim A. Behar
MedIm
178
0
0
03 Mar 2025
Solving Instance Detection from an Open-World Perspective
Qianqian Shen
Yunhan Zhao
Nahyun Kwon
Jeeeun Kim
Yanan Li
Shu Kong
138
1
0
01 Mar 2025
Learning to Animate Images from A Few Videos to Portray Delicate Human Actions
Haoxin Li
Yingchen Yu
Qilong Wu
Hanwang Zhang
Boyang Li
Song Bai
3DH
VGen
499
0
0
01 Mar 2025
MIRROR: Multi-Modal Pathological Self-Supervised Representation Learning via Modality Alignment and Retention
Tianyi Wang
Jianan Fan
Dingxin Zhang
Dongnan Liu
Yong-quan Xia
Heng Huang
Weidong Cai
157
1
0
01 Mar 2025
New Dataset and Methods for Fine-Grained Compositional Referring Expression Comprehension via Specialist-MLLM Collaboration
X. J. Yang
Jing Liu
Peng Wang
Guoqing Wang
Yue Yang
Jikang Cheng
ObjD
196
0
0
27 Feb 2025
Vector-Quantized Vision Foundation Models for Object-Centric Learning
Rongzhen Zhao
V. Wang
Arno Solin
Joni Pajarinen
OCL
VLM
561
1
0
27 Feb 2025
Beyond Next-Token: Next-X Prediction for Autoregressive Visual Generation
Sucheng Ren
Qihang Yu
Ju He
Xiaohui Shen
Alan Yuille
Liang-Chieh Chen
VGen
220
11
0
27 Feb 2025
LAM: Large Avatar Model for One-shot Animatable Gaussian Head
Yisheng He
Xiaodong Gu
Xiaodan Ye
Chao Xu
Zhengyi Zhao
Yuan Dong
Weihao Yuan
Zilong Dong
Liefeng Bo
3DGS
165
0
0
25 Feb 2025
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin
M. Beck
Korbinian Poppel
Sepp Hochreiter
Johannes Brandstetter
VLM
235
49
0
24 Feb 2025
Introducing Visual Perception Token into Multimodal Large Language Model
Runpeng Yu
Xinyin Ma
Xinchao Wang
MLLM
LRM
165
4
0
24 Feb 2025
SwimVG: Step-wise Multimodal Fusion and Adaption for Visual Grounding
Liangtao Shi
Ting Liu
Xiantao Hu
Yue Hu
Quanjun Yin
Richang Hong
ObjD
116
0
0
24 Feb 2025
MIM-Refiner: A Contrastive Learning Boost from Intermediate Pre-Trained Representations
Benedikt Alkin
Lukas Miklautz
Sepp Hochreiter
Johannes Brandstetter
VLM
257
8
0
24 Feb 2025
VaViM and VaVAM: Autonomous Driving through Video Generative Modeling
Florent Bartoccioni
Elias Ramzi
Victor Besnier
Shashanka Venkataramanan
Tuan-Hung Vu
...
Mickael Chen
Éloi Zablocki
Andrei Bursuc
Eduardo Valle
Matthieu Cord
VGen
176
2
0
24 Feb 2025
FUNCTO: Function-Centric One-Shot Imitation Learning for Tool Manipulation
Chao Tang
Anxing Xiao
Yuhong Deng
Tianrun Hu
Wenlong Dong
Hanbo Zhang
David Hsu
Hong Zhang
173
3
0
24 Feb 2025
Beyond Diagnostic Performance: Revealing and Quantifying Ethical Risks in Pathology Foundation Models
Weiping Lin
Shen Liu
Runchen Zhu
Yixuan Lin
Baoshun Wang
Liansheng Wang
61
1
0
24 Feb 2025
Disentangling Visual Transformers: Patch-level Interpretability for Image Classification
Guillaume Jeanneret
Loïc Simon
F. Jurie
ViT
158
0
0
24 Feb 2025
Enhancing Image Matting in Real-World Scenes with Mask-Guided Iterative Refinement
Rui Liu
72
0
0
24 Feb 2025
A Pragmatic Note on Evaluating Generative Models with Fréchet Inception Distance for Retinal Image Synthesis
Yuli Wu
Fucheng Liu
Rüveyda Yilmaz
Henning Konermann
Peter Walter
Johannes Stegmaier
EGVM
MedIm
132
2
0
24 Feb 2025
Fair Foundation Models for Medical Image Analysis: Challenges and Perspectives
Dilermando Queiroz
Anderson Carlos
André Anjos
Lilian Berton
119
0
0
24 Feb 2025
DemoGen: Synthetic Demonstration Generation for Data-Efficient Visuomotor Policy Learning
Zhengrong Xue
Shuying Deng
Zhenyang Chen
Yixuan Wang
Zhecheng Yuan
Huazhe Xu
114
9
0
24 Feb 2025
SelaVPR++: Towards Seamless Adaptation of Foundation Models for Efficient Place Recognition
Feng Lu
Tong Jin
X. Lan
Lijun Zhang
Yunpeng Liu
Yaowei Wang
Chun Yuan
82
1
0
23 Feb 2025
Human2Robot: Learning Robot Actions from Paired Human-Robot Videos
Sicheng Xie
Haidong Cao
Zejia Weng
Zhen Xing
Shiwei Shen
Jiaqi Leng
Xipeng Qiu
Yanwei Fu
Zuxuan Wu
Yu Jiang
148
0
0
23 Feb 2025