ResearchTrend.AI
arXiv: 2304.07193
DINOv2: Learning Robust Visual Features without Supervision

14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM, CLIP, SSL

Papers citing "DINOv2: Learning Robust Visual Features without Supervision"

50 / 826 papers shown
Title
Diffusion Meets Few-shot Class Incremental Learning
Junsu Kim
Yunhoe Ku
Dongyoon Han
Seungryul Baek
DiffM, CLL
203
0
0
30 Mar 2025
OncoReg: Medical Image Registration for Oncological Challenges
Wiebke Heyer
Yannic Elser
Lennart Berkel
Xinrui Song
Xuanang Xu
...
Christoph Großbröhmer
Lasse Hansen
Alessa Hering
Malte M. Sieren
Mattias P. Heinrich
82
0
0
29 Mar 2025
Shape and Texture Recognition in Large Vision-Language Models
Sagi Eppel
Mor Bismut
Alona Faktor
3DV, VLM
97
2
0
29 Mar 2025
MVSAnywhere: Zero-Shot Multi-View Stereo
Sergio Izquierdo
Mohamed Sayed
Michael Firman
Guillermo Garcia-Hernando
Daniyar Turmukhambetov
Javier Civera
Oisin Mac Aodha
Gabriel J. Brostow
Jamie Watson
3DV
125
4
0
28 Mar 2025
Masked Self-Supervised Pre-Training for Text Recognition Transformers on Large-Scale Datasets
Martin Kiss
Michal Hradiš
76
0
0
28 Mar 2025
CTRL-O: Language-Controllable Object-Centric Visual Representation Learning
Aniket Didolkar
Andrii Zadaianchuk
Rabiul Awal
Maximilian Seitzer
E. Gavves
Aishwarya Agrawal
OCL, VLM
178
3
0
27 Mar 2025
A Large-Scale Vision-Language Dataset Derived from Open Scientific Literature to Advance Biomedical Generalist AI
Alejandro Lozano
Min Woo Sun
James Burgess
Jeffrey Nirschl
Christopher Polzak
...
Xiaohan Wang
Alfred Seunghoon Song
Chiang Chia-Chun
Robert Tibshirani
Serena Yeung-Levy
LM&MA
182
2
0
26 Mar 2025
Unconditional Priors Matter! Improving Conditional Generation of Fine-Tuned Diffusion Models
Prin Phunyaphibarn
Phillip Y. Lee
Jaihoon Kim
Minhyuk Sung
DiffM
184
1
0
26 Mar 2025
MAR-3D: Progressive Masked Auto-regressor for High-Resolution 3D Generation
Jinnan Chen
Lingting Zhu
Zeyu Hu
Shengju Qian
Yuxiao Chen
Xin Wang
G. Lee
199
2
0
26 Mar 2025
MoLe-VLA: Dynamic Layer-skipping Vision Language Action Model via Mixture-of-Layers for Efficient Robot Manipulation
Rongyu Zhang
Menghang Dong
Yuan Zhang
Liang Heng
Xiaowei Chi
Gaole Dai
Li Du
Dan Wang
Yuan Du
MoE
158
4
0
26 Mar 2025
DINeMo: Learning Neural Mesh Models with no 3D Annotations
Weijie Guo
Guofeng Zhang
Wufei Ma
Jieneng Chen
3DH
130
0
0
26 Mar 2025
Latent Beam Diffusion Models for Decoding Image Sequences
Guilherme Fernandes
Vasco Ramos
Regev Cohen
Idan Szpektor
João Magalhães
166
1
0
26 Mar 2025
Surg-3M: A Dataset and Foundation Model for Perception in Surgical Settings
Chengan Che
Chao Wang
Tom Vercauteren
Sophia Tsoka
Luis C. Garcia-Peraza-Herrera
MedIm
84
1
0
25 Mar 2025
ChA-MAEViT: Unifying Channel-Aware Masked Autoencoders and Multi-Channel Vision Transformers for Improved Cross-Channel Learning
Chau Pham
Juan C. Caicedo
Bryan A. Plummer
78
0
0
25 Mar 2025
LPOSS: Label Propagation Over Patches and Pixels for Open-vocabulary Semantic Segmentation
Vladan Stojnić
Yannis Kalantidis
Jiří Matas
Giorgos Tolias
VLM
120
0
0
25 Mar 2025
FireEdit: Fine-grained Instruction-based Image Editing via Region-aware Vision Language Model
Zhiqiang Zhang
Jia-Nan Li
Zunnan Xu
Hanhui Li
Yiji Cheng
Fa-Ting Hong
Qin Lin
Qinglin Lu
Xiaodan Liang
DiffM
140
2
0
25 Mar 2025
Context-Enhanced Memory-Refined Transformer for Online Action Detection
Zhanzhong Pang
Fadime Sener
Angela Yao
OffRL
125
2
0
24 Mar 2025
CoMP: Continual Multimodal Pre-training for Vision Foundation Models
Yuxiao Chen
L. Meng
Wujian Peng
Zuxuan Wu
Yu-Gang Jiang
VLM
211
1
0
24 Mar 2025
FisherTune: Fisher-Guided Robust Tuning of Vision Foundation Models for Domain Generalized Segmentation
Dong Zhao
Jinlong Li
Shuang Wang
Mengyao Wu
Qi Zang
N. Sebe
Zhun Zhong
461
1
0
23 Mar 2025
Expanding the Boundaries of Vision Prior Knowledge in Multi-modal Large Language Models
Qiao Liang
Yanjiang Liu
Xianpei Han
Yaojie Lu
Hongyu Lin
Jia Zheng
Le Sun
Yingfei Sun
97
0
0
23 Mar 2025
SceneSplat: Gaussian Splatting-based Scene Understanding with Vision-Language Pretraining
Yue Li
Qi Ma
Runyi Yang
Huapeng Li
Mengjiao Ma
...
E. Konukoglu
Theo Gevers
Luc Van Gool
Martin R. Oswald
Danda Pani Paudel
3DGS, VLM
235
2
0
23 Mar 2025
BackMix: Regularizing Open Set Recognition by Removing Underlying Fore-Background Priors
Yu Wang
Junxian Mu
Hongzhi Huang
Qilong Wang
Pengfei Zhu
Q. Hu
238
1
0
22 Mar 2025
Cross-Modal and Uncertainty-Aware Agglomeration for Open-Vocabulary 3D Scene Understanding
Jinlong Li
Cristiano Saltori
Fabio Poiesi
N. Sebe
494
2
0
20 Mar 2025
Disentangled and Interpretable Multimodal Attention Fusion for Cancer Survival Prediction
Aniek Eijpe
Soufyan Lakbir
Melis Erdal Cesur
Sara P. Oliveira
Sanne Abeln
Wilson Silva
79
1
0
20 Mar 2025
A Vision Centric Remote Sensing Benchmark
Abduljaleel Adejumo
Faegheh Yeganli
Clifford Broni-Bediako
Aoran Xiao
Naoto Yokoya
Mennatullah Siam
144
0
0
20 Mar 2025
Cube: A Roblox View of 3D Intelligence
Foundation AI Team Roblox
Kiran Bhat
Nishchaie Khanna
Karun Channa
Tinghui Zhou
...
Kyle Price
Steve Han
Yiqing Wang
A. Singh
David Baszucki
133
1
0
19 Mar 2025
TULIP: Towards Unified Language-Image Pretraining
Zineng Tang
Long Lian
Seun Eisape
Xudong Wang
Roei Herzig
Adam Yala
Alane Suhr
Trevor Darrell
David M. Chan
VLM, CLIP, MLLM
191
7
0
19 Mar 2025
Conjuring Positive Pairs for Efficient Unification of Representation Learning and Image Synthesis
Imanol G. Estepa
Jesús M. Rodríguez-de-Vera
Ignacio Sarasúa
Bhalaji Nagarajan
Petia Radeva
198
0
0
19 Mar 2025
Distilling 3D distinctive local descriptors for 6D pose estimation
Amir Hamza
Andrea Caraffa
Davide Boscaini
Fabio Poiesi
98
1
0
19 Mar 2025
RFMI: Estimating Mutual Information on Rectified Flow for Text-to-Image Alignment
Chao Wang
Giulio Franzese
A. Finamore
Pietro Michiardi
244
0
0
18 Mar 2025
E-Values Expand the Scope of Conformal Prediction
Etienne Gauthier
Francis Bach
Michael I. Jordan
107
0
0
17 Mar 2025
Towards Scalable Foundation Model for Multi-modal and Hyperspectral Geospatial Data
Haozhe Si
Yuxuan Wan
Minh Do
Deepak Vasisht
Han Zhao
Hendrik Hamann
171
0
0
17 Mar 2025
DeGauss: Dynamic-Static Decomposition with Gaussian Splatting for Distractor-free 3D Reconstruction
Rui Wang
Q. Lohmeyer
Mirko Meboldt
Siyu Tang
3DGS
106
1
0
17 Mar 2025
8-Calves Image dataset
Xuyang Fang
S. Hannuna
Neill D. F. Campbell
404
0
0
17 Mar 2025
An interpretable approach to automating the assessment of biofouling in video footage
Evelyn J. Mannix
Bartholomew A. Woodham
197
0
0
17 Mar 2025
Learning-based 3D Reconstruction in Autonomous Driving: A Comprehensive Survey
Liewen Liao
Weihao Yan
Ming Yang
Songan Zhang
3DV
184
0
0
17 Mar 2025
VTON 360: High-Fidelity Virtual Try-On from Any Viewing Direction
Zijian He
Yuwei Ning
Yipeng Qin
Wangrun Wang
Sibei Yang
Liang Lin
G. Li
190
2
0
15 Mar 2025
EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting
Di Li
Jie Feng
Jiahao Chen
Weisheng Dong
Guanbin Li
G. Shi
Licheng Jiao
3DGS, VLM
433
0
0
14 Mar 2025
AugGen: Synthetic Augmentation Can Improve Discriminative Models
Parsa Rahimi
Damien Teney
Sébastien Marcel
132
2
0
14 Mar 2025
APLA: A Simple Adaptation Method for Vision Transformers
Moein Sorkhei
Emir Konuk
Kevin Smith
Christos Matsoukas
133
0
0
14 Mar 2025
Towards a Unified Copernicus Foundation Model for Earth Vision
Yi Wang
Zhitong Xiong
Chenying Liu
Adam J. Stewart
Thomas Dujardin
...
Angelos Zavras
Franziska Gerken
Ioannis Papoutsis
Laura Leal-Taixé
Xiao Xiang Zhu
120
4
0
14 Mar 2025
Observation-Graph Interaction and Key-Detail Guidance for Vision and Language Navigation
Yifan Xie
Binkai Ou
Fei Ma
Yaohua Liu
70
0
0
14 Mar 2025
Self-Supervised Pretraining for Fine-Grained Plankton Recognition
Joona Kareinen
T. Eerola
K. Kraft
L. Lensu
S. Suikkanen
Heikki Kälviäinen
SSL
494
0
0
14 Mar 2025
HybridVLA: Collaborative Diffusion and Autoregression in a Unified Vision-Language-Action Model
Jiaming Liu
Hao Chen
Pengju An
Zhuoyang Liu
Renrui Zhang
...
Chengkai Hou
Mengdi Zhao
KC alex Zhou
Pheng-Ann Heng
Shanghang Zhang
185
20
0
13 Mar 2025
One-Shot Federated Unsupervised Domain Adaptation with Scaled Entropy Attention and Multi-Source Smoothed Pseudo Labeling
Ali Abedi
Qiang Wu
Ning Zhang
Farhad Pourpanah
FedML
118
0
0
13 Mar 2025
SmartWay: Enhanced Waypoint Prediction and Backtracking for Zero-Shot Vision-and-Language Navigation
Xiangyu Shi
Zerui Li
Wenqi Lyu
Jiatong Xia
Feras Dayoub
Yanyuan Qiao
Qi Wu
164
1
0
13 Mar 2025
FlowTok: Flowing Seamlessly Across Text and Image Tokens
Ju He
Qihang Yu
Qihao Liu
Liang-Chieh Chen
148
1
0
13 Mar 2025
SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Junsong Chen
Shuchen Xue
Yuyang Zhao
Jincheng Yu
Sayak Paul
Junyu Chen
Han Cai
Enze Xie
VLM
177
10
0
12 Mar 2025
Object-Aware DINO (Oh-A-Dino): Enhancing Self-Supervised Representations for Multi-Object Instance Retrieval
Stefan Sylvius Wagner
Stefan Harmeling
OCL
137
1
0
12 Mar 2025
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
86
0
0
12 Mar 2025