DINOv2: Learning Robust Visual Features without Supervision

14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM, CLIP, SSL

Papers citing "DINOv2: Learning Robust Visual Features without Supervision"

50 / 826 papers shown
SPA: 3D Spatial-Awareness Enables Effective Embodied Representation
Haoyi Zhu
Honghui Yang
Yating Wang
Jiange Yang
Limin Wang
Tong He
3DH
126
9
0
10 Oct 2024
AvatarGO: Zero-shot 4D Human-Object Interaction Generation and Animation
Yukang Cao
Liang Pan
Kai Han
Kwan-Yee K. Wong
Ziwei Liu
VGen
129
6
0
09 Oct 2024
Towards Generalisable Time Series Understanding Across Domains
Özgün Turgut
Philip Muller
Martin J. Menten
Daniel Rueckert
AI4TS
133
3
0
09 Oct 2024
Representation Alignment for Generation: Training Diffusion Transformers Is Easier Than You Think
Sihyun Yu
Sangkyung Kwak
Huiwon Jang
Jongheon Jeong
Jonathan Huang
Jinwoo Shin
Saining Xie
OCL
184
102
0
09 Oct 2024
Pair-VPR: Place-Aware Pre-training and Contrastive Pair Classification for Visual Place Recognition with Vision Transformers
Stephen Hausler
Peyman Moghadam
SSL, ViT
68
4
0
09 Oct 2024
NegMerge: Sign-Consensual Weight Merging for Machine Unlearning
Hyoseo Kim
Dongyoon Han
Junsuk Choe
MU, MoMe
76
3
0
08 Oct 2024
PhotoReg: Photometrically Registering 3D Gaussian Splatting Models
Ziwen Yuan
Tianyi Zhang
Matthew Johnson-Roberson
Weiming Zhi
3DGS
53
3
0
07 Oct 2024
Organizing Unstructured Image Collections using Natural Language
Mingxuan Liu
Zhun Zhong
Jun Li
Gianni Franchi
Subhankar Roy
Elisa Ricci
VLM
141
5
0
07 Oct 2024
Image Watermarks are Removable Using Controllable Regeneration from Clean Noise
Yepeng Liu
Yiren Song
Hai Ci
Yu Zhang
Haofan Wang
Mike Zheng Shou
Yuheng Bu
WIGM
118
7
0
07 Oct 2024
Control-oriented Clustering of Visual Latent Representation
Han Qi
Haocheng Yin
Heng Yang
SSL
143
2
0
07 Oct 2024
Brain Mapping with Dense Features: Grounding Cortical Semantic Selectivity in Natural Images With Vision Transformers
Andrew F. Luo
Jacob Yeung
Rushikesh Zawar
Shaurya Dewan
Margaret M. Henderson
Leila Wehbe
Michael J. Tarr
105
5
0
07 Oct 2024
On the Adversarial Risk of Test Time Adaptation: An Investigation into Realistic Test-Time Data Poisoning
Yongyi Su
Yushu Li
Nanqing Liu
Kui Jia
Xulei Yang
Chuan-Sheng Foo
Xun Xu
TTA, AAML
161
1
0
07 Oct 2024
AuroraCap: Efficient, Performant Video Detailed Captioning and a New Benchmark
Wenhao Chai
Enxin Song
Y. Du
Chenlin Meng
Vashisht Madhavan
Omer Bar-Tal
Jeng-Neng Hwang
Saining Xie
Christopher D. Manning
3DV
219
37
0
04 Oct 2024
ControlAR: Controllable Image Generation with Autoregressive Models
Zongming Li
Tianheng Cheng
Shoufa Chen
Peize Sun
Haocheng Shen
Longjin Ran
Xiaoxin Chen
Wenyu Liu
Xinggang Wang
DiffM
246
19
0
03 Oct 2024
OmniSR: Shadow Removal under Direct and Indirect Lighting
Jiamin Xu
Zelong Li
Yuxin Zheng
Chenyu Huang
Renshu Gu
Weiwei Xu
Gang Xu
3DV
177
2
0
02 Oct 2024
EMMA: Efficient Visual Alignment in Multi-Modal LLMs
Sara Ghazanfari
Alexandre Araujo
Prashanth Krishnamurthy
Siddharth Garg
Farshad Khorrami
VLM
81
2
0
02 Oct 2024
Arges: Spatio-Temporal Transformer for Ulcerative Colitis Severity Assessment in Endoscopy Videos
Krishna Chaitanya
Pablo F. Damasceno
Shreyas Fadnavis
Pooya Mobadersany
Chaitanya Parmar
...
Lindsey Surace
Louis R. Ghanem
Oana Gabriela Cula
Tommaso Mansi
K. Standish
59
0
0
01 Oct 2024
iTeach: Interactive Teaching for Robot Perception using Mixed Reality
Jishnu Jaykumar P
Cole Salvato
Vinaya Bomnale
Jikai Wang
Yu Xiang
123
0
0
01 Oct 2024
Flex3D: Feed-Forward 3D Generation with Flexible Reconstruction Model and Input View Curation
Junlin Han
Jianyuan Wang
Andrea Vedaldi
Philip Torr
Filippos Kokkinos
124
4
0
01 Oct 2024
Towards Effective Utilization of Mixed-Quality Demonstrations in Robotic Manipulation via Segment-Level Selection and Optimization
Jingjing Chen
Hongjie Fang
Hao-Shu Fang
Cewu Lu
113
2
0
30 Sep 2024
UniAff: A Unified Representation of Affordances for Tool Usage and Articulation with Vision-Language Models
Qiaojun Yu
Siyuan Huang
Xibin Yuan
Zhengkai Jiang
Ce Hao
...
Junbo Wang
Liu Liu
Hongsheng Li
Peng Gao
Cewu Lu
130
3
0
30 Sep 2024
A Parameter-Efficient Tuning Framework for Language-guided Object Grounding and Robot Grasping
Houjian Yu
Mingen Li
Alireza Rezazadeh
Yang Yang
Changhyun Choi
107
2
0
28 Sep 2024
Canonical Representation and Force-Based Pretraining of 3D Tactile for Dexterous Visuo-Tactile Policy Learning
Tianhao Wu
Jinzhou Li
Jiyao Zhang
Mingdong Wu
Hao Dong
SSL
119
8
0
26 Sep 2024
Multi-View and Multi-Scale Alignment for Contrastive Language-Image Pre-training in Mammography
Yuexi Du
John Onofrey
Nicha Dvornek
VLM
110
2
0
26 Sep 2024
EMOVA: Empowering Language Models to See, Hear and Speak with Vivid Emotions
Kai Chen
Yunhao Gou
Runhui Huang
Zhili Liu
Daxin Tan
...
Qun Liu
Jun Yao
Lu Hou
Hang Xu
AuLLM, MLLM, VLM
181
29
0
26 Sep 2024
SECURE: Semantics-aware Embodied Conversation under Unawareness for Lifelong Robot Learning
Rimvydas Rubavicius
Peter David Fagan
A. Lascarides
Subramanian Ramamoorthy
LM&Ro
446
0
0
26 Sep 2024
Robust Scene Change Detection Using Visual Foundation Models and Cross-Attention Mechanisms
Chun-Jung Lin
Sourav Garg
Tat-Jun Chin
Feras Dayoub
64
2
0
25 Sep 2024
Mitigating Covariate Shift in Imitation Learning for Autonomous Vehicles Using Latent Space Generative World Models
A. Popov
Alperen Degirmenci
David Wehr
Shashank Hegde
Ryan Oldja
...
David Nistér
Urs Muller
Ruchi Bhargava
Stan Birchfield
Nikolai Smolyanskiy
153
11
0
25 Sep 2024
GeoBiked: A Dataset with Geometric Features and Automated Labeling Techniques to Enable Deep Generative Models in Engineering Design
Phillip Mueller
Sebastian Mueller
Lars Mikelsons
112
2
0
25 Sep 2024
RTAGrasp: Learning Task-Oriented Grasping from Human Videos via Retrieval, Transfer, and Alignment
Wenlong Dong
Dehao Huang
Jiangshan Liu
Chao Tang
Hong Zhang
80
4
0
24 Sep 2024
OW-Rep: Open World Object Detection with Instance Representation Learning
Sunoh Lee
Minsik Jeon
Jihong Min
Junwon Seo
ObjD
488
0
0
24 Sep 2024
SDFit: 3D Object Pose and Shape by Fitting a Morphable SDF to a Single Image
Dimitrije Antić
Sai Kumar Dwivedi
Shashank Tripathi
Theo Gevers
Dimitrios Tzionas
174
2
0
24 Sep 2024
ReVLA: Reverting Visual Domain Limitation of Robotic Foundation Models
Sombit Dey
Jan-Nico Zaech
Nikolay Nikolov
Luc Van Gool
Danda Pani Paudel
MoMe, VLM
151
5
0
23 Sep 2024
MaPPER: Multimodal Prior-guided Parameter Efficient Tuning for Referring Expression Comprehension
Ting Liu
Zunnan Xu
Yue Hu
Liangtao Shi
Zhiqiang Wang
Quanjun Yin
153
3
0
20 Sep 2024
HMD^2: Environment-aware Motion Generation from Single Egocentric Head-Mounted Device
Vladimir Guzov
Yifeng Jiang
Fangzhou Hong
Gerard Pons-Moll
Richard Newcombe
C. Karen Liu
Yuting Ye
Lingni Ma
85
5
0
20 Sep 2024
3DTopia-XL: Scaling High-quality 3D Asset Generation via Primitive Diffusion
Zhaoxi Chen
Jiaxiang Tang
Yuhao Dong
Ziang Cao
Fangzhou Hong
...
Tong Wu
Shunsuke Saito
Liang Pan
Dahua Lin
Ziwei Liu
133
23
0
19 Sep 2024
Towards Global Localization using Multi-Modal Object-Instance Re-Identification
Aneesh Chavan
Vaibhav Agrawal
Vineeth Bhat
Sarthak Chittawar
Siddharth Srivastava
Chetan Arora
K. M. Krishna
155
0
0
18 Sep 2024
IMRL: Integrating Visual, Physical, Temporal, and Geometric Representations for Enhanced Food Acquisition
Rui Liu
Zahiruddin Mahammad
Amisha Bhaskar
Pratap Tokekar
88
2
0
18 Sep 2024
OSV: One Step is Enough for High-Quality Image to Video Generation
Xiaofeng Mao
Zhengkai Jiang
Fu-Yun Wang
Wenbing Zhu
Hao Chen
Mingmin Chi
Yabiao Wang
Wenhan Luo
DiffM, VGen
129
13
0
17 Sep 2024
Fine-Tuning Image-Conditional Diffusion Models is Easier than You Think
Gonzalo Martin Garcia
Karim Abou Zeid
Christian Schmidt
Daan de Geus
Alexander Hermans
Bastian Leibe
133
33
0
17 Sep 2024
Towards Real-Time Generation of Delay-Compensated Video Feeds for Outdoor Mobile Robot Teleoperation
Neeloy Chakraborty
Yixiao Fang
Andre Schreiber
Tianchen Ji
Zhe Huang
Aganze Mihigo
Cassidy Wall
Abdulrahman Almana
Katherine Driggs-Campbell
126
0
0
16 Sep 2024
Frequency-Guided Masking for Enhanced Vision Self-Supervised Learning
Amin Karimi Monsefi
Mengxi Zhou
Nastaran Karimi Monsefi
Ser-Nam Lim
Wei-Lun Chao
R. Ramnath
132
1
0
16 Sep 2024
Robust image representations with counterfactual contrastive learning
Mélanie Roschewitz
Fabio De Sousa Ribeiro
Tian Xia
G. Khara
Ben Glocker
OOD, MedIm
140
2
0
16 Sep 2024
One missing piece in Vision and Language: A Survey on Comics Understanding
Emanuele Vivoli
Andrey Barsky
Mohamed Ali Souibgui
Artemis Llabres
Marco Bertini
Dimosthenis Karatzas
124
5
0
14 Sep 2024
Evaluating Pre-trained Convolutional Neural Networks and Foundation Models as Feature Extractors for Content-based Medical Image Retrieval
Amirreza Mahbod
Nematollah Saeidi
Sepideh Hatamikia
Ramona Woitek
VLM, MedIm
126
4
0
14 Sep 2024
GroundingBooth: Grounding Text-to-Image Customization
Zhexiao Xiong
Wei Xiong
Jing Shi
He Zhang
Yizhi Song
Nathan Jacobs
DiffM
156
9
0
13 Sep 2024
Autoregressive Sequence Modeling for 3D Medical Image Representation
Siwen Wang
Churan Wang
Fei Gao
Lixian Su
Fandong Zhang
Yizhou Wang
Yizhou Yu
MedIm
127
1
0
13 Sep 2024
ClearDepth: Enhanced Stereo Perception of Transparent Objects for Robotic Manipulation
Kaixin Bai
Huajian Zeng
Lei Zhang
Yiwen Liu
Hongli Xu
Zhaopeng Chen
Jianwei Zhang
74
1
0
13 Sep 2024
Foundation Models Boost Low-Level Perceptual Similarity Metrics
Abhijay Ghildyal
Nabajeet Barman
Saman Zadtootaghaj
99
4
0
11 Sep 2024
What to align in multimodal contrastive learning?
Benoit Dufumier
J. Castillo-Navarro
D. Tuia
Jean-Philippe Thiran
156
4
0
11 Sep 2024