ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.07193
  4. Cited By
DINOv2: Learning Robust Visual Features without Supervision
v1v2 (latest)

DINOv2: Learning Robust Visual Features without Supervision

14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
    VLMCLIPSSL
ArXiv (abs)PDFHTML

Papers citing "DINOv2: Learning Robust Visual Features without Supervision"

50 / 826 papers shown
Title
StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
Daniel A. P. Oliveira
David Martins de Matos
VGen
71
0
0
15 May 2025
Advances in Radiance Field for Dynamic Scene: From Neural Field to Gaussian Field
Advances in Radiance Field for Dynamic Scene: From Neural Field to Gaussian Field
Jinlong Fan
Xuepu Zeng
Jing Zhang
Mingming Gong
Yuxiang Yang
Dacheng Tao
3DGSAI4CE
147
0
0
15 May 2025
Modeling Saliency Dataset Bias
Modeling Saliency Dataset Bias
Matthias Kümmerer
Harneet Khanuja
Matthias Bethge
92
0
0
15 May 2025
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Hu Yue
Siyuan Huang
Yue Liao
Shengcong Chen
Pengfei Zhou
Liliang Chen
Maoqing Yao
Guanghui Ren
VGen
82
1
0
14 May 2025
Few-Shot Learning of Visual Compositional Concepts through Probabilistic Schema Induction
Few-Shot Learning of Visual Compositional Concepts through Probabilistic Schema Induction
Andrew Jun Lee
Taylor Webb
Trevor Bihl
K. Holyoak
Hongjing Lu
OCL
63
0
0
14 May 2025
VIViT: Variable-Input Vision Transformer Framework for 3D MR Image Segmentation
VIViT: Variable-Input Vision Transformer Framework for 3D MR Image Segmentation
Badhan Kumar Das
Ajay Singh
Gengyan Zhao
Han Liu
Thomas J. Re
Dorin Comaniciu
Eli Gibson
Andreas Maier
ViTMedIm
67
0
0
13 May 2025
DFA-CON: A Contrastive Learning Approach for Detecting Copyright Infringement in DeepFake Art
DFA-CON: A Contrastive Learning Approach for Detecting Copyright Infringement in DeepFake Art
Haroon Wahab
Hassan Ugail
Irfan Mehmood
AAML
51
0
0
13 May 2025
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Yifu Yuan
Haiqin Cui
Yibin Chen
Zibin Dong
Fei Ni
Longxin Kou
Jinyi Liu
Pengyi Li
Yan Zheng
Jianye Hao
151
0
0
13 May 2025
Multimodal Survival Modeling in the Age of Foundation Models
Multimodal Survival Modeling in the Age of Foundation Models
Steven Song
Morgan Borjigin-Wang
Irene Madejski
Robert L. Grossman
101
0
0
12 May 2025
H$^3$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
H3^33DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
Yiyang Lu
Yufeng Tian
Zhecheng Yuan
Xinyu Wang
Pu Hua
Zhengrong Xue
Huazhe Xu
98
1
0
12 May 2025
SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images
SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images
Yicheng Song
Tiancheng Lin
Die Peng
Su Yang
Yi Xu
MedIm
78
0
0
10 May 2025
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
Qingwen Bu
Yanting Yang
Jisong Cai
Shenyuan Gao
Guanghui Ren
Maoqing Yao
Ping Luo
Hongyang Li
423
10
0
09 May 2025
Register and CLS tokens yield a decoupling of local and global features in large ViTs
Register and CLS tokens yield a decoupling of local and global features in large ViTs
Alexander Lappe
M. Giese
60
1
0
09 May 2025
CGTrack: Cascade Gating Network with Hierarchical Feature Aggregation for UAV Tracking
CGTrack: Cascade Gating Network with Hierarchical Feature Aggregation for UAV Tracking
Weihong Li
Xiaoqiong Liu
Heng Fan
L. Zhang
64
0
0
09 May 2025
Learning to Drive Anywhere with Model-Based Reannotation
Learning to Drive Anywhere with Model-Based Reannotation
Noriaki Hirose
Lydia Ignatova
Kyle Stachowicz
Catherine Glossop
Sergey Levine
Dhruv Shah
79
1
0
08 May 2025
DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
Qitao Zhao
Amy Lin
Jeff Tan
Jason Y. Zhang
Deva Ramanan
Shubham Tulsiani
VGen
175
1
0
08 May 2025
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang
Bin Chen
Yulin Li
Bin Kang
Yulin Chen
Zhuotao Tian
VLM
102
0
0
07 May 2025
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Teng Hu
Zhentao Yu
Zhengguang Zhou
Sen Liang
Yuan Zhou
Qin Lin
Qinglin Lu
DiffMVGen
191
6
0
07 May 2025
MonoCoP: Chain-of-Prediction for Monocular 3D Object Detection
MonoCoP: Chain-of-Prediction for Monocular 3D Object Detection
Zhihao Zhang
Abhinav Kumar
Girish Chandar Ganesan
Xiaoming Liu
547
0
0
07 May 2025
Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation
Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation
Gabriele Rosi
Fabio Cermelli
VLM
169
0
0
06 May 2025
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
HsiaoYuan Hsu
Yuxin Peng
91
0
0
06 May 2025
A Unit Enhancement and Guidance Framework for Audio-Driven Avatar Video Generation
A Unit Enhancement and Guidance Framework for Audio-Driven Avatar Video Generation
Y.B. Wang
S.Z. Zhou
J.F. Wu
T. Hu
J.N. Zhang
DiffMVGen
130
0
0
06 May 2025
Improving the Reproducibility of Deep Learning Software: An Initial Investigation through a Case Study Analysis
Improving the Reproducibility of Deep Learning Software: An Initial Investigation through a Case Study Analysis
Nikita Ravi
Abhinav Goel
James C. Davis
George K. Thiruvathukal
91
0
0
06 May 2025
An Adaptive Data-Resilient Multi-Modal Framework for Hierarchical Multi-Label Book Genre Identification
An Adaptive Data-Resilient Multi-Modal Framework for Hierarchical Multi-Label Book Genre Identification
Utsav Nareti
S. Chattopadhyay
Prolay Mallick
Suraj Kumar
Ayush Vikas Daga
Chandranath Adak
Adarsh Wase
Arjab Roy
161
1
0
05 May 2025
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
Dengyang Jiang
Mengmeng Wang
Liuzhuozheng Li
Lei Zhang
Haoyu Wang
Wei Wei
Guang Dai
Yanning Zhang
Jingdong Wang
DiffM
107
0
0
05 May 2025
Always Skip Attention
Always Skip Attention
Yiping Ji
Hemanth Saratchandran
Peyman Moghaddam
Simon Lucey
453
3
0
04 May 2025
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation
Volodymyr Havrylov
Haiwen Huang
Dan Zhang
Andreas Geiger
495
0
0
04 May 2025
Contextures: Representations from Contexts
Contextures: Representations from Contexts
Runtian Zhai
Kai Yang
Che-Ping Tsai
Burak Varici
Zico Kolter
Pradeep Ravikumar
447
0
0
02 May 2025
Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain
Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain
Gaozheng Pei
Ke Ma
Yingfei Sun
Qianqian Xu
Qingming Huang
DiffM
84
0
0
02 May 2025
Transferable Adversarial Attacks on Black-Box Vision-Language Models
Transferable Adversarial Attacks on Black-Box Vision-Language Models
Kai Hu
Weichen Yu
Lefei Zhang
Alexander Robey
Andy Zou
Chengming Xu
Haoqi Hu
Matt Fredrikson
AAMLVLM
130
2
0
02 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
Jieneng Chen
LRM
118
1
0
01 May 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Chong Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRLLRM
206
8
0
30 Apr 2025
SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings
SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings
Florian Vahl
Jörn Griepenburg
Jan Gutsche
Jasper Güldenstein
Jianwei Zhang
VGen
104
0
0
29 Apr 2025
Do You Know the Way? Human-in-the-Loop Understanding for Fast Traversability Estimation in Mobile Robotics
Do You Know the Way? Human-in-the-Loop Understanding for Fast Traversability Estimation in Mobile Robotics
Andre Schreiber
Katherine Rose Driggs-Campbell
475
0
0
28 Apr 2025
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video
Sonia Joseph
Praneet Suresh
Lorenz Hufe
Edward Stevinson
Robert Graham
Yash Vadi
Danilo Bzdok
Sebastian Lapuschkin
Lee Sharkey
Blake A. Richards
149
0
0
28 Apr 2025
Pixels2Points: Fusing 2D and 3D Features for Facial Skin Segmentation
Pixels2Points: Fusing 2D and 3D Features for Facial Skin Segmentation
Victoria Yue Chen
Daoye Wang
Stephan Garbin
Jan Bednarík
Sebastian Winberg
Timo Bolkart
Thabo Beeler
3DH3DPC
120
0
0
28 Apr 2025
CLR-Wire: Towards Continuous Latent Representations for 3D Curve Wireframe Generation
CLR-Wire: Towards Continuous Latent Representations for 3D Curve Wireframe Generation
Xueqi Ma
Yong Liu
Tianlong Gao
Qingming Huang
Hui Huang
3DVAI4CE
141
0
0
27 Apr 2025
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
Alexander Baumann
Leonardo Ayala
Siyang Song
Jan Sellner
Alexander Studier-Fischer
Berkin Özdemir
Lena Maier-Hein
Slobodan Ilic
108
0
0
27 Apr 2025
Multi-Stage Boundary-Aware Transformer Network for Action Segmentation in Untrimmed Surgical Videos
Multi-Stage Boundary-Aware Transformer Network for Action Segmentation in Untrimmed Surgical Videos
Rezowan Shuvo
M S Mekala
Eyad Elyan
MedIm
425
0
0
26 Apr 2025
The Fourth Monocular Depth Estimation Challenge
The Fourth Monocular Depth Estimation Challenge
Anton Obukhov
Matteo Poggi
Fabio Tosi
Ripudaman Singh Arora
Jaime Spencer
...
Tuan-Anh Yang
Minh-Quang Nguyen
T. Tran
Albert Luginov
Muhammad Shahzad
MDE
450
1
0
24 Apr 2025
Unveiling the Lack of LVLM Robustness to Fundamental Visual Variations: Why and Path Forward
Unveiling the Lack of LVLM Robustness to Fundamental Visual Variations: Why and Path Forward
Zhiyuan Fan
Yumeng Wang
Sandeep Polisetty
Yi R. Fung
135
0
0
23 Apr 2025
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs
Zehao Wang
Senthil Purushwalkam
Caiming Xiong
Siyang Song
Chenhui Xu
Ran Xu
171
2
0
23 Apr 2025
DINOv2-powered Few-Shot Semantic Segmentation: A Unified Framework via Cross-Model Distillation and 4D Correlation Mining
DINOv2-powered Few-Shot Semantic Segmentation: A Unified Framework via Cross-Model Distillation and 4D Correlation Mining
Wei Zhuo
Zhiyue Tang
Wufeng Xue
Hao Ding
Linlin Shen
111
0
0
22 Apr 2025
FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation
FreeGraftor: Training-Free Cross-Image Feature Grafting for Subject-Driven Text-to-Image Generation
Zebin Yao
Lujie Niu
Huixing Jiang
Chen Wei
Fangkun Zhao
Ruifan Li
Fangxiang Feng
DiffM
181
0
0
22 Apr 2025
Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation
Instance-Adaptive Keypoint Learning with Local-to-Global Geometric Aggregation for Category-Level Object Pose Estimation
Wei Wei
Lu Zou
Tao Lu
Yuan Yao
Zhangjin Huang
Guoping Wang
3DPC
122
0
0
21 Apr 2025
MonoTher-Depth: Enhancing Thermal Depth Estimation via Confidence-Aware Distillation
MonoTher-Depth: Enhancing Thermal Depth Estimation via Confidence-Aware Distillation
Xingxing Zuo
Nikhil Ranganathan
Connor T. Lee
Georgia Gkioxari
Soon-Jo Chung
VLM
170
2
0
21 Apr 2025
Context Aware Grounded Teacher for Source Free Object Detection
Context Aware Grounded Teacher for Source Free Object Detection
Tajamul Ashraf
Rajes Manna
Partha Sarathi Purkayastha
Tavaheed Tariq
Janibul Bashir
103
0
0
21 Apr 2025
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Video-MMLU: A Massive Multi-Discipline Lecture Understanding Benchmark
Enxin Song
Wenhao Chai
Weili Xu
Jianwen Xie
Yuxuan Liu
Gaoang Wang
122
6
0
20 Apr 2025
Learning Joint ID-Textual Representation for ID-Preserving Image Synthesis
Learning Joint ID-Textual Representation for ID-Preserving Image Synthesis
Zichuan Liu
Liming Jiang
Qing Yan
Yumin Jia
Hao Kang
Xin Lu
DiffM
142
0
0
19 Apr 2025
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Locate 3D: Real-World Object Localization via Self-Supervised Learning in 3D
Sergio Arnaud
Paul Mcvay
Ada Martin
Arjun Majumdar
Krishna Murthy Jatavallabhula
...
Nicolas Ballas
Mido Assran
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
3DPC
81
2
0
19 Apr 2025
Previous
123456...151617
Next