ResearchTrend.AI
arXiv: 2304.07193
DINOv2: Learning Robust Visual Features without Supervision

14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
    VLM
    CLIP
    SSL

Papers citing "DINOv2: Learning Robust Visual Features without Supervision"

50 / 2,189 papers shown
Advances in Radiance Field for Dynamic Scene: From Neural Field to Gaussian Field
Jinlong Fan
Xuepu Zeng
J. Zhang
M. Gong
Yuxiang Yang
Dacheng Tao
3DGS
AI4CE
36
0
0
15 May 2025
StoryReasoning Dataset: Using Chain-of-Thought for Scene Understanding and Grounded Story Generation
Daniel A. P. Oliveira
D. Matos
VGen
24
0
0
15 May 2025
EmbodiedMAE: A Unified 3D Multi-Modal Representation for Robot Manipulation
Zibin Dong
Fei Ni
Yifu Yuan
Yinchuan Li
Jianye Hao
24
0
0
15 May 2025
Does Feasibility Matter? Understanding the Impact of Feasibility on Synthetic Training Data
Yiwen Liu
Jessica Bader
Jae Myung Kim
DiffM
16
0
0
15 May 2025
Modeling Saliency Dataset Bias
Matthias Kümmerer
Harneet Khanuja
Matthias Bethge
19
0
0
15 May 2025
Flash-VL 2B: Optimizing Vision-Language Model Performance for Ultra-Low Latency and High Throughput
Bo Zhang
Shuo Li
Runhe Tian
Yang Yang
Jixin Tang
Jinhao Zhou
Lin Ma
VLM
22
0
0
14 May 2025
EWMBench: Evaluating Scene, Motion, and Semantic Quality in Embodied World Models
Hu Yue
Siyuan Huang
Yue Liao
Shengcong Chen
Pengfei Zhou
Liliang Chen
Maoqing Yao
Guanghui Ren
VGen
29
0
0
14 May 2025
Few-Shot Learning of Visual Compositional Concepts through Probabilistic Schema Induction
Andrew Jun Lee
Taylor W. Webb
Trevor J. Bihl
K. Holyoak
Hongjing Lu
OCL
28
0
0
14 May 2025
Marigold: Affordable Adaptation of Diffusion-Based Image Generators for Image Analysis
B. Ke
Kevin Qu
T. Wang
Nando Metzger
Shengyu Huang
Bo Li
Anton Obukhov
Konrad Schindler
DiffM
VLM
20
0
0
14 May 2025
BioVFM-21M: Benchmarking and Scaling Self-Supervised Vision Foundation Models for Biomedical Image Analysis
Jiarun Liu
Hong-Yu Zhou
Weijian Huang
Hao Yang
Dongning Song
Tao Tan
Yong Liang
Shanshan Wang
MedIm
23
0
0
14 May 2025
RT-cache: Efficient Robot Trajectory Retrieval System
Owen Kwon
Abraham George
Alison Bartsch
A. Farimani
10
0
0
14 May 2025
DataMIL: Selecting Data for Robot Imitation Learning with Datamodels
S. Dass
Alaa Khaddaj
Logan Engstrom
Aleksander Madry
Andrew Ilyas
Roberto Martin-Martin
18
0
0
14 May 2025
Endo-CLIP: Progressive Self-Supervised Pre-training on Raw Colonoscopy Records
Yili He
Yan Zhu
Peiyao Fu
Ruijie Yang
Tianyi Chen
Zhihua Wang
Quanlin Li
Pinghong Zhou
X. J. Yang
Shuo Wang
MedIm
VLM
23
0
0
14 May 2025
Enhancing Thyroid Cytology Diagnosis with RAG-Optimized LLMs and Pathology Foundation Models
Hussien Al-Asi
Jordan P Reynolds
Shweta Agarwal
Bryan J Dangott
Aziza Nassar
Zeynettin Akkus
LM&MA
37
0
0
13 May 2025
DFA-CON: A Contrastive Learning Approach for Detecting Copyright Infringement in DeepFake Art
Haroon Wahab
Hassan Ugail
Irfan Mehmood
AAML
29
0
0
13 May 2025
VIViT: Variable-Input Vision Transformer Framework for 3D MR Image Segmentation
B. K. Das
Ajay Singh
Gengyan Zhao
Han Liu
Thomas J. Re
D. Comaniciu
Eli Gibson
Andreas K. Maier
ViT
MedIm
26
0
0
13 May 2025
From Seeing to Doing: Bridging Reasoning and Decision for Robotic Manipulation
Yifu Yuan
Haiqin Cui
Yibin Chen
Zibin Dong
Fei Ni
Longxin Kou
Jinyi Liu
Pengyi Li
Yan Zheng
Jianye Hao
26
0
0
13 May 2025
Extending Large Vision-Language Model for Diverse Interactive Tasks in Autonomous Driving
Zongchuang Zhao
Haoyu Fu
Dingkang Liang
Xin Zhou
Dingyuan Zhang
Hongwei Xie
Bing Wang
Xiang Bai
MLLM
VLM
46
0
0
13 May 2025
Discovering Fine-Grained Visual-Concept Relations by Disentangled Optimal Transport Concept Bottleneck Models
Yan Xie
Zequn Zeng
Hao Zhang
Yucheng Ding
Y. Wang
Zhengjue Wang
Bo Chen
Hongwei Liu
OT
31
0
0
12 May 2025
Synthetic Similarity Search in Automotive Production
Christoph Huber
Ludwig Schleeh
Dino Knoll
Michael Guthe
31
0
0
12 May 2025
Step1X-3D: Towards High-Fidelity and Controllable Generation of Textured 3D Assets
Weiyu Li
X. Zhang
Zheng Sun
Di Qi
H. Li
...
Zeming Li
Gang Yu
Xiangyu Zhang
Daxin Jiang
Ping Tan
36
0
0
12 May 2025
Vision Foundation Model Embedding-Based Semantic Anomaly Detection
M. Ronecker
Matthew Foutter
Amine Elhafsi
Daniele Gammelli
Ihor Barakaiev
Marco Pavone
Daniel Watzenig
29
0
0
12 May 2025
H$^{\mathbf{3}}$DP: Triply-Hierarchical Diffusion Policy for Visuomotor Learning
Yiyang Lu
Yufeng Tian
Zhecheng Yuan
X. Wang
Pu Hua
Zhengrong Xue
Huazhe Xu
28
0
0
12 May 2025
Beyond Static Perception: Integrating Temporal Context into VLMs for Cloth Folding
Oriol Barbany
Adria Colomé
Carme Torras
33
0
0
12 May 2025
Multimodal Survival Modeling in the Age of Foundation Models
Steven Song
Morgan Borjigin-Wang
Irene Madejski
Robert L. Grossman
21
0
0
12 May 2025
Hand-Shadow Poser
Hao Xu
Yinqiao Wang
Niloy J. Mitra
Shuaicheng Liu
Pheng-Ann Heng
Chi-Wing Fu
3DH
29
0
0
11 May 2025
SimMIL: A Universal Weakly Supervised Pre-Training Framework for Multi-Instance Learning in Whole Slide Pathology Images
Yicheng Song
Tiancheng Lin
Die Peng
Su Yang
Yi Xu
MedIm
31
0
0
10 May 2025
DiffLocks: Generating 3D Hair from a Single Image using Diffusion Models
Radu Alexandru Rosu
Keyu Wu
Yao Feng
Youyi Zheng
M. Black
DiffM
3DH
47
0
0
09 May 2025
UniVLA: Learning to Act Anywhere with Task-centric Latent Actions
Qingwen Bu
Y. Yang
Jisong Cai
Shenyuan Gao
Guanghui Ren
Maoqing Yao
Ping Luo
Hongyang Li
98
0
0
09 May 2025
Towards a Unified Representation Evaluation Framework Beyond Downstream Tasks
Christos Plachouras
Julien Guinot
George Fazekas
Elio Quinton
Emmanouil Benetos
Johan Pauwels
110
1
0
09 May 2025
3D CAVLA: Leveraging Depth and 3D Context to Generalize Vision Language Action Models for Unseen Tasks
V. Bhat
Yu-Hsiang Lan
P. Krishnamurthy
Ramesh Karri
Farshad Khorrami
52
0
0
09 May 2025
Register and CLS tokens yield a decoupling of local and global features in large ViTs
Alexander Lappe
M. Giese
24
0
0
09 May 2025
CGTrack: Cascade Gating Network with Hierarchical Feature Aggregation for UAV Tracking
Weihong Li
Xiaoqiong Liu
Heng Fan
L. Zhang
26
0
0
09 May 2025
Learning to Drive Anywhere with Model-Based Reannotation
Noriaki Hirose
Lydia Ignatova
Kyle Stachowicz
Catherine Glossop
Sergey Levine
Dhruv Shah
24
0
0
08 May 2025
SVAD: From Single Image to 3D Avatar via Synthetic Data Generation with Video Diffusion and Data Augmentation
Yonwoo Choi
3DGS
VGen
62
0
0
08 May 2025
DiffusionSfM: Predicting Structure and Motion via Ray Origin and Endpoint Diffusion
Qitao Zhao
Amy Lin
Jeff Tan
Jason Y. Zhang
Deva Ramanan
Shubham Tulsiani
VGen
48
0
0
08 May 2025
One2Any: One-Reference 6D Pose Estimation for Any Object
Mengya Liu
Siyuan Li
Ajad Chhatkuli
Prune Truong
Luc Van Gool
Federico Tombari
37
0
0
07 May 2025
DeCLIP: Decoupled Learning for Open-Vocabulary Dense Perception
Junjie Wang
Bin Chen
Yulin Li
Bin Kang
Y. Chen
Zhuotao Tian
VLM
38
0
0
07 May 2025
Merging and Disentangling Views in Visual Reinforcement Learning for Robotic Manipulation
Abdulaziz Almuzairee
Rohan Patil
Dwait Bhatt
Henrik I. Christensen
34
0
0
07 May 2025
Person Recognition at Altitude and Range: Fusion of Face, Body Shape and Gait
Feng Liu
Nicholas Chimitt
Lanqing Guo
Jitesh Jain
Aditya Kane
...
Arun Ross
Humphrey Shi
Zhangyang Wang
A. Jain
Xiaoming Liu
CVBM
27
1
0
07 May 2025
HunyuanCustom: A Multimodal-Driven Architecture for Customized Video Generation
Teng Hu
Zhentao Yu
Zhengguang Zhou
Sen Liang
Yuan Zhou
Qin Lin
Qinglin Lu
DiffM
VGen
52
0
0
07 May 2025
MeshGen: Generating PBR Textured Mesh with Render-Enhanced Auto-Encoder and Generative Data Augmentation
Zilong Chen
Yikai Wang
Wenqiang Sun
Feng Wang
Yiwen Chen
Huaping Liu
31
0
0
07 May 2025
MonoCoP: Chain-of-Prediction for Monocular 3D Object Detection
Zhihao Zhang
Abhinav Kumar
Girish Chandar Ganesan
Xiaoming Liu
139
0
0
07 May 2025
Vision-Language-Action Models: Concepts, Progress, Applications and Challenges
Ranjan Sapkota
Yang Cao
Konstantinos I Roumeliotis
Manoj Karkee
LM&Ro
143
1
0
07 May 2025
CXR-AD: Component X-ray Image Dataset for Industrial Anomaly Detection
Haoyu Bai
Jie Wang
Gaomin Li
X. Li
Xiaohu Zhang
Xia Yang
27
0
0
06 May 2025
Improving the Reproducibility of Deep Learning Software: An Initial Investigation through a Case Study Analysis
Nikita Ravi
Abhinav Goel
James C. Davis
George K. Thiruvathukal
46
0
0
06 May 2025
Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models
Abram Schonfeldt
Benjamin Maylor
Xiaofang Chen
Ronald Clark
Aiden Doherty
68
0
0
06 May 2025
PosterO: Structuring Layout Trees to Enable Language Models in Generalized Content-Aware Layout Generation
HsiaoYuan Hsu
Yuxin Peng
21
0
0
06 May 2025
Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation
Gabriele Rosi
Fabio Cermelli
VLM
37
0
0
06 May 2025
Real-Time Person Image Synthesis Using a Flow Matching Model
Jiwoo Jeong
Kirok Kim
Wooju Kim
Nam-Joon Kim
3DH
66
0
0
06 May 2025