ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2304.07193
  4. Cited By
DINOv2: Learning Robust Visual Features without Supervision

DINOv2: Learning Robust Visual Features without Supervision

14 April 2023
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
Vasil Khalidov
Pierre Fernandez
Daniel Haziza
Francisco Massa
Alaaeldin El-Nouby
Mahmoud Assran
Nicolas Ballas
Wojciech Galuba
Russ Howes
Po-Yao (Bernie) Huang
Shang-Wen Li
Ishan Misra
Michael G. Rabbat
Vasu Sharma
Gabriel Synnaeve
Huijiao Xu
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
    VLM
    CLIP
    SSL
ArXivPDFHTML

Papers citing "DINOv2: Learning Robust Visual Features without Supervision"

50 / 2,193 papers shown
Title
PAHA: Parts-Aware Audio-Driven Human Animation with Diffusion Model
PAHA: Parts-Aware Audio-Driven Human Animation with Diffusion Model
Y.B. Wang
S.Z. Zhou
J.F. Wu
T. Hu
J.N. Zhang
Z. Li
Yanzhe Liu
DiffM
VGen
67
0
0
06 May 2025
Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation
Show or Tell? A Benchmark To Evaluate Visual and Textual Prompts in Semantic Segmentation
Gabriele Rosi
Fabio Cermelli
VLM
42
0
0
06 May 2025
Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models
Reducing Annotation Burden in Physical Activity Research Using Vision-Language Models
Abram Schonfeldt
Benjamin Maylor
Xiaofang Chen
Ronald Clark
Aiden Doherty
68
0
0
06 May 2025
CXR-AD: Component X-ray Image Dataset for Industrial Anomaly Detection
CXR-AD: Component X-ray Image Dataset for Industrial Anomaly Detection
Haoyu Bai
Jie Wang
Gaomin Li
Xianrui Li
Xiaohu Zhang
Xia Yang
41
0
0
06 May 2025
Improving the Reproducibility of Deep Learning Software: An Initial Investigation through a Case Study Analysis
Improving the Reproducibility of Deep Learning Software: An Initial Investigation through a Case Study Analysis
Nikita Ravi
Abhinav Goel
James C. Davis
George K. Thiruvathukal
51
0
0
06 May 2025
An Adaptive Data-Resilient Multi-Modal Framework for Hierarchical Multi-Label Book Genre Identification
An Adaptive Data-Resilient Multi-Modal Framework for Hierarchical Multi-Label Book Genre Identification
Utsav Nareti
S. Chattopadhyay
Prolay Mallick
Suraj Kumar
Ayush Vikas Daga
Chandranath Adak
Adarsh Wase
Arjab Roy
23
0
0
05 May 2025
VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery
VGLD: Visually-Guided Linguistic Disambiguation for Monocular Depth Scale Recovery
Bojin Wu
Jing Chen
MDE
46
0
0
05 May 2025
Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models
Towards Dataset Copyright Evasion Attack against Personalized Text-to-Image Diffusion Models
Kuofeng Gao
Yufei Zhu
Yiming Li
Jiawang Bai
Yong-Liang Yang
Z. Li
Shu-Tao Xia
41
0
0
05 May 2025
Learning 3D Persistent Embodied World Models
Learning 3D Persistent Embodied World Models
Siyuan Zhou
Yilun Du
Yuncong Yang
Lei Han
Peihao Chen
Dit-Yan Yeung
Chuang Gan
VGen
47
0
0
05 May 2025
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
No Other Representation Component Is Needed: Diffusion Transformers Can Provide Representation Guidance by Themselves
D. Jiang
Mengmeng Wang
Liuzhuozheng Li
Lei Zhang
Haoyu Wang
Wei Wei
Guang Dai
Yanning Zhang
Jingdong Wang
DiffM
51
0
0
05 May 2025
SparSplat: Fast Multi-View Reconstruction with Generalizable 2D Gaussian Splatting
SparSplat: Fast Multi-View Reconstruction with Generalizable 2D Gaussian Splatting
Shubhendu Jena
Shishir Reddy Vutukur
A. Boukhayma
3DGS
52
1
0
04 May 2025
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation
Benchmarking Feature Upsampling Methods for Vision Foundation Models using Interactive Segmentation
Volodymyr Havrylov
Haiwen Huang
Dan Zhang
Andreas Geiger
128
0
0
04 May 2025
Always Skip Attention
Always Skip Attention
Yiping Ji
Hemanth Saratchandran
Peyman Moghaddam
Simon Lucey
151
0
0
04 May 2025
Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain
Diffusion-based Adversarial Purification from the Perspective of the Frequency Domain
Gaozheng Pei
Ke Ma
Yingfei Sun
Qianqian Xu
Qingming Huang
DiffM
42
0
0
02 May 2025
Self-Supervision Enhances Instance-based Multiple Instance Learning Methods in Digital Pathology: A Benchmark Study
Self-Supervision Enhances Instance-based Multiple Instance Learning Methods in Digital Pathology: A Benchmark Study
Ali Mammadov
Loic Le Folgoc
Julien Adam
Anne Buronfosse
Gilles Hayem
Guillaume Hocquet
Pietro Gori
SSL
45
0
0
02 May 2025
Transferable Adversarial Attacks on Black-Box Vision-Language Models
Transferable Adversarial Attacks on Black-Box Vision-Language Models
Kai Hu
Weichen Yu
L. Zhang
Alexander Robey
Andy Zou
Chengming Xu
Haoqi Hu
Matt Fredrikson
AAML
VLM
64
1
0
02 May 2025
Contextures: Representations from Contexts
Contextures: Representations from Contexts
Runtian Zhai
Kai Yang
Che-Ping Tsai
Burak Varici
Zico Kolter
Pradeep Ravikumar
119
0
0
02 May 2025
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
SpatialLLM: A Compound 3D-Informed Design towards Spatially-Intelligent Large Multimodal Models
Wufei Ma
Luoxin Ye
Nessa McWeeney
Celso M de Melo
A. Yuille
Jieneng Chen
LRM
65
1
0
01 May 2025
Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction
Pixel3DMM: Versatile Screen-Space Priors for Single-Image 3D Face Reconstruction
Simon Giebenhain
Tobias Kirschstein
Martin Rünz
Lourdes Agapito
Matthias Nießner
CVBM
3DH
62
0
0
01 May 2025
Rethinking Visual Layer Selection in Multimodal LLMs
Rethinking Visual Layer Selection in Multimodal LLMs
H. Chen
Junyan Lin
Xinhao Chen
Yue Fan
Xin Jin
Hui Su
Jianfeng Dong
Jinlan Fu
Xiaoyu Shen
VLM
95
0
0
30 Apr 2025
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Reinforced MLLM: A Survey on RL-Based Reasoning in Multimodal Large Language Models
Guanghao Zhou
Panjia Qiu
Cheng Chen
Jiadong Wang
Zheming Yang
Jian Xu
Minghui Qiu
OffRL
LRM
58
1
0
30 Apr 2025
Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design
Investigating Zero-Shot Diagnostic Pathology in Vision-Language Models with Efficient Prompt Design
Vasudev Sharma
Ahmed Alagha
Abdelhakim Khellaf
Vincent Quoc-Huy Trinh
Mahdi S. Hosseini
38
0
0
30 Apr 2025
Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space
Common3D: Self-Supervised Learning of 3D Morphable Models for Common Objects in Neural Feature Space
Leonhard Sommer
Olaf Dünkel
Christian Theobalt
Adam Kortylewski
31
0
0
30 Apr 2025
Can We Achieve Efficient Diffusion without Self-Attention? Distilling Self-Attention into Convolutions
Can We Achieve Efficient Diffusion without Self-Attention? Distilling Self-Attention into Convolutions
Ziyi Dong
Chengxing Zhou
Weijian Deng
Pengxu Wei
Xiangyang Ji
Liang Lin
MQ
53
0
0
30 Apr 2025
SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings
SoccerDiffusion: Toward Learning End-to-End Humanoid Robot Soccer from Gameplay Recordings
Florian Vahl
Jörn Griepenburg
Jan Gutsche
Jasper Güldenstein
Jianwei Zhang
VGen
44
0
0
29 Apr 2025
X-Fusion: Introducing New Modality to Frozen Large Language Models
X-Fusion: Introducing New Modality to Frozen Large Language Models
Sicheng Mo
Thao Nguyen
Xun Huang
Siddharth Srinivasan Iyer
Yijun Li
...
Eli Shechtman
Krishna Kumar Singh
Yong Jae Lee
Bolei Zhou
Yuheng Li
77
0
0
29 Apr 2025
GarmentX: Autoregressive Parametric Representations for High-Fidelity 3D Garment Generation
GarmentX: Autoregressive Parametric Representations for High-Fidelity 3D Garment Generation
Jingfeng Guo
Jianfei Chen
Weikai Chen
Zhenyu Sun
Lanjiong Li
Baozhu Zhao
Lingting Zhu
Xinyu Wang
Qi Liu
3DH
85
0
0
29 Apr 2025
Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting
Creating Your Editable 3D Photorealistic Avatar with Tetrahedron-constrained Gaussian Splatting
Hanxi Liu
Yifang Men
Zhouhui Lian
3DGS
33
0
0
29 Apr 2025
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
In-Context Edit: Enabling Instructional Image Editing with In-Context Generation in Large Scale Diffusion Transformer
Zechuan Zhang
Ji Xie
Yu Lu
Zongxin Yang
Yi Yang
DiffM
97
1
0
29 Apr 2025
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
NORA: A Small Open-Sourced Generalist Vision Language Action Model for Embodied Tasks
Chia-Yu Hung
Qi Sun
Pengfei Hong
Amir Zadeh
Chuan Li
U-Xuan Tan
Navonil Majumder
Soujanya Poria
LM&Ro
42
1
0
28 Apr 2025
Do You Know the Way? Human-in-the-Loop Understanding for Fast Traversability Estimation in Mobile Robotics
Do You Know the Way? Human-in-the-Loop Understanding for Fast Traversability Estimation in Mobile Robotics
Andre Schreiber
Katherine Rose Driggs-Campbell
153
0
0
28 Apr 2025
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video
Prisma: An Open Source Toolkit for Mechanistic Interpretability in Vision and Video
Sonia Joseph
Praneet Suresh
Lorenz Hufe
Edward Stevinson
Robert Graham
Yash Vadi
Danilo Bzdok
Sebastian Lapuschkin
Lee Sharkey
Blake A. Richards
72
0
0
28 Apr 2025
PhenoAssistant: A Conversational Multi-Agent AI System for Automated Plant Phenotyping
PhenoAssistant: A Conversational Multi-Agent AI System for Automated Plant Phenotyping
Feng Chen
Ilias Stogiannidis
Andrew Wood
Danilo Bueno
Dominic Williams
...
Stephen A. Rolfe
Tracy Lawson
Tony Pridmore
M. Giuffrida
Sotirios A. Tsaftaris
62
0
0
28 Apr 2025
Pixels2Points: Fusing 2D and 3D Features for Facial Skin Segmentation
Pixels2Points: Fusing 2D and 3D Features for Facial Skin Segmentation
Victoria Yue Chen
Daoye Wang
Stephan Garbin
Jan Bednarík
Sebastian Winberg
Timo Bolkart
Thabo Beeler
3DH
3DPC
42
0
0
28 Apr 2025
EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation
EarthMapper: Visual Autoregressive Models for Controllable Bidirectional Satellite-Map Translation
Zhe Dong
Yuzhe Sun
Tianzhu Liu
Wangmeng Zuo
Yanfeng Gu
57
0
0
28 Apr 2025
CompleteMe: Reference-based Human Image Completion
CompleteMe: Reference-based Human Image Completion
Yu-Ju Tsai
Brian L. Price
Qing Liu
Luis Figueroa
D. Pakhomov
Zhihong Ding
Scott D. Cohen
Ming Yang
3DH
52
0
0
28 Apr 2025
OpenFusion++: An Open-vocabulary Real-time Scene Understanding System
OpenFusion++: An Open-vocabulary Real-time Scene Understanding System
Xiaofeng Jin
Matteo Frosi
Matteo Matteucci
157
0
0
27 Apr 2025
CLR-Wire: Towards Continuous Latent Representations for 3D Curve Wireframe Generation
CLR-Wire: Towards Continuous Latent Representations for 3D Curve Wireframe Generation
Xueqi Ma
Y. Liu
Tianlong Gao
Qingming Huang
Hui Huang
3DV
AI4CE
42
0
0
27 Apr 2025
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
CARL: Camera-Agnostic Representation Learning for Spectral Image Analysis
Alexander Baumann
Leonardo Ayala
Siyang Song
Jan Sellner
Alexander Studier-Fischer
Berkin Özdemir
Lena Maier-Hein
Slobodan Ilic
51
0
0
27 Apr 2025
Multi-Stage Boundary-Aware Transformer Network for Action Segmentation in Untrimmed Surgical Videos
Multi-Stage Boundary-Aware Transformer Network for Action Segmentation in Untrimmed Surgical Videos
Rezowan Shuvo
M S Mekala
Eyad Elyan
MedIm
131
0
0
26 Apr 2025
What is the Added Value of UDA in the VFM Era?
What is the Added Value of UDA in the VFM Era?
B. B. Englert
Tommie Kerssies
Gijs Dubbelman
44
0
0
25 Apr 2025
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
Eval3D: Interpretable and Fine-grained Evaluation for 3D Generation
Shivam Duggal
Yushi Hu
Oscar Michel
Aniruddha Kembhavi
William T. Freeman
Noah A. Smith
Ranjay Krishna
Antonio Torralba
Ali Farhadi
Wei-Chiu Ma
EGVM
ELM
77
0
0
25 Apr 2025
SSL4Eco: A Global Seasonal Dataset for Geospatial Foundation Models in Ecology
SSL4Eco: A Global Seasonal Dataset for Geospatial Foundation Models in Ecology
Elena Plekhanova
Damien Robert
Johannes Dollinger
Emilia Arens
Philipp Brun
Jan Dirk Wegner
Niklaus Zimmermann
24
0
0
25 Apr 2025
E-InMeMo: Enhanced Prompting for Visual In-Context Learning
E-InMeMo: Enhanced Prompting for Visual In-Context Learning
Jiahao Zhang
Bowen Wang
Hong Liu
Liangzhi Li
Yuta Nakashima
Hajime Nagahara
VLM
104
0
0
25 Apr 2025
Improving Open-World Object Localization by Discovering Background
Improving Open-World Object Localization by Discovering Background
Ashish Singh
Michael Jeffrey Jones
Kuan-Chuan Peng
A. Cherian
Moitreya Chatterjee
Erik Learned-Miller
ObjD
OCL
VLM
64
0
0
24 Apr 2025
Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding
Predict-Optimize-Distill: A Self-Improving Cycle for 4D Object Understanding
Mingxuan Wu
Huang Huang
Justin Kerr
C. Kim
Anthony Zhang
Brent Yi
Angjoo Kanazawa
52
0
0
24 Apr 2025
Lessons from Deploying Learning-based CSI Localization on a Large-Scale ISAC Platform
Lessons from Deploying Learning-based CSI Localization on a Large-Scale ISAC Platform
Tianyu Zhang
Dongheng Zhang
Ruixu Geng
Xuecheng Xie
Shuai Yang
Yan Chen
41
0
0
24 Apr 2025
Dargana: fine-tuning EarthPT for dynamic tree canopy mapping from space
Dargana: fine-tuning EarthPT for dynamic tree canopy mapping from space
Michael J. Smith
Luke Fleming
James E. Geach
Ryan J. Roberts
Freddie Kalaitzis
James Banister
29
0
0
24 Apr 2025
The Fourth Monocular Depth Estimation Challenge
The Fourth Monocular Depth Estimation Challenge
Anton Obukhov
Matteo Poggi
Fabio Tosi
Ripudaman Singh Arora
Jaime Spencer
...
Tuan-Anh Yang
Minh-Quang Nguyen
T. Tran
Albert Luginov
Muhammad Shahzad
MDE
129
0
0
24 Apr 2025
V$^2$R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations
V2^22R-Bench: Holistically Evaluating LVLM Robustness to Fundamental Visual Variations
Zhiyuan Fan
Yumeng Wang
Sandeep Polisetty
Yi Ren Fung
50
0
0
23 Apr 2025
Previous
12345...424344
Next