ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2104.14294
  4. Cited By
Emerging Properties in Self-Supervised Vision Transformers
v1v2 (latest)

Emerging Properties in Self-Supervised Vision Transformers

29 April 2021
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
ArXiv (abs)PDFHTML

Papers citing "Emerging Properties in Self-Supervised Vision Transformers"

50 / 4,175 papers shown
Title
Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model
Contextual Similarity Distillation: Ensemble Uncertainties with a Single Model
Moritz A. Zanger
Pascal R. van der Vaart
Wendelin Bohmer
M. Spaan
UQCVBDL
507
2
0
14 Mar 2025
Self-Supervised Pretraining for Fine-Grained Plankton Recognition
Self-Supervised Pretraining for Fine-Grained Plankton Recognition
Joona Kareinen
T. Eerola
K. Kraft
L. Lensu
S. Suikkanen
Heikki Kälviäinen
SSL
494
0
0
14 Mar 2025
LUSD: Localized Update Score Distillation for Text-Guided Image Editing
LUSD: Localized Update Score Distillation for Text-Guided Image Editing
Worameth Chinchuthakun
Tossaporn Saengja
Nontawat Tritrong
Pitchaporn Rewatbowornwong
Pramook Khungurn
Supasorn Suwajanakorn
DiffM
104
0
0
14 Mar 2025
COIN: Confidence Score-Guided Distillation for Annotation-Free Cell Segmentation
COIN: Confidence Score-Guided Distillation for Annotation-Free Cell Segmentation
Sanghyun Jo
Seo Jin Lee
Seungwoo Lee
Seohyung Hong
Hyungseok Seo
Kyungsu Kim
80
0
0
14 Mar 2025
EgoSplat: Open-Vocabulary Egocentric Scene Understanding with Language Embedded 3D Gaussian Splatting
Di Li
Jie Feng
Jiahao Chen
Weisheng Dong
Guanbin Li
G. Shi
Licheng Jiao
3DGSVLM
433
0
0
14 Mar 2025
Watch and Learn: Leveraging Expert Knowledge and Language for Surgical Video Understanding
David Gastager
Ghazal Ghazaei
Constantin Patsch
88
0
0
14 Mar 2025
Proxy-Tuning: Tailoring Multimodal Autoregressive Models for Subject-Driven Image Generation
Yi Wu
Lingting Zhu
Lei Liu
Wandi Qiao
Ziqiang Li
Lequan Yu
Bin Li
DiffM
102
1
0
13 Mar 2025
The Power of One: A Single Example is All it Takes for Segmentation in VLMs
Mir Rayat Imtiaz Hossain
Mennatullah Siam
Leonid Sigal
James J. Little
MLLMVLM
Presented at ResearchTrend Connect | VLM on 21 May 2025
230
0
0
13 Mar 2025
ConsisLoRA: Enhancing Content and Style Consistency for LoRA-based Style Transfer
Bolin Chen
Baoquan Zhao
H. Xie
Yi Cai
Qing Li
Xudong Mao
DiffM
101
2
0
13 Mar 2025
RoMA: Scaling up Mamba-based Foundation Models for Remote Sensing
Fengxiang Wang
Hongru Wang
Yansen Wang
Di Wang
Mingshuo Chen
...
Yangang Sun
Shuo Wang
L. Lan
Wenjing Yang
Jing Zhang
Mamba
122
3
0
13 Mar 2025
UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?
UVE: Are MLLMs Unified Evaluators for AI-Generated Videos?
Yuanxin Liu
Rui Zhu
Shuhuai Ren
Jiacong Wang
Haoyuan Guo
Xu Sun
Lu Jiang
377
1
0
13 Mar 2025
VideoMerge: Towards Training-free Long Video Generation
Siyang Zhang
Harry Yang
Ser-Nam Lim
DiffMVGen
96
1
0
13 Mar 2025
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
4D LangSplat: 4D Language Gaussian Splatting via Multimodal Large Language Models
Wanhua Li
Renping Zhou
Jiawei Zhou
Yingwei Song
Johannes Herter
Minghan Qin
Gao Huang
Hanspeter Pfister
3DGSVLM
145
3
0
13 Mar 2025
Do computer vision foundation models learn the low-level characteristics of the human visual system?
Do computer vision foundation models learn the low-level characteristics of the human visual system?
Yancheng Cai
Fei Yin
Dounia Hammou
Rafal Mantiuk
VLM
Presented at ResearchTrend Connect | VLM on 14 Mar 2025
225
2
0
13 Mar 2025
Panopticon: Advancing Any-Sensor Foundation Models for Earth Observation
Leonard Waldmann
Ando Shah
Yi Wang
Nils Lehmann
Adam J. Stewart
Zhitong Xiong
Xiao Xiang Zhu
Stefan Bauer
John Chuang
74
4
0
13 Mar 2025
Lightweight Models for Emotional Analysis in Video
Lightweight Models for Emotional Analysis in Video
Quoc-Tien Nguyen
H. Nguyen
V. Huynh
133
0
0
13 Mar 2025
DreamInsert: Zero-Shot Image-to-Video Object Insertion from A Single Image
Qi Zhao
Zhan Ma
Pan Zhou
VGen
144
0
0
13 Mar 2025
Transformers without Normalization
Transformers without Normalization
Jiachen Zhu
Xinlei Chen
Kaiming He
Yann LeCun
Zhuang Liu
OffRLViT
160
20
0
13 Mar 2025
Leveraging Vision-Language Embeddings for Zero-Shot Learning in Histopathology Images
M. Rahaman
Ewan K. A. Millar
Erik H. W. Meijering
VLM
115
0
0
13 Mar 2025
Interpretable Image Classification via Non-parametric Part Prototype Learning
Zhijie Zhu
Lei Fan
Maurice Pagnucco
Yang Song
123
0
0
13 Mar 2025
Reangle-A-Video: 4D Video Generation as Video-to-Video Translation
Reangle-A-Video: 4D Video Generation as Video-to-Video Translation
Hyeonho Jeong
Suhyeon Lee
Jong Chul Ye
VGen
490
2
0
12 Mar 2025
CleverDistiller: Simple and Spatially Consistent Cross-modal Distillation
Hariprasath Govindarajan
Maciej K. Wozniak
Marvin Klingner
Camille Maurice
B. R. Kiran
S. Yogamani
120
0
0
12 Mar 2025
Object-Aware DINO (Oh-A-Dino): Enhancing Self-Supervised Representations for Multi-Object Instance Retrieval
Object-Aware DINO (Oh-A-Dino): Enhancing Self-Supervised Representations for Multi-Object Instance Retrieval
Stefan Sylvius Wagner
Stefan Harmeling
OCL
137
1
0
12 Mar 2025
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Long-horizon Visual Instruction Generation with Logic and Attribute Self-reflection
Yucheng Suo
Fan Ma
Kaixin Shen
Linchao Zhu
Yi Yang
VLM
86
0
0
12 Mar 2025
Isolated Channel Vision Transformers: From Single-Channel Pretraining to Multi-Channel Finetuning
Wenyi Lian
Joakim Lindblad
Patrick Micke
Natasa Sladoje
102
1
0
12 Mar 2025
Measure Twice, Cut Once: Grasping Video Structures and Event Semantics with LLMs for Video Temporal Localization
Zongshang Pang
Mayu Otani
Yuta Nakashima
128
0
0
12 Mar 2025
Evaluating Visual Explanations of Attention Maps for Transformer-based Medical Imaging
Minjae Chung
Jong Bum Won
Ganghyun Kim
Yujin Kim
Utku Ozbulak
MedIm
196
0
0
12 Mar 2025
UniCombine: Unified Multi-Conditional Combination with Diffusion Transformer
Haoxuan Wang
Jinlong Peng
Qu He
Hao Yang
Ying Jin
...
Yanjie Pan
Zhenye Gan
M. Chi
Bo Peng
Yun Wang
DiffM
103
2
0
12 Mar 2025
Discovering Influential Neuron Path in Vision Transformers
Discovering Influential Neuron Path in Vision Transformers
Yifan Wang
Yifei Liu
Yingdong Shi
Chong Li
Anqi Pang
Sibei Yang
Jingyi Yu
Kan Ren
ViT
249
0
0
12 Mar 2025
Implicit Contrastive Representation Learning with Guided Stop-gradient
Byeongchan Lee
Sehyun Lee
SSL
266
2
0
12 Mar 2025
Freeze and Cluster: A Simple Baseline for Rehearsal-Free Continual Category Discovery
Chuyu Zhang
Xueyang Yu
Peiyan Gu
Xuming He
CLL
144
0
0
12 Mar 2025
ObjectMover: Generative Object Movement with Video Prior
Xin Yu
Tianyu Wang
Seunggeun Kim
Paul Guerrero
Xi Chen
Qing Liu
Zhe Lin
Xiaojuan Qi
DiffMVGenOCL
139
2
0
11 Mar 2025
FPGS: Feed-Forward Semantic-aware Photorealistic Style Transfer of Large-Scale Gaussian Splatting
GeonU Kim
Kim Youwang
Lee Hyoseok
Tae-Hyun Oh
3DGS
105
0
0
11 Mar 2025
Scale-Aware Pre-Training for Human-Centric Visual Perception: Enabling Lightweight and Generalizable Models
Scale-Aware Pre-Training for Human-Centric Visual Perception: Enabling Lightweight and Generalizable Models
Xuanhan Wang
Huimin Deng
Lianli Gao
Jingkuan Song
VLM
74
0
0
11 Mar 2025
Preserving Product Fidelity in Large Scale Image Recontextualization with Diffusion Models
Ishaan Malhi
Praneet Dutta
Ellie Talius
Sally Ma
Brendan Driscoll
Krista Holden
G. Pruthi
Arunachalam Narayanaswamy
DiffM
91
0
0
11 Mar 2025
DiffEGG: Diffusion-Driven Edge Generation as a Pixel-Annotation-Free Alternative for Instance Annotation
Sanghyun Jo
Ziseok Lee
Wooyeol Lee
Kyungsu Kim
131
2
0
11 Mar 2025
WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
WildSeg3D: Segment Any 3D Objects in the Wild from 2D Images
Yansong Guo
Jie Hu
Yansong Qu
Liujuan Cao
3DGS
477
0
0
11 Mar 2025
DIV-FF: Dynamic Image-Video Feature Fields For Environment Understanding in Egocentric Videos
Lorenzo Mur-Labadia
Josechu Guerrero
Ruben Martinez-Cantin
VGen
109
0
0
11 Mar 2025
COMODO: Cross-Modal Video-to-IMU Distillation for Efficient Egocentric Human Activity Recognition
Baiyu Chen
Wilson Wongso
Zechen Li
Yonchanok Khaokaew
Hao Xue
Flora D. Salim
168
1
0
10 Mar 2025
ADROIT: A Self-Supervised Framework for Learning Robust Representations for Active Learning
S. Banerjee
Vinay Kumar Verma
SSL
103
0
0
10 Mar 2025
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
VidBot: Learning Generalizable 3D Actions from In-the-Wild 2D Human Videos for Zero-Shot Robotic Manipulation
Hanzhi Chen
Boyang Sun
Anran Zhang
Marc Pollefeys
Stefan Leutenegger
LM&Ro
161
0
0
10 Mar 2025
FunGraph: Functionality Aware 3D Scene Graphs for Language-Prompted Scene Interaction
Dennis Rotondi
Fabio Scaparro
Hermann Blum
Kai O. Arras
95
0
0
10 Mar 2025
Large model enhanced computational ghost imaging
Yifan Chen
Hongjun An
Zhe Sun
Tong Tian
Mingliang Chen
Christian Spielmann
Xuelong Li
63
0
0
10 Mar 2025
MIRAM: Masked Image Reconstruction Across Multiple Scales for Breast Lesion Risk Prediction
MIRAM: Masked Image Reconstruction Across Multiple Scales for Breast Lesion Risk Prediction
H. Q. Vo
Pengyu Yuan
Zheng Yin
Kelvin K. Wong
Chika F. Ezeana
S. Ly
Stephen T. C. Wong
H. Nguyen
57
0
0
10 Mar 2025
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
A Data-Centric Revisit of Pre-Trained Vision Models for Robot Learning
Xin Wen
Bingchen Zhao
Yilun Chen
Jiangmiao Pang
Xiaojuan Qi
LM&Ro
222
0
0
10 Mar 2025
Self-supervised Normality Learning and Divergence Vector-guided Model Merging for Zero-shot Congenital Heart Disease Detection in Fetal Ultrasound Videos
Pramit Saha
Divyanshu Mishra
Netzahualcoyotl Hernandez-Cruz
Olga Patey
A. Papageorghiou
Yuki M. Asano
J. A. Noble
107
2
0
10 Mar 2025
LatexBlend: Scaling Multi-concept Customized Generation with Latent Textual Blending
Jian Jin
Zhenbo Yu
Yang Shen
Zhenyong Fu
Jian Yang
DiffM
110
1
0
10 Mar 2025
TwinTURBO: Semi-Supervised Fine-Tuning of Foundation Models via Mutual Information Decompositions for Downstream Task and Latent Spaces
TwinTURBO: Semi-Supervised Fine-Tuning of Foundation Models via Mutual Information Decompositions for Downstream Task and Latent Spaces
Guillaume Quétant
Pavlo Molchanov
Svyatoslav Voloshynovskiy
118
0
0
10 Mar 2025
Alligat0R: Pre-Training Through Co-Visibility Segmentation for Relative Camera Pose Regression
Thibaut Loiseau
Guillaume Bourmaud
Vincent Lepetit
132
1
0
10 Mar 2025
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
SemHiTok: A Unified Image Tokenizer via Semantic-Guided Hierarchical Codebook for Multimodal Understanding and Generation
Zhenpeng Chen
Chunwei Wang
Xiuwei Chen
Hongbin Xu
Jiawei Han
Xiandan Liang
J. N. Han
Hang Xu
Xiaodan Liang
VLM
179
2
0
09 Mar 2025
Previous
123...91011...828384
Next