ResearchTrend.AI
LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision

23 May 2025
A. Fuller
Yousef Yassin
Junfeng Wen
Daniel G. Kyrollos
Tarek Ibrahim
James R. Green
Evan Shelhamer
    ViT
arXiv: 2505.18051

Papers cited by "LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision"

47 / 47 papers shown
Attention Distillation: A Unified Approach to Visual Characteristics Transfer
Yang Zhou
Xu Gao
Zichong Chen
Hui Huang
DiffM
87
7
0
27 Feb 2025
Learning to Merge Tokens via Decoupled Embedding for Efficient Vision Transformers
Dong Hoon Lee
Seunghoon Hong
77
3
0
13 Dec 2024
Token Cropr: Faster ViTs for Quite a Few Tasks
Benjamin Bergner
C. Lippert
Aravindh Mahendran
ViT
VLM
87
1
0
01 Dec 2024
On the Surprising Effectiveness of Attention Transfer for Vision Transformers
Alexander C. Li
Yuandong Tian
Bin Chen
Deepak Pathak
Xinlei Chen
53
3
0
14 Nov 2024
OpenSatMap: A Fine-grained High-resolution Satellite Dataset for Large-scale Map Construction
Hongbo Zhao
Lue Fan
Yuntao Chen
Haochen Wang
Yiran Yang
Xiaojuan Jin
Yixin Zhang
Gaofeng Meng
Zhaoxiang Zhang
83
2
0
30 Oct 2024
Agglomerative Token Clustering
Joakim Bruslund Haurum
Sergio Escalera
Graham W. Taylor
T. Moeslund
64
2
0
18 Sep 2024
Towards Latent Masked Image Modeling for Self-Supervised Visual Representation Learning
Yibing Wei
Abhinav Gupta
Pedro Morgado
SSL
55
9
0
22 Jul 2024
Accelerating Transformers with Spectrum-Preserving Token Merging
Hoai-Chau Tran
D. M. Nguyen
Duy M. Nguyen
Trung Thanh Nguyen
Ngan Le
Pengtao Xie
Daniel Sonntag
James Y. Zou
Binh T. Nguyen
Mathias Niepert
72
11
0
25 May 2024
LookHere: Vision Transformers with Directed Attention Generalize and Extrapolate
A. Fuller
Daniel G. Kyrollos
Yousef Yassin
James R. Green
79
3
0
22 May 2024
Learning to Rank Patches for Unbiased Image Redundancy Reduction
Yang Luo
Zhineng Chen
Peng Zhou
Zuxuan Wu
Xieping Gao
Yu-Gang Jiang
SSL
53
1
0
31 Mar 2024
Multi-criteria Token Fusion with One-step-ahead Attention for Efficient Vision Transformers
Sanghyeok Lee
Joonmyung Choi
Hyunwoo J. Kim
ViT
60
9
0
15 Mar 2024
xT: Nested Tokenization for Larger Context in Large Images
Ritwik Gupta
Shufan Li
Tyler Lixuan Zhu
Jitendra Malik
Trevor Darrell
K. Mangalam
ViT
67
5
0
04 Mar 2024
Vision Mamba: Efficient Visual Representation Learning with Bidirectional State Space Model
Lianghui Zhu
Bencheng Liao
Qian Zhang
Xinlong Wang
Wenyu Liu
Xinggang Wang
Mamba
68
753
0
17 Jan 2024
V*: Guided Visual Search as a Core Mechanism in Multimodal LLMs
Penghao Wu
Saining Xie
LRM
71
143
0
21 Dec 2023
Token Fusion: Bridging the Gap between Token Pruning and Token Merging
Minchul Kim
Shangqian Gao
Yen-Chang Hsu
Yilin Shen
Hongxia Jin
52
37
0
02 Dec 2023
CROMA: Remote Sensing Representations with Contrastive Radar-Optical Masked Autoencoders
A. Fuller
K. Millard
James R. Green
41
68
0
01 Nov 2023
Vision Transformers Need Registers
Timothée Darcet
Maxime Oquab
Julien Mairal
Piotr Bojanowski
ViT
85
326
0
28 Sep 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
249
3,205
0
14 Apr 2023
Scaling Vision Transformers to 22 Billion Parameters
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
...
Mario Lučić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
128
585
0
10 Feb 2023
Self-Supervised Learning from Images with a Joint-Embedding Predictive Architecture
Mahmoud Assran
Quentin Duval
Ishan Misra
Piotr Bojanowski
Pascal Vincent
Michael G. Rabbat
Yann LeCun
Nicolas Ballas
SSL
AI4TS
MDE
59
338
0
19 Jan 2023
Iterative Patch Selection for High-Resolution Image Recognition
Benjamin Bergner
C. Lippert
Aravindh Mahendran
47
13
0
24 Oct 2022
Token Merging: Your ViT But Faster
Daniel Bolya
Cheng-Yang Fu
Xiaoliang Dai
Peizhao Zhang
Christoph Feichtenhofer
Judy Hoffman
MoMe
64
446
0
17 Oct 2022
Attention Distillation: self-supervised vision transformer students need more guidance
Kai Wang
Fei Yang
Joost van de Weijer
ViT
39
18
0
03 Oct 2022
Effective Adaptation in Multi-Task Co-Training for Unified Autonomous Driving
Xiwen Liang
Yangxin Wu
Jianhua Han
Hang Xu
Chunjing Xu
Xiaodan Liang
62
35
0
19 Sep 2022
DeiT III: Revenge of the ViT
Hugo Touvron
Matthieu Cord
Hervé Jégou
ViT
101
402
0
14 Apr 2022
AdaViT: Adaptive Tokens for Efficient Vision Transformer
Hongxu Yin
Arash Vahdat
J. Álvarez
Arun Mallya
Jan Kautz
Pavlo Molchanov
ViT
58
327
0
14 Dec 2021
AdaViT: Adaptive Vision Transformers for Efficient Image Recognition
Lingchen Meng
Hengduo Li
Bor-Chun Chen
Shiyi Lan
Zuxuan Wu
Yu-Gang Jiang
Ser-Nam Lim
ViT
47
227
0
30 Nov 2021
SimMIM: A Simple Framework for Masked Image Modeling
Zhenda Xie
Zheng Zhang
Yue Cao
Yutong Lin
Jianmin Bao
Zhuliang Yao
Qi Dai
Han Hu
154
1,331
0
18 Nov 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
373
7,600
0
11 Nov 2021
BEiT: BERT Pre-Training of Image Transformers
Hangbo Bao
Li Dong
Songhao Piao
Furu Wei
ViT
175
2,790
0
15 Jun 2021
DynamicViT: Efficient Vision Transformers with Dynamic Token Sparsification
Yongming Rao
Wenliang Zhao
Benlin Liu
Jiwen Lu
Jie Zhou
Cho-Jui Hsieh
ViT
63
685
0
03 Jun 2021
Differentiable Patch Selection for Image Recognition
Jean-Baptiste Cordonnier
Aravindh Mahendran
Alexey Dosovitskiy
Dirk Weissenborn
Jakob Uszkoreit
Thomas Unterthiner
50
95
0
07 Apr 2021
Exploring Simple Siamese Representation Learning
Xinlei Chen
Kaiming He
SSL
189
3,992
0
20 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
397
40,217
0
22 Oct 2020
3D Self-Supervised Methods for Medical Imaging
Aiham Taleb
W. Loetzsch
Noel Danz
Julius Severin
Thomas Gaertner
Benjamin Bergner
C. Lippert
SSL
67
214
0
06 Jun 2020
Learning When and Where to Zoom with Deep Reinforcement Learning
Burak Uzkent
Stefano Ermon
54
68
0
01 Mar 2020
MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers
Wenhui Wang
Furu Wei
Li Dong
Hangbo Bao
Nan Yang
Ming Zhou
VLM
91
1,230
0
25 Feb 2020
Momentum Contrast for Unsupervised Visual Representation Learning
Kaiming He
Haoqi Fan
Yuxin Wu
Saining Xie
Ross B. Girshick
SSL
113
12,007
0
13 Nov 2019
RandAugment: Practical automated data augmentation with a reduced search space
E. D. Cubuk
Barret Zoph
Jonathon Shlens
Quoc V. Le
MQ
188
3,458
0
30 Sep 2019
Deep High-Resolution Representation Learning for Visual Recognition
Jingdong Wang
Ke Sun
Tianheng Cheng
Borui Jiang
Chaorui Deng
...
Yadong Mu
Mingkui Tan
Xinggang Wang
Wenyu Liu
Bin Xiao
302
3,572
0
20 Aug 2019
Processing Megapixel Images with Deep Attention-Sampling Models
Angelos Katharopoulos
François Fleuret
46
65
0
03 May 2019
Reparameterizable Subset Sampling via Continuous Relaxations
Sang Michael Xie
Stefano Ermon
BDL
39
97
0
29 Jan 2019
Learning to Navigate for Fine-grained Classification
Ze Yang
Tiange Luo
Dong Wang
Zhiqiang Hu
Jun Gao
Liwei Wang
39
446
0
02 Sep 2018
Categorical Reparameterization with Gumbel-Softmax
Eric Jang
S. Gu
Ben Poole
BDL
221
5,323
0
03 Nov 2016
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
329
1,850
0
18 Aug 2016
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
238
19,523
0
09 Mar 2015
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
1.1K
39,383
0
01 Sep 2014