ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2101.01169
  4. Cited By
Transformers in Vision: A Survey
v1v2v3v4v5 (latest)

Transformers in Vision: A Survey

4 January 2021
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
Fahad Shahbaz Khan
M. Shah
    ViT
ArXiv (abs)PDFHTML

Papers citing "Transformers in Vision: A Survey"

50 / 263 papers shown
Title
Image Recognition with Online Lightweight Vision Transformer: A Survey
Image Recognition with Online Lightweight Vision Transformer: A Survey
Zherui Zhang
Rongtao Xu
Jie Zhou
Changwei Wang
Xingtian Pei
...
Jiguang Zhang
Li Guo
Longxiang Gao
Wenyuan Xu
Shibiao Xu
ViT
496
0
0
06 May 2025
Remote sensing colour image semantic segmentation of trails created by large herbivorous Mammals
Remote sensing colour image semantic segmentation of trails created by large herbivorous Mammals
J. Díez-Pastor
Francisco Javier Gonzalez-Moya
Pedro Latorre-Carmona
Francisco Javier Perez-Barbería
Ludmila I.Kuncheva
Antonio Canepa-Oneto
Alvar Arnaiz-González
C. García-Osorio
271
0
0
16 Apr 2025
Embedding Radiomics into Vision Transformers for Multimodal Medical Image Classification
Embedding Radiomics into Vision Transformers for Multimodal Medical Image Classification
Zhenyu Yang
Haiming Zhu
Rihui Zhang
Haipeng Zhang
Jianliang Wang
Chunhao Wang
Minbin Chen
F. Yin
MedIm
121
0
0
15 Apr 2025
A super-resolution reconstruction method for lightweight building images based on an expanding feature modulation network
A super-resolution reconstruction method for lightweight building images based on an expanding feature modulation network
Yi Zhang
Wenye Zhou
Ruonan Lin
SupR
118
0
0
17 Mar 2025
MaskAttn-UNet: A Mask Attention-Driven Framework for Universal Low-Resolution Image Segmentation
MaskAttn-UNet: A Mask Attention-Driven Framework for Universal Low-Resolution Image Segmentation
Anzhe Cheng
Chenzhong Yin
Yu Chang
Heng Ping
Shixuan Li
Shahin Nazarian
Paul Bogdan
SSeg
263
0
0
11 Mar 2025
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Looped ReLU MLPs May Be All You Need as Practical Programmable Computers
Yingyu Liang
Zhizhou Sha
Zhenmei Shi
Zhao Song
Yufa Zhou
165
19
0
21 Feb 2025
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection
YOLO-MS: Rethinking Multi-Scale Representation Learning for Real-time Object Detection
Yuming Chen
Xinbin Yuan
Ruiqi Wu
Jiabao Wang
Qibin Hou
Mingg-Ming Cheng
Ming-Ming Cheng
ObjD
275
53
0
21 Feb 2025
Infrared Image Super-Resolution: Systematic Review, and Future Trends
Infrared Image Super-Resolution: Systematic Review, and Future Trends
Y. Huang
Tomo Miyazaki
Xiao-Fang Liu
S. Omachi
SupR
152
13
0
21 Feb 2025
Learning county from pixels: Corn yield prediction with attention-weighted multiple instance learning
Xiaoyu Wang
Yuchi Ma
Qunying Huang
Zhengwei Yang
Zhou Zhang
199
1
0
17 Feb 2025
TLOB: A Novel Transformer Model with Dual Attention for Price Trend Prediction with Limit Order Book Data
TLOB: A Novel Transformer Model with Dual Attention for Price Trend Prediction with Limit Order Book Data
Leonardo Berti
Gjergji Kasneci
AI4TS
187
1
0
12 Feb 2025
MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks
MoENAS: Mixture-of-Expert based Neural Architecture Search for jointly Accurate, Fair, and Robust Edge Deep Neural Networks
Lotfi Abdelkrim Mecharbat
Alberto Marchisio
Mohamed Bennai
M. Ghassemi
Tuka Alhanai
181
0
0
11 Feb 2025
Protego: Detecting Adversarial Examples for Vision Transformers via Intrinsic Capabilities
Protego: Detecting Adversarial Examples for Vision Transformers via Intrinsic Capabilities
Jialin Wu
Kaikai Pan
Yanjiao Chen
Jiangyi Deng
Shengyuan Pang
Wenyuan Xu
ViTAAML
107
0
0
13 Jan 2025
Generation from Noisy Examples
Generation from Noisy Examples
A. Raman
Vinod Raman
91
1
0
07 Jan 2025
Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers
Semantics Prompting Data-Free Quantization for Low-Bit Vision Transformers
Mingliang Xu
Yuyao Zhou
Yuxin Zhang
Shen Li
Yong Li
Chia-Wen Lin
Zhanpeng Zeng
Rongrong Ji
MQ
316
0
0
31 Dec 2024
Vision Transformers for Efficient Indoor Pathloss Radio Map Prediction
Vision Transformers for Efficient Indoor Pathloss Radio Map Prediction
Rafayel Mkrtchyan
Edvard Ghukasyan
Khoren Petrosyan
Hrant Khachatrian
Theofanis P. Raptis
135
0
0
12 Dec 2024
Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey
Passive Deepfake Detection Across Multi-modalities: A Comprehensive Survey
Hong-Hanh Nguyen-Le
Van-Tuan Tran
Dinh-Thuc Nguyen
Nhien-An Le-Khac
AAML
172
2
0
26 Nov 2024
HMT-Grasp: A Hybrid Mamba-Transformer Approach for Robot Grasping in Cluttered Environments
HMT-Grasp: A Hybrid Mamba-Transformer Approach for Robot Grasping in Cluttered Environments
Songsong Xiong
Hamidreza Kasaei
99
1
0
04 Oct 2024
SoK: Leveraging Transformers for Malware Analysis
SoK: Leveraging Transformers for Malware Analysis
Pradip Kunwar
Kshitiz Aryal
Maanak Gupta
Mahmoud Abdelsalam
Elisa Bertino
143
0
0
27 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
307
54
0
23 May 2024
Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration
Segmentation Guided Sparse Transformer for Under-Display Camera Image Restoration
Jingyun Xue
Tao Wang
Jun Wang
Kaihao Zhang
ViT
85
2
0
09 Mar 2024
Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain
Masked LoGoNet: Fast and Accurate 3D Image Analysis for Medical Domain
Amin Karimi Monsefi
Payam Karisani
Mengxi Zhou
Stacey S. Choi
Nathan Doble
Heng Ji
Srinivasan Parthasarathy
R. Ramnath
85
5
0
09 Feb 2024
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
Parameter-Efficient Fine-Tuning for Pre-Trained Vision Models: A Survey
Yi Xin
Jianjiang Yang
Haodi Zhou
Junlong Du
Junlong Du
Yue Fan
Qing Li
Qing Li
Yuntao Du
VLM
143
85
0
03 Feb 2024
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning
OnDev-LCT: On-Device Lightweight Convolutional Transformers towards federated learning
Chu Myaet Thwal
Minh N. H. Nguyen
Ye Lin Tun
Seongjin Kim
My T. Thai
Choong Seon Hong
112
6
0
22 Jan 2024
TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning
TouchUp-G: Improving Feature Representation through Graph-Centric Finetuning
Jing Zhu
Xiang Song
V. Ioannidis
Danai Koutra
Christos Faloutsos
151
15
0
25 Sep 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
135
4
0
18 Aug 2023
Local-Aware Global Attention Network for Person Re-Identification Based on Body and Hand Images
Local-Aware Global Attention Network for Person Re-Identification Based on Body and Hand Images
N. L. Baisa
CVBM
86
4
0
11 Sep 2022
Unsupervised Domain Adaptation via Style-Aware Self-intermediate Domain
Unsupervised Domain Adaptation via Style-Aware Self-intermediate Domain
Lianyu Wang
Meng Wang
Daoqiang Zhang
Huazhu Fu
91
2
0
05 Sep 2022
Generative Adversarial Networks
Generative Adversarial Networks
Gilad Cohen
Raja Giryes
GAN
298
30,152
0
01 Mar 2022
Patches Are All You Need?
Patches Are All You Need?
Asher Trockman
J. Zico Kolter
ViT
268
411
0
24 Jan 2022
Restormer: Efficient Transformer for High-Resolution Image Restoration
Restormer: Efficient Transformer for High-Resolution Image Restoration
Syed Waqas Zamir
Aditya Arora
Salman Khan
Munawar Hayat
Fahad Shahbaz Khan
Ming-Hsuan Yang
ViT
182
2,249
0
18 Nov 2021
Pix2seq: A Language Modeling Framework for Object Detection
Pix2seq: A Language Modeling Framework for Object Detection
Ting-Li Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLMViTVLM
279
350
0
22 Sep 2021
SwinIR: Image Restoration Using Swin Transformer
SwinIR: Image Restoration Using Swin Transformer
Christos Sakaridis
Jie Cao
Guolei Sun
Peng Sun
Luc Van Gool
Radu Timofte
ViT
196
2,956
0
23 Aug 2021
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Perceiver IO: A General Architecture for Structured Inputs & Outputs
Andrew Jaegle
Sebastian Borgeaud
Jean-Baptiste Alayrac
Carl Doersch
Catalin Ionescu
...
Olivier J. Hénaff
M. Botvinick
Andrew Zisserman
Oriol Vinyals
João Carreira
MLLMVLMGNN
93
584
0
30 Jul 2021
GLiT: Neural Architecture Search for Global and Local Image Transformer
GLiT: Neural Architecture Search for Global and Local Image Transformer
Boyu Chen
Peixia Li
Chuming Li
Baopu Li
Lei Bai
Chen Lin
Ming Sun
Junjie Yan
Wanli Ouyang
ViT
103
86
0
07 Jul 2021
CSWin Transformer: A General Vision Transformer Backbone with
  Cross-Shaped Windows
CSWin Transformer: A General Vision Transformer Backbone with Cross-Shaped Windows
Xiaoyi Dong
Jianmin Bao
Dongdong Chen
Weiming Zhang
Nenghai Yu
Lu Yuan
Dong Chen
B. Guo
ViT
154
986
0
01 Jul 2021
AutoFormer: Searching Transformers for Visual Recognition
AutoFormer: Searching Transformers for Visual Recognition
Minghao Chen
Houwen Peng
Jianlong Fu
Haibin Ling
ViT
97
267
0
01 Jul 2021
Focal Self-attention for Local-Global Interactions in Vision
  Transformers
Focal Self-attention for Local-Global Interactions in Vision Transformers
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
80
436
0
01 Jul 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViTAI4TS
122
1,682
0
25 Jun 2021
P2T: Pyramid Pooling Transformer for Scene Understanding
P2T: Pyramid Pooling Transformer for Scene Understanding
Yu-Huan Wu
Yun-Hai Liu
Xin Zhan
Mingg-Ming Cheng
ViT
111
231
0
22 Jun 2021
Efficient Self-supervised Vision Transformers for Representation
  Learning
Efficient Self-supervised Vision Transformers for Representation Learning
Chunyuan Li
Jianwei Yang
Pengchuan Zhang
Mei Gao
Bin Xiao
Xiyang Dai
Lu Yuan
Jianfeng Gao
ViT
106
213
0
17 Jun 2021
XCiT: Cross-Covariance Image Transformers
XCiT: Cross-Covariance Image Transformers
Alaaeldin El-Nouby
Hugo Touvron
Mathilde Caron
Piotr Bojanowski
Matthijs Douze
...
Ivan Laptev
Natalia Neverova
Gabriel Synnaeve
Jakob Verbeek
Hervé Jégou
ViT
151
513
0
17 Jun 2021
Long-Short Temporal Contrastive Learning of Video Transformers
Long-Short Temporal Contrastive Learning of Video Transformers
Jue Wang
Gedas Bertasius
Du Tran
Lorenzo Torresani
VLMViT
125
50
0
17 Jun 2021
CoAtNet: Marrying Convolution and Attention for All Data Sizes
CoAtNet: Marrying Convolution and Attention for All Data Sizes
Zihang Dai
Hanxiao Liu
Quoc V. Le
Mingxing Tan
ViT
141
1,212
0
09 Jun 2021
Scaling Vision Transformers
Scaling Vision Transformers
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
ViT
144
1,098
0
08 Jun 2021
On Improving Adversarial Transferability of Vision Transformers
On Improving Adversarial Transferability of Vision Transformers
Muzammal Naseer
Kanchana Ranasinghe
Salman Khan
Fahad Shahbaz Khan
Fatih Porikli
ViT
91
95
0
08 Jun 2021
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
Shuffle Transformer: Rethinking Spatial Shuffle for Vision Transformer
Zilong Huang
Youcheng Ben
Guozhong Luo
Pei Cheng
Gang Yu
Bin-Bin Fu
ViT
87
183
0
07 Jun 2021
Uformer: A General U-Shaped Transformer for Image Restoration
Uformer: A General U-Shaped Transformer for Image Restoration
Zhendong Wang
Xiaodong Cun
Jianmin Bao
Wengang Zhou
Jianzhuang Liu
Houqiang Li
ViT
119
1,419
0
06 Jun 2021
Referring Transformer: A One-step Approach to Multi-task Visual
  Grounding
Referring Transformer: A One-step Approach to Multi-task Visual Grounding
Muchen Li
Leonid Sigal
ObjD
105
193
0
06 Jun 2021
RegionViT: Regional-to-Local Attention for Vision Transformers
RegionViT: Regional-to-Local Attention for Vision Transformers
Chun-Fu Chen
Yikang Shen
Quanfu Fan
ViT
118
197
0
04 Jun 2021
When Vision Transformers Outperform ResNets without Pre-training or
  Strong Data Augmentations
When Vision Transformers Outperform ResNets without Pre-training or Strong Data Augmentations
Xiangning Chen
Cho-Jui Hsieh
Boqing Gong
ViT
93
329
0
03 Jun 2021
123456
Next