2111.06091
Cited By
A Survey of Visual Transformers
11 November 2021
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
Papers citing "A Survey of Visual Transformers"
39 / 39 papers shown
Title
SCFormer: Structured Channel-wise Transformer with Cumulative Historical State for Multivariate Time Series Forecasting
Shiwei Guo
Z. Chen
Yupeng Ma
Yunfei Han
Yi Wang
AI4TS
145
0
0
05 May 2025
Back to Fundamentals: Low-Level Visual Features Guided Progressive Token Pruning
Yuanbing Ouyang
Yizhuo Liang
Qingpeng Li
Xinfei Guo
Yiming Luo
Di Wu
Hao Wang
Yushan Pan
ViT
VLM
73
0
0
25 Apr 2025
GBT-SAM: Adapting a Foundational Deep Learning Model for Generalizable Brain Tumor Segmentation via Efficient Integration of Multi-Parametric MRI Data
Cecilia Diana-Albelda
Roberto Alcover-Couso
Álvaro García-Martín
Jesús Bescós
Marcos Escudero-Viñolo
42
1
0
06 Mar 2025
Learning Motion and Temporal Cues for Unsupervised Video Object Segmentation
Yunzhi Zhuge
Hongyu Gu
Lu Zhang
Jinqing Qi
Huchuan Lu
VOS
69
2
0
14 Jan 2025
Empirical Capacity Model for Self-Attention Neural Networks
Aki Härmä
M. Pietrasik
Anna Wilbik
36
1
0
22 Jul 2024
FViT: A Focal Vision Transformer with Gabor Filter
Yulong Shi
Mingwei Sun
Yongshuai Wang
Rui Wang
55
4
0
17 Feb 2024
SCHEME: Scalable Channel Mixer for Vision Transformers
Deepak Sridhar
Yunsheng Li
Nuno Vasconcelos
44
0
0
01 Dec 2023
EViT: An Eagle Vision Transformer with Bi-Fovea Self-Attention
Yulong Shi
Mingwei Sun
Yongshuai Wang
Hui Sun
Zengqiang Chen
34
4
0
10 Oct 2023
Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation
Michael Jungo
Beat Wolf
Andrii Maksai
C. Musat
Andreas Fischer
27
2
0
06 Sep 2023
Which Transformer to Favor: A Comparative Analysis of Efficiency in Vision Transformers
Tobias Christian Nauen
Sebastián M. Palacio
Federico Raue
Andreas Dengel
42
3
0
18 Aug 2023
Energy-Based Models for Cross-Modal Localization using Convolutional Transformers
Alan Wu
Michael S. Ryoo
33
3
0
06 Jun 2023
Transformers in Speech Processing: A Survey
S. Latif
Aun Zaidi
Heriberto Cuayáhuitl
Fahad Shamshad
Moazzam Shoukat
Junaid Qadir
42
47
0
21 Mar 2023
Transformadores: Fundamentos teoricos y Aplicaciones
J. D. L. Torre
75
0
0
18 Feb 2023
Training a Vision Transformer from scratch in less than 24 hours with 1 GPU
Saghar Irandoust
Thibaut Durand
Yunduz Rakhmangulova
Wenjie Zi
Hossein Hajimirsadeghi
ViT
33
6
0
09 Nov 2022
SAP-DETR: Bridging the Gap Between Salient Points and Queries-Based Transformer Detector for Fast Model Convergency
Yang Liu
Yao Zhang
Yixin Wang
Yang Zhang
Jiang Tian
Zhongchao Shi
Jianping Fan
Zhiqiang He
42
14
0
03 Nov 2022
Fluid Batching: Exit-Aware Preemptive Serving of Early-Exit Neural Networks on Edge NPUs
Alexandros Kouris
Stylianos I. Venieris
Stefanos Laskaridis
Nicholas D. Lane
42
8
0
27 Sep 2022
Time-distance vision transformers in lung cancer diagnosis from longitudinal computed tomography
Thomas Z. Li
Kaiwen Xu
Riqiang Gao
Yucheng Tang
Thomas A. Lasko
Fabien Maldonado
K. Sandler
Bennett A. Landman
ViT
MedIm
22
11
0
04 Sep 2022
SB-SSL: Slice-Based Self-Supervised Transformers for Knee Abnormality Classification from MRI
Sara Atito
Syed Muhammad Anwar
Muhammad Awais
Josef Kittler
ViT
MedIm
29
12
0
29 Aug 2022
3D Vision with Transformers: A Survey
Jean Lahoud
Jiale Cao
F. Khan
Hisham Cholakkal
Rao Muhammad Anwer
Salman Khan
Ming Yang
ViT
MedIm
29
32
0
08 Aug 2022
Multimodal Learning with Transformers: A Survey
P. Xu
Xiatian Zhu
David A. Clifton
ViT
54
527
0
13 Jun 2022
Fine-tuning Image Transformers using Learnable Memory
Mark Sandler
A. Zhmoginov
Max Vladymyrov
Andrew Jackson
ViT
23
47
0
29 Mar 2022
Graph Attention Transformer Network for Multi-Label Image Classification
Jin Yuan
Shikai Chen
Yao Zhang
Zhongchao Shi
Xin Geng
Jianping Fan
Yong Rui
ViT
28
30
0
08 Mar 2022
DAB-DETR: Dynamic Anchor Boxes are Better Queries for DETR
Shilong Liu
Feng Li
Hao Zhang
X. Yang
Xianbiao Qi
Hang Su
Jun Zhu
Lei Zhang
ViT
155
728
0
28 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
22
103
0
16 Jan 2022
Augmenting Convolutional networks with attention-based aggregation
Hugo Touvron
Matthieu Cord
Alaaeldin El-Nouby
Piotr Bojanowski
Armand Joulin
Gabriel Synnaeve
Hervé Jégou
ViT
38
47
0
27 Dec 2021
Masked Autoencoders Are Scalable Vision Learners
Kaiming He
Xinlei Chen
Saining Xie
Yanghao Li
Piotr Dollár
Ross B. Girshick
ViT
TPM
305
7,443
0
11 Nov 2021
Pix2seq: A Language Modeling Framework for Object Detection
Ting Chen
Saurabh Saxena
Lala Li
David J. Fleet
Geoffrey E. Hinton
MLLM
ViT
VLM
241
344
0
22 Sep 2021
Benchmarking the Robustness of Instance Segmentation Models
Said Fahri Altindis
Yusuf Dalva
Hamza Pehlivan
Aysegül Dündar
VLM
OOD
29
12
0
02 Sep 2021
Multiview Detection with Shadow Transformer (and View-Coherent Data Augmentation)
Yunzhong Hou
Liang Zheng
ViT
56
51
0
12 Aug 2021
Emerging Properties in Self-Supervised Vision Transformers
Mathilde Caron
Hugo Touvron
Ishan Misra
Hervé Jégou
Julien Mairal
Piotr Bojanowski
Armand Joulin
317
5,785
0
29 Apr 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
287
1,524
0
27 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
277
3,623
0
24 Feb 2021
Zero-Shot Text-to-Image Generation
Aditya A. Ramesh
Mikhail Pavlov
Gabriel Goh
Scott Gray
Chelsea Voss
Alec Radford
Mark Chen
Ilya Sutskever
VLM
255
4,781
0
24 Feb 2021
Scaling Up Visual and Vision-Language Representation Learning With Noisy Text Supervision
Chao Jia
Yinfei Yang
Ye Xia
Yi-Ting Chen
Zarana Parekh
Hieu H. Pham
Quoc V. Le
Yun-hsuan Sung
Zhen Li
Tom Duerig
VLM
CLIP
301
3,700
0
11 Feb 2021
Bottleneck Transformers for Visual Recognition
A. Srinivas
Nayeon Lee
Niki Parmar
Jonathon Shlens
Pieter Abbeel
Ashish Vaswani
SLR
290
979
0
27 Jan 2021
Transformers in Vision: A Survey
Salman Khan
Muzammal Naseer
Munawar Hayat
Syed Waqas Zamir
F. Khan
M. Shah
ViT
227
2,430
0
04 Jan 2021
How Much Position Information Do Convolutional Neural Networks Encode?
Md. Amirul Islam
Sen Jia
Neil D. B. Bruce
SSL
205
344
0
22 Jan 2020
Unified Vision-Language Pre-Training for Image Captioning and VQA
Luowei Zhou
Hamid Palangi
Lei Zhang
Houdong Hu
Jason J. Corso
Jianfeng Gao
MLLM
VLM
252
927
0
24 Sep 2019
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Z. Tu
Kaiming He
297
10,220
0
16 Nov 2016