arXiv: 2101.11986
Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet
28 January 2021
Li-xin Yuan
Yunpeng Chen
Tao Wang
Weihao Yu
Yujun Shi
Zihang Jiang
Francis E. H. Tay
Jiashi Feng
Shuicheng Yan
ViT
Papers citing "Tokens-to-Token ViT: Training Vision Transformers from Scratch on ImageNet"
50 / 396 papers shown
Visual Attention Network
Meng-Hao Guo
Chengrou Lu
Zheng-Ning Liu
Ming-Ming Cheng
Shiyong Hu
ViT
VLM
24
637
0
20 Feb 2022
ActionFormer: Localizing Moments of Actions with Transformers
Chen-Da Liu-Zhang
Jianxin Wu
Yin Li
ViT
31
329
0
16 Feb 2022
Mixing and Shifting: Exploiting Global and Local Dependencies in Vision MLPs
Huangjie Zheng
Pengcheng He
Weizhu Chen
Mingyuan Zhou
22
14
0
14 Feb 2022
LwPosr: Lightweight Efficient Fine-Grained Head Pose Estimation
Naina Dhingra
29
16
0
07 Feb 2022
Aggregating Global Features into Local Vision Transformer
Krushi Patel
A. Bur
Fengju Li
Guanghui Wang
ViT
33
34
0
30 Jan 2022
O-ViT: Orthogonal Vision Transformer
Yanhong Fei
Yingjie Liu
Xian Wei
Mingsong Chen
ViT
13
8
0
28 Jan 2022
DynaMixer: A Vision MLP Architecture with Dynamic Mixing
Ziyu Wang
Wenhao Jiang
Yiming Zhu
Li Yuan
Yibing Song
Wei Liu
43
44
0
28 Jan 2022
UniFormer: Unifying Convolution and Self-attention for Visual Recognition
Kunchang Li
Yali Wang
Junhao Zhang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
162
360
0
24 Jan 2022
VAQF: Fully Automatic Software-Hardware Co-Design Framework for Low-Bit Vision Transformer
Mengshu Sun
Haoyu Ma
Guoliang Kang
Yi Ding
Tianlong Chen
Xiaolong Ma
Zhangyang Wang
Yanzhi Wang
ViT
33
45
0
17 Jan 2022
UniFormer: Unified Transformer for Efficient Spatiotemporal Representation Learning
Kunchang Li
Yali Wang
Peng Gao
Guanglu Song
Yu Liu
Hongsheng Li
Yu Qiao
ViT
47
238
0
12 Jan 2022
QuadTree Attention for Vision Transformers
Shitao Tang
Jiahui Zhang
Siyu Zhu
Ping Tan
ViT
169
156
0
08 Jan 2022
PyramidTNT: Improved Transformer-in-Transformer Baselines with Pyramid Architecture
Kai Han
Jianyuan Guo
Yehui Tang
Yunhe Wang
ViT
34
22
0
04 Jan 2022
Multi-Dimensional Model Compression of Vision Transformer
Zejiang Hou
S. Kung
ViT
25
16
0
31 Dec 2021
Pale Transformer: A General Vision Transformer Backbone with Pale-Shaped Attention
Sitong Wu
Tianyi Wu
Hao Hao Tan
G. Guo
ViT
31
70
0
28 Dec 2021
Vision Transformer for Small-Size Datasets
Seung Hoon Lee
Seunghyun Lee
B. Song
ViT
22
222
0
27 Dec 2021
ELSA: Enhanced Local Self-Attention for Vision Transformer
Jingkai Zhou
Pichao Wang
Fan Wang
Qiong Liu
Hao Li
Rong Jin
ViT
37
37
0
23 Dec 2021
Lite Vision Transformer with Enhanced Self-Attention
Chenglin Yang
Yilin Wang
Jianming Zhang
He Zhang
Zijun Wei
Zhe-nan Lin
Alan Yuille
ViT
21
112
0
20 Dec 2021
A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation
Wuyang Chen
Xianzhi Du
Fan Yang
Lucas Beyer
Xiaohua Zhai
...
Huizhong Chen
Jing Li
Xiaodan Song
Zhangyang Wang
Denny Zhou
ViT
29
20
0
17 Dec 2021
Full Transformer Framework for Robust Point Cloud Registration with Deep Information Interaction
Guang-Sheng Chen
Meiling Wang
Yufeng Yue
Qingxiang Zhang
Li-xin Yuan
ViT
37
17
0
17 Dec 2021
AdaViT: Adaptive Tokens for Efficient Vision Transformer
Hongxu Yin
Arash Vahdat
J. Álvarez
Arun Mallya
Jan Kautz
Pavlo Molchanov
ViT
35
314
0
14 Dec 2021
EMDS-6: Environmental Microorganism Image Dataset Sixth Version for Image Denoising, Segmentation, Feature Extraction, Classification and Detection Methods Evaluation
Penghui Zhao
Chen Li
M. Rahaman
Hao Xu
Pingli Ma
Hechen Yang
Hongzan Sun
Tao Jiang
N. Xu
M. Grzegorzek
29
19
0
14 Dec 2021
Visual Transformers with Primal Object Queries for Multi-Label Image Classification
V. O. Yazici
Joost van de Weijer
Longlong Yu
ViT
21
1
0
10 Dec 2021
Fast Point Transformer
Chunghyun Park
Yoonwoo Jeong
Minsu Cho
Jaesik Park
3DPC
ViT
38
168
0
09 Dec 2021
Learning Tracking Representations via Dual-Branch Fully Transformer Networks
Fei Xie
Chunyu Wang
Guangting Wang
Wankou Yang
Wenjun Zeng
ViT
22
48
0
05 Dec 2021
MViTv2: Improved Multiscale Vision Transformers for Classification and Detection
Yanghao Li
Chaoxia Wu
Haoqi Fan
K. Mangalam
Bo Xiong
Jitendra Malik
Christoph Feichtenhofer
ViT
69
677
0
02 Dec 2021
Uni-Perceiver: Pre-training Unified Architecture for Generic Perception for Zero-shot and Few-shot Tasks
Xizhou Zhu
Jinguo Zhu
Hao Li
Xiaoshi Wu
Xiaogang Wang
Hongsheng Li
Xiaohua Wang
Jifeng Dai
56
129
0
02 Dec 2021
SwinTrack: A Simple and Strong Baseline for Transformer Tracking
Liting Lin
Heng Fan
Zhipeng Zhang
Yong-mei Xu
Haibin Ling
ViT
31
303
0
02 Dec 2021
Vision Pair Learning: An Efficient Training Framework for Image Classification
Bei Tong
Xiaoyuan Yu
ViT
20
0
0
02 Dec 2021
Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models
Tri Dao
Beidi Chen
Kaizhao Liang
Jiaming Yang
Zhao Song
Atri Rudra
Christopher Ré
33
75
0
30 Nov 2021
CT-block: a novel local and global features extractor for point cloud
Shangwei Guo
Jun Li
Zhengchao Lai
Xiantong Meng
Shaokun Han
ViT
3DPC
27
2
0
30 Nov 2021
Shunted Self-Attention via Multi-Scale Token Aggregation
Sucheng Ren
Daquan Zhou
Shengfeng He
Jiashi Feng
Xinchao Wang
ViT
35
222
0
30 Nov 2021
On the Integration of Self-Attention and Convolution
Xuran Pan
Chunjiang Ge
Rui Lu
S. Song
Guanfu Chen
Zeyi Huang
Gao Huang
SSL
41
287
0
29 Nov 2021
SWAT: Spatial Structure Within and Among Tokens
Kumara Kahatapitiya
Michael S. Ryoo
25
6
0
26 Nov 2021
A Robust Volumetric Transformer for Accurate 3D Tumor Segmentation
Himashi Peiris
Munawar Hayat
Zhaolin Chen
Gary Egan
Mehrtash Harandi
ViT
MedIm
22
123
0
26 Nov 2021
Self-slimmed Vision Transformer
Zhuofan Zong
Kunchang Li
Guanglu Song
Yali Wang
Yu Qiao
B. Leng
Yu Liu
ViT
21
30
0
24 Nov 2021
An Image Patch is a Wave: Phase-Aware Vision MLP
Yehui Tang
Kai Han
Jianyuan Guo
Chang Xu
Yanxi Li
Chao Xu
Yunhe Wang
24
133
0
24 Nov 2021
PhysFormer: Facial Video-based Physiological Measurement with Temporal Difference Transformer
Zitong Yu
Yuming Shen
Jingang Shi
Hengshuang Zhao
Philip Torr
Guoying Zhao
ViT
MedIm
140
167
0
23 Nov 2021
PointMixer: MLP-Mixer for Point Cloud Understanding
Jaesung Choe
Chunghyun Park
François Rameau
Jaesik Park
In So Kweon
3DPC
45
98
0
22 Nov 2021
Mesa: A Memory-saving Training Framework for Transformers
Zizheng Pan
Peng Chen
Haoyu He
Jing Liu
Jianfei Cai
Bohan Zhuang
31
20
0
22 Nov 2021
Swin Transformer V2: Scaling Up Capacity and Resolution
Ze Liu
Han Hu
Yutong Lin
Zhuliang Yao
Zhenda Xie
...
Yue Cao
Zheng-Wei Zhang
Li Dong
Furu Wei
B. Guo
ViT
67
1,747
0
18 Nov 2021
Restormer: Efficient Transformer for High-Resolution Image Restoration
Syed Waqas Zamir
Aditya Arora
Salman Khan
Munawar Hayat
Fahad Shahbaz Khan
Ming-Hsuan Yang
ViT
64
2,127
0
18 Nov 2021
TransMix: Attend to Mix for Vision Transformers
Jieneng Chen
Shuyang Sun
Ju He
Philip Torr
Alan Yuille
S. Bai
ViT
28
103
0
18 Nov 2021
Dynamically pruning segformer for efficient semantic segmentation
Haoli Bai
Hongda Mao
D. Nair
31
20
0
18 Nov 2021
Searching for TrioNet: Combining Convolution with Local and Global Self-Attention
Huaijin Pi
Huiyu Wang
Yingwei Li
Zizhang Li
Alan Yuille
ViT
27
3
0
15 Nov 2021
A Survey of Visual Transformers
Yang Liu
Yao Zhang
Yixin Wang
Feng Hou
Jin Yuan
Jiang Tian
Yang Zhang
Zhongchao Shi
Jianping Fan
Zhiqiang He
3DGS
ViT
77
330
0
11 Nov 2021
Are Transformers More Robust Than CNNs?
Yutong Bai
Jieru Mei
Alan Yuille
Cihang Xie
ViT
AAML
192
257
0
10 Nov 2021
Are we ready for a new paradigm shift? A Survey on Visual Deep MLP
Ruiyang Liu
Hai-Tao Zheng
Li Tao
Dun Liang
Haitao Zheng
85
97
0
07 Nov 2021
Relational Self-Attention: What's Missing in Attention for Video Understanding
Manjin Kim
Heeseung Kwon
Chunyu Wang
Suha Kwak
Minsu Cho
ViT
27
28
0
02 Nov 2021
Multi-Scale High-Resolution Vision Transformer for Semantic Segmentation
Jiaqi Gu
Hyoukjun Kwon
Dilin Wang
Wei Ye
Meng Li
Yu-Hsin Chen
Liangzhen Lai
Vikas Chandra
David Z. Pan
ViT
27
182
0
01 Nov 2021
Blending Anti-Aliasing into Vision Transformer
Shengju Qian
Hao Shao
Yi Zhu
Mu Li
Jiaya Jia
26
20
0
28 Oct 2021