ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

arXiv:2106.04560

Scaling Vision Transformers

8 June 2021
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
    ViT

Papers citing "Scaling Vision Transformers"

50 / 751 papers shown
Understanding out-of-distribution accuracies through quantifying difficulty of test samples
Berfin Simsek
Melissa Hall
Levent Sagun
31
5
0
28 Mar 2022
Random matrix analysis of deep neural network weight matrices
M. Thamm
Max Staats
B. Rosenow
35
12
0
28 Mar 2022
Automated Progressive Learning for Efficient Training of Vision Transformers
Changlin Li
Bohan Zhuang
Guangrun Wang
Xiaodan Liang
Xiaojun Chang
Yi Yang
28
46
0
28 Mar 2022
GradViT: Gradient Inversion of Vision Transformers
Ali Hatamizadeh
Hongxu Yin
H. Roth
Wenqi Li
Jan Kautz
Daguang Xu
Pavlo Molchanov
ViT
25
63
0
22 Mar 2022
Towards Training Billion Parameter Graph Neural Networks for Atomic Simulations
Anuroop Sriram
Abhishek Das
Brandon M. Wood
Siddharth Goyal
C. L. Zitnick
AI4CE
33
27
0
18 Mar 2022
Are Vision Transformers Robust to Spurious Correlations?
Soumya Suvra Ghosal
Yifei Ming
Yixuan Li
ViT
27
28
0
17 Mar 2022
2-speed network ensemble for efficient classification of incremental land-use/land-cover satellite image chips
M. J. Horry
Subrata Chakraborty
B. Pradhan
N. Shukla
Sanjoy Paul
28
1
0
15 Mar 2022
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman
Gabriel Ilharco
S. Gadre
Rebecca Roelofs
Raphael Gontijo-Lopes
...
Hongseok Namkoong
Ali Farhadi
Y. Carmon
Simon Kornblith
Ludwig Schmidt
MoMe
54
916
1
10 Mar 2022
Instance Segmentation for Autonomous Log Grasping in Forestry Operations
Jean-Michel Fortin
Olivier Gamache
Vincent Grondin
F. Pomerleau
Philippe Giguère
27
22
0
03 Mar 2022
Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy for Image Recognition without Convolutions
Ruikang Ju
Ting-Yu Lin
Jen-Shiun Chiang
Jia-Hao Jian
Yu-Shian Lin
Liu-Rui-Yi Huang
ViT
16
1
0
02 Mar 2022
Provable Stochastic Optimization for Global Contrastive Learning: Small Batch Does Not Harm Performance
Zhuoning Yuan
Yuexin Wu
Zi-qi Qiu
Xianzhi Du
Lijun Zhang
Denny Zhou
Tianbao Yang
34
26
0
24 Feb 2022
Learning to Merge Tokens in Vision Transformers
Cédric Renggli
André Susano Pinto
N. Houlsby
Basil Mustafa
J. Puigcerver
C. Riquelme
MoMe
19
56
0
24 Feb 2022
Auto-scaling Vision Transformers without Training
Wuyang Chen
Wei Huang
Xianzhi Du
Xiaodan Song
Zhangyang Wang
Denny Zhou
ViT
32
23
0
24 Feb 2022
Retrieval Augmented Classification for Long-Tail Visual Recognition
Alex Long
Wei Yin
Thalaiyasingam Ajanthan
Vu-Linh Nguyen
Pulak Purkait
Ravi Garg
Alan Blair
Chunhua Shen
Anton Van Den Hengel
21
107
0
22 Feb 2022
ViTAEv2: Vision Transformer Advanced by Exploring Inductive Bias for Image Recognition and Beyond
Qiming Zhang
Yufei Xu
Jing Zhang
Dacheng Tao
ViT
33
229
0
21 Feb 2022
Geometric Regularization from Overparameterization
Nicholas J. Teague
22
1
0
18 Feb 2022
On the Implicit Bias Towards Minimal Depth of Deep Neural Networks
Tomer Galanti
Liane Galanti
Ido Ben-Shaul
16
12
0
18 Feb 2022
Vision Models Are More Robust And Fair When Pretrained On Uncurated Images Without Supervision
Priya Goyal
Quentin Duval
Isaac Seessel
Mathilde Caron
Ishan Misra
Levent Sagun
Armand Joulin
Piotr Bojanowski
VLM
SSL
26
110
0
16 Feb 2022
Wukong: A 100 Million Large-scale Chinese Cross-modal Pre-training Benchmark
Jiaxi Gu
Xiaojun Meng
Guansong Lu
Lu Hou
Minzhe Niu
...
Runhu Huang
Wei Zhang
Xingda Jiang
Chunjing Xu
Hang Xu
VLM
43
88
0
14 Feb 2022
Scaling Laws Under the Microscope: Predicting Transformer Performance from Small Scale Experiments
Maor Ivgi
Y. Carmon
Jonathan Berant
16
17
0
13 Feb 2022
Compute Trends Across Three Eras of Machine Learning
J. Sevilla
Lennart Heim
A. Ho
T. Besiroglu
Marius Hobbhahn
Pablo Villalobos
30
269
0
11 Feb 2022
Towards an Analytical Definition of Sufficient Data
Adam Byerly
T. Kalganova
27
4
0
07 Feb 2022
Learning Features with Parameter-Free Layers
Dongyoon Han
Y. Yoo
Beomyoung Kim
Byeongho Heo
35
8
0
06 Feb 2022
AtmoDist: Self-supervised Representation Learning for Atmospheric Dynamics
Sebastian Hoffmann
C. Lessig
AI4Cl
24
8
0
02 Feb 2022
Examining Scaling and Transfer of Language Model Architectures for Machine Translation
Biao Zhang
Behrooz Ghorbani
Ankur Bapna
Yong Cheng
Xavier Garcia
Jonathan Shen
Orhan Firat
25
21
0
01 Feb 2022
A Comprehensive Study of Image Classification Model Sensitivity to Foregrounds, Backgrounds, and Visual Attributes
Mazda Moayeri
Phillip E. Pope
Yogesh Balaji
S. Feizi
VLM
33
52
0
26 Jan 2022
Transformers in Medical Imaging: A Survey
Fahad Shamshad
Salman Khan
Syed Waqas Zamir
Muhammad Haris Khan
Munawar Hayat
F. Khan
Huazhu Fu
ViT
LM&MA
MedIm
111
663
0
24 Jan 2022
Revisiting Weakly Supervised Pre-Training of Visual Perception Models
Mannat Singh
Laura Gustafson
Aaron B. Adcock
Vinicius de Freitas Reis
B. Gedik
Raj Prateek Kosaraju
D. Mahajan
Ross B. Girshick
Piotr Dollár
L. van der Maaten
VLM
40
123
0
20 Jan 2022
Video Transformers: A Survey
Javier Selva
A. S. Johansen
Sergio Escalera
Kamal Nasrollahi
T. Moeslund
Albert Clapés
ViT
22
103
0
16 Jan 2022
Pushing the limits of self-supervised ResNets: Can we outperform supervised learning without labels on ImageNet?
Nenad Tomašev
Ioana Bica
Brian McWilliams
Lars Buesing
Razvan Pascanu
Charles Blundell
Jovana Mitrović
SSL
90
81
0
13 Jan 2022
Multiview Transformers for Video Recognition
Shen Yan
Xuehan Xiong
Anurag Arnab
Zhichao Lu
Mi Zhang
Chen Sun
Cordelia Schmid
ViT
26
212
0
12 Jan 2022
MERLOT Reserve: Neural Script Knowledge through Vision and Language and Sound
Rowan Zellers
Jiasen Lu
Ximing Lu
Youngjae Yu
Yanpeng Zhao
Mohammadreza Salehi
Aditya Kusupati
Jack Hessel
Ali Farhadi
Yejin Choi
33
207
0
07 Jan 2022
Persformer: A Transformer Architecture for Topological Machine Learning
Raphael Reinauer
Matteo Caorsi
Nicolas Berkouk
32
15
0
30 Dec 2021
SPViT: Enabling Faster Vision Transformers via Soft Token Pruning
Zhenglun Kong
Peiyan Dong
Xiaolong Ma
Xin Meng
Mengshu Sun
...
Geng Yuan
Bin Ren
Minghai Qin
H. Tang
Yanzhi Wang
ViT
34
144
0
27 Dec 2021
On Causal Inference for Data-free Structured Pruning
Martin Ferianc
A. Sankaran
Olivier Mastropietro
Ehsan Saboori
Quentin Cappart
CML
4
2
0
19 Dec 2021
A Simple Single-Scale Vision Transformer for Object Localization and Instance Segmentation
Wuyang Chen
Xianzhi Du
Fan Yang
Lucas Beyer
Xiaohua Zhai
...
Huizhong Chen
Jing Li
Xiaodan Song
Zhangyang Wang
Denny Zhou
ViT
29
20
0
17 Dec 2021
Co-training Transformer with Videos and Images Improves Action Recognition
Bowen Zhang
Jiahui Yu
Christopher Fifty
Wei Han
Andrew M. Dai
Ruoming Pang
Fei Sha
ViT
28
54
0
14 Dec 2021
Global Attention Mechanism: Retain Information to Enhance Channel-Spatial Interactions
Yichao Liu
Zongru Shao
Nico Hoffmann
11
448
0
10 Dec 2021
MLP Architectures for Vision-and-Language Modeling: An Empirical Study
Yi-Liang Nie
Linjie Li
Zhe Gan
Shuohang Wang
Chenguang Zhu
Michael Zeng
Zicheng Liu
Joey Tianyi Zhou
Lijuan Wang
24
6
0
08 Dec 2021
E$^2$(GO)MOTION: Motion Augmented Event Stream for Egocentric Action Recognition
Chiara Plizzari
M. Planamente
Gabriele Goletto
Marco Cannici
Emanuele Gusso
Matteo Matteucci
Barbara Caputo
EgoV
30
56
0
07 Dec 2021
AdaSplit: Adaptive Trade-offs for Resource-constrained Distributed Deep Learning
Ayush Chopra
Surya Kant Sahu
Abhishek Singh
Abhinav Java
Praneeth Vepakomma
Vivek Sharma
Ramesh Raskar
32
26
0
02 Dec 2021
Pixelated Butterfly: Simple and Efficient Sparse training for Neural Network Models
Tri Dao
Beidi Chen
Kaizhao Liang
Jiaming Yang
Zhao-quan Song
Atri Rudra
Christopher Ré
33
75
0
30 Nov 2021
Self-Supervised Pre-Training of Swin Transformers for 3D Medical Image Analysis
Yucheng Tang
Dong Yang
Wenqi Li
H. Roth
Bennett Landman
Daguang Xu
V. Nath
Ali Hatamizadeh
ViT
MedIm
42
517
0
29 Nov 2021
Improving traffic sign recognition by active search
Sami Jaghouar
Hannes Gustafsson
Bernhard Mehlig
Erik Werner
N. Gustafsson
25
0
0
29 Nov 2021
Improved Fine-Tuning by Better Leveraging Pre-Training Data
Ziquan Liu
Yi Tian Xu
Yuanhong Xu
Qi Qian
Hao Li
Xiangyang Ji
Antoni B. Chan
Rong Jin
14
37
0
24 Nov 2021
Scaling Up Vision-Language Pre-training for Image Captioning
Xiaowei Hu
Zhe Gan
Jianfeng Wang
Zhengyuan Yang
Zicheng Liu
Yumao Lu
Lijuan Wang
MLLM
VLM
34
246
0
24 Nov 2021
Florence: A New Foundation Model for Computer Vision
Lu Yuan
Dongdong Chen
Yi-Ling Chen
Noel Codella
Xiyang Dai
...
Zhen Xiao
Jianwei Yang
Michael Zeng
Luowei Zhou
Pengchuan Zhang
VLM
29
879
0
22 Nov 2021
DBIA: Data-free Backdoor Injection Attack against Transformer Networks
Peizhuo Lv
Hualong Ma
Jiachen Zhou
Ruigang Liang
Kai Chen
Shengzhi Zhang
Yunfei Yang
24
15
0
22 Nov 2021
ExT5: Towards Extreme Multi-Task Scaling for Transfer Learning
V. Aribandi
Yi Tay
Tal Schuster
J. Rao
H. Zheng
...
Jianmo Ni
Jai Gupta
Kai Hui
Sebastian Ruder
Donald Metzler
MoE
18
213
0
22 Nov 2021
TransMorph: Transformer for unsupervised medical image registration
Junyu Chen
Eric C. Frey
Yufan He
W. Paul Segars
Ye Li
Yong Du
ViT
MedIm
39
302
0
19 Nov 2021