ResearchTrend.AI

Scaling Vision Transformers (arXiv:2106.04560)

8 June 2021
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
    ViT

Papers citing "Scaling Vision Transformers"

50 / 751 papers shown
Assaying Out-Of-Distribution Generalization in Transfer Learning
F. Wenzel
Andrea Dittadi
Peter V. Gehler
Carl-Johann Simon-Gabriel
Max Horn
...
Chris Russell
Thomas Brox
Bernt Schiele
Bernhard Schölkopf
Francesco Locatello
OOD
OODD
AAML
60
71
0
19 Jul 2022
Transfer learning for time series classification using synthetic data generation
Yarden Rotem
Nathaniel Shimoni
Lior Rokach
Bracha Shapira
SyDa
21
8
0
16 Jul 2022
Plex: Towards Reliability using Pretrained Large Model Extensions
Dustin Tran
J. Liu
Michael W. Dusenberry
Du Phan
Mark Collier
...
D. Sculley
Y. Gal
Zoubin Ghahramani
Jasper Snoek
Balaji Lakshminarayanan
VLM
39
124
0
15 Jul 2022
ScaleNet: Searching for the Model to Scale
Jiyang Xie
Xiu Su
Shan You
Zhanyu Ma
Fei Wang
Chao Qian
31
5
0
15 Jul 2022
Convolutional Bypasses Are Better Vision Transformer Adapters
Shibo Jie
Zhi-Hong Deng
VPVLM
21
131
0
14 Jul 2022
Synergy and Symmetry in Deep Learning: Interactions between the Data, Model, and Inference Algorithm
Lechao Xiao
Jeffrey Pennington
34
6
0
11 Jul 2022
How Much More Data Do I Need? Estimating Requirements for Downstream Tasks
Rafid Mahmood
James Lucas
David Acuna
Daiqing Li
Jonah Philion
Jose M. Alvarez
Zhiding Yu
Sanja Fidler
M. Law
19
27
0
04 Jul 2022
Revisiting Classifier: Transferring Vision-Language Models for Video Recognition
Wenhao Wu
Zhun Sun
Wanli Ouyang
VLM
103
93
0
04 Jul 2022
Beyond neural scaling laws: beating power law scaling via data pruning
Ben Sorscher
Robert Geirhos
Shashank Shekhar
Surya Ganguli
Ari S. Morcos
22
418
0
29 Jun 2022
ZoDIAC: Zoneout Dropout Injection Attention Calculation
Zanyar Zohourianshahzadi
Jugal Kalita
28
0
0
28 Jun 2022
Video PreTraining (VPT): Learning to Act by Watching Unlabeled Online Videos
Bowen Baker
Ilge Akkaya
Peter Zhokhov
Joost Huizinga
Jie Tang
Adrien Ecoffet
Brandon Houghton
Raul Sampedro
Jeff Clune
OffRL
39
285
0
23 Jun 2022
Scaling Autoregressive Models for Content-Rich Text-to-Image Generation
Jiahui Yu
Yuanzhong Xu
Jing Yu Koh
Thang Luong
Gunjan Baid
...
Zarana Parekh
Xin Li
Han Zhang
Jason Baldridge
Yonghui Wu
EGVM
107
1,066
0
22 Jun 2022
Replacing Labeled Real-image Datasets with Auto-generated Contours
Hirokatsu Kataoka
Ryo Hayamizu
Ryosuke Yamada
Kodai Nakashima
Sora Takashima
Xinyu Zhang
Edgar Josafat Martinez-Noriega
Nakamasa Inoue
Rio Yokota
22
23
0
18 Jun 2022
Edge Inference with Fully Differentiable Quantized Mixed Precision Neural Networks
Clemens J. S. Schaefer
Siddharth Joshi
Shane Li
Raul Blazquez
MQ
30
9
0
15 Jun 2022
Forecasting of depth and ego-motion with transformers and self-supervision
Houssem-eddine Boulahbal
A. Voicila
Andrew I. Comport
ViT
MDE
27
3
0
15 Jun 2022
Efficient Adaptive Ensembling for Image Classification
A. Bruno
Davide Moroni
M. Martinelli
34
18
0
15 Jun 2022
Differentiable Top-k Classification Learning
Felix Petersen
Hilde Kuehne
Christian Borgelt
Oliver Deussen
59
28
0
15 Jun 2022
PointNeXt: Revisiting PointNet++ with Improved Training and Scaling Strategies
Guocheng Qian
Yuchen Li
Houwen Peng
Jinjie Mai
Hasan Hammoud
Mohamed Elhoseiny
Guohao Li
3DPC
36
601
0
09 Jun 2022
On Data Scaling in Masked Image Modeling
Zhenda Xie
Zheng-Wei Zhang
Yue Cao
Yutong Lin
Yixuan Wei
Qi Dai
Han Hu
31
52
0
09 Jun 2022
Neural Collapse: A Review on Modelling Principles and Generalization
Vignesh Kothapalli
25
71
0
08 Jun 2022
Boundary between noise and information applied to filtering neural network weight matrices
Max Staats
M. Thamm
B. Rosenow
23
3
0
08 Jun 2022
Can CNNs Be More Robust Than Transformers?
Zeyu Wang
Yutong Bai
Yuyin Zhou
Cihang Xie
UQCV
OOD
27
46
0
07 Jun 2022
Multimodal Contrastive Learning with LIMoE: the Language-Image Mixture of Experts
Basil Mustafa
C. Riquelme
J. Puigcerver
Rodolphe Jenatton
N. Houlsby
VLM
MoE
28
183
0
06 Jun 2022
Separable Self-attention for Mobile Vision Transformers
Sachin Mehta
Mohammad Rastegari
ViT
MQ
26
252
0
06 Jun 2022
Transforming medical imaging with Transformers? A comparative review of key properties, current progresses, and future perspectives
Jun Li
Junyu Chen
Yucheng Tang
Ce Wang
Bennett A. Landman
S. K. Zhou
ViT
OOD
MedIm
23
21
0
02 Jun 2022
Contrastive Learning Rivals Masked Image Modeling in Fine-tuning via Feature Distillation
Yixuan Wei
Han Hu
Zhenda Xie
Zheng-Wei Zhang
Yue Cao
Jianmin Bao
Dong Chen
B. Guo
CLIP
88
124
0
27 May 2022
How Tempering Fixes Data Augmentation in Bayesian Neural Networks
Gregor Bachmann
Lorenzo Noci
Thomas Hofmann
BDL
AAML
80
8
0
27 May 2022
AdaptFormer: Adapting Vision Transformers for Scalable Visual Recognition
Shoufa Chen
Chongjian Ge
Zhan Tong
Jiangliu Wang
Yibing Song
Jue Wang
Ping Luo
152
639
0
26 May 2022
Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
Ashish V. Thapliyal
Jordi Pont-Tuset
Xi Chen
Radu Soricut
VGen
88
72
0
25 May 2022
UViM: A Unified Modeling Approach for Vision with Learned Guiding Codes
Alexander Kolesnikov
André Susano Pinto
Lucas Beyer
Xiaohua Zhai
Jeremiah Harmsen
N. Houlsby
103
67
0
20 May 2022
A Unified and Biologically-Plausible Relational Graph Representation of Vision Transformers
Yuzhong Chen
Yu Du
Zhe Xiao
Lin Zhao
Lu Zhang
...
Dajiang Zhu
Tuo Zhang
Xintao Hu
Tianming Liu
Xi Jiang
ViT
27
5
0
20 May 2022
A CLIP-Hitchhiker's Guide to Long Video Retrieval
Max Bain
Arsha Nagrani
Gül Varol
Andrew Zisserman
CLIP
129
62
0
17 May 2022
Simple Open-Vocabulary Object Detection with Vision Transformers
Matthias Minderer
A. Gritsenko
Austin Stone
Maxim Neumann
Dirk Weissenborn
...
Zhuoran Shen
Tianlin Li
Xiaohua Zhai
Thomas Kipf
N. Houlsby
ObjD
CLIP
VLM
ViT
OCL
34
307
0
12 May 2022
Vision Transformer: Vit and its Derivatives
Zujun Fu
ViT
41
6
0
12 May 2022
When does dough become a bagel? Analyzing the remaining mistakes on ImageNet
Vijay Vasudevan
Benjamin Caine
Raphael Gontijo-Lopes
Sara Fridovich-Keil
Rebecca Roelofs
VLM
UQCV
46
57
0
09 May 2022
Large Scale Transfer Learning for Differentially Private Image Classification
Harsh Mehta
Abhradeep Thakurta
Alexey Kurakin
Ashok Cutkosky
17
39
0
06 May 2022
CoCa: Contrastive Captioners are Image-Text Foundation Models
Jiahui Yu
Zirui Wang
Vijay Vasudevan
Legg Yeung
Mojtaba Seyedhosseini
Yonghui Wu
VLM
CLIP
OffRL
79
1,256
0
04 May 2022
Better plain ViT baselines for ImageNet-1k
Lucas Beyer
Xiaohua Zhai
Alexander Kolesnikov
ViT
VLM
33
111
0
03 May 2022
Jack and Masters of all Trades: One-Pass Learning Sets of Model Sets From Large Pre-Trained Models
Han Xiang Choong
Yew-Soon Ong
Abhishek Gupta
Caishun Chen
Ray Lim
27
4
0
02 May 2022
HDGT: Heterogeneous Driving Graph Transformer for Multi-Agent Trajectory Prediction via Scene Encoding
Xiaosong Jia
Peng Wu
Li Chen
Y. Liu
Hongyang Li
Junchi Yan
32
120
0
30 Apr 2022
MiCS: Near-linear Scaling for Training Gigantic Model on Public Cloud
Zhen Zhang
Shuai Zheng
Yida Wang
Justin Chiu
George Karypis
Trishul Chilimbi
Mu Li
Xin Jin
19
39
0
30 Apr 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
46
3,349
0
29 Apr 2022
Unlocking High-Accuracy Differentially Private Image Classification through Scale
Soham De
Leonard Berrada
Jamie Hayes
Samuel L. Smith
Borja Balle
35
217
0
28 Apr 2022
RelViT: Concept-guided Vision Transformer for Visual Relational Reasoning
Xiaojian Ma
Weili Nie
Zhiding Yu
Huaizu Jiang
Chaowei Xiao
Yuke Zhu
Song-Chun Zhu
Anima Anandkumar
ViT
LRM
30
19
0
24 Apr 2022
Residual Mixture of Experts
Lemeng Wu
Mengchen Liu
Yinpeng Chen
Dongdong Chen
Xiyang Dai
Lu Yuan
MoE
22
36
0
20 Apr 2022
ResT V2: Simpler, Faster and Stronger
Qing-Long Zhang
Yubin Yang
ViT
35
25
0
15 Apr 2022
Generative Adversarial Networks for Image Augmentation in Agriculture: A Systematic Review
E. Olaniyi
Dong Chen
Yuzhen Lu
Ya-Yu Huang
21
38
0
10 Apr 2022
DaViT: Dual Attention Vision Transformers
Mingyu Ding
Bin Xiao
Noel Codella
Ping Luo
Jingdong Wang
Lu Yuan
ViT
51
242
0
07 Apr 2022
MaxViT: Multi-Axis Vision Transformer
Zhengzhong Tu
Hossein Talebi
Han Zhang
Feng Yang
P. Milanfar
A. Bovik
Yinxiao Li
ViT
62
636
0
04 Apr 2022
Exploring Plain Vision Transformer Backbones for Object Detection
Yanghao Li
Hanzi Mao
Ross B. Girshick
Kaiming He
ViT
36
775
0
30 Mar 2022