arXiv:2302.05442
Scaling Vision Transformers to 22 Billion Parameters
10 February 2023
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
Justin Gilmer
Andreas Steiner
Mathilde Caron
Robert Geirhos
Ibrahim Alabdulmohsin
Rodolphe Jenatton
Lucas Beyer
Michael Tschannen
Anurag Arnab
Tianlin Li
C. Riquelme
Matthias Minderer
J. Puigcerver
Utku Evci
Manoj Kumar
Sjoerd van Steenkiste
Gamaleldin F. Elsayed
Aravindh Mahendran
Feng Yu
Avital Oliver
Fantine Huot
Jasmijn Bastings
Mark Collier
A. Gritsenko
Vighnesh Birodkar
C. N. Vasconcelos
Yi Tay
Thomas Mensink
Alexander Kolesnikov
Filip Pavetić
Dustin Tran
Thomas Kipf
Mario Lučić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
Papers citing "Scaling Vision Transformers to 22 Billion Parameters"
38 / 138 papers shown
Title
Spatial-frequency channels, shape bias, and adversarial robustness
Ajay Subramanian
E. Sizikova
N. Majaj
D. Pelli
AAML
96
22
0
22 Sep 2023
Replacing softmax with ReLU in Vision Transformers
Mitchell Wortsman
Jaehoon Lee
Justin Gilmer
Simon Kornblith
ViT
91
33
0
15 Sep 2023
Can you text what is happening? Integrating pre-trained language encoders into trajectory prediction models for autonomous driving
Ali Keysan
Andreas Look
Eitan Kosman
Gonca Gürsun
Jörg Wagner
Yu Yao
Barbara Rakitsch
94
31
0
11 Sep 2023
Composable Function-preserving Expansions for Transformer Architectures
Andrea Gesmundo
Kaitlin Maile
AI4CE
112
8
0
11 Aug 2023
Sample Less, Learn More: Efficient Action Recognition via Frame Feature Restoration
Harry Cheng
Yangyang Guo
Liqiang Nie
Zhiyong Cheng
Mohan S. Kankanhalli
92
7
0
27 Jul 2023
URL: A Representation Learning Benchmark for Transferable Uncertainty Estimates
Michael Kirchhof
Bálint Mucsányi
Seong Joon Oh
Enkelejda Kasneci
UQCV
502
15
0
07 Jul 2023
All in One: Exploring Unified Vision-Language Tracking with Multi-Modal Alignment
Chunhui Zhang
Xin Sun
Li Liu
Yiqian Yang
Qiong Liu
Xiaoping Zhou
Yanfeng Wang
220
17
0
07 Jul 2023
The Shaped Transformer: Attention Models in the Infinite Depth-and-Width Limit
Lorenzo Noci
Chuning Li
Mufan Li
Bobby He
Thomas Hofmann
Chris J. Maddison
Daniel M. Roy
125
36
0
30 Jun 2023
Pushing the Limits of 3D Shape Generation at Scale
Wang Yu
Xuelin Qian
Jingyang Huo
Tiejun Huang
Bo Zhao
Yanwei Fu
127
11
0
20 Jun 2023
DreamSim: Learning New Dimensions of Human Visual Similarity using Synthetic Data
Stephanie Fu
Netanel Y. Tamir
Shobhita Sundaram
Lucy Chai
Richard Y. Zhang
Tali Dekel
Phillip Isola
EGVM
100
123
0
15 Jun 2023
Deep Learning for Day Forecasts from Sparse Observations
Marcin Andrychowicz
L. Espeholt
Di Li
Samier Merchant
Alexander Merose
Fred Zyda
Shreya Agrawal
Nal Kalchbrenner
AI4Cl
127
68
0
06 Jun 2023
Adversarial alignment: Breaking the trade-off between the strength of an attack and its relevance to human perception
Drew Linsley
Pinyuan Feng
Thibaut Boissin
A. Ashok
Thomas Fel
Stephanie Olaiya
Thomas Serre
AAML
78
6
0
05 Jun 2023
PaLI-X: On Scaling up a Multilingual Vision and Language Model
Xi Chen
Josip Djolonga
Piotr Padlewski
Basil Mustafa
Soravit Changpinyo
...
Mojtaba Seyedhosseini
A. Angelova
Xiaohua Zhai
N. Houlsby
Radu Soricut
VLM
162
203
0
29 May 2023
Are Deep Neural Networks Adequate Behavioural Models of Human Visual Perception?
Felix Wichmann
Robert Geirhos
63
28
0
26 May 2023
Scaling Data-Constrained Language Models
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
198
226
0
25 May 2023
Pre-RMSNorm and Pre-CRMSNorm Transformers: Equivalent and Efficient Pre-LN Transformers
Zixuan Jiang
Jiaqi Gu
Hanqing Zhu
David Z. Pan
AI4CE
104
18
0
24 May 2023
TVTSv2: Learning Out-of-the-box Spatiotemporal Visual Representations at Scale
Ziyun Zeng
Yixiao Ge
Zhan Tong
Xihui Liu
Shutao Xia
Ying Shan
82
9
0
23 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
102
101
0
14 May 2023
Visual Tuning
Bruce X. B. Yu
Jianlong Chang
Haixin Wang
Lin Liu
Shijie Wang
...
Lingxi Xie
Haojie Li
Zhouchen Lin
Qi Tian
Chang Wen Chen
VLM
177
41
0
10 May 2023
AttentionViz: A Global View of Transformer Attention
Catherine Yeh
Yida Chen
Aoyu Wu
Cynthia Chen
Fernanda Viégas
Martin Wattenberg
ViT
79
56
0
04 May 2023
Distilling from Similar Tasks for Transfer Learning on a Budget
Kenneth Borup
Cheng Perng Phoo
Bharath Hariharan
93
3
0
24 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
105
43
0
07 Apr 2023
Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster
Nolan Dey
Gurpreet Gosal
Zhiming Chen
Hemant Khachane
William Marshall
Ribhu Pathria
Marvin Tom
Joel Hestness
MoE
LRM
130
108
0
06 Apr 2023
The Vector Grounding Problem
Dimitri Coelho Mollo
Raphael Milliere
148
28
0
04 Apr 2023
MM-REACT: Prompting ChatGPT for Multimodal Reasoning and Action
Zhengyuan Yang
Linjie Li
Jianfeng Wang
Kevin Qinghong Lin
E. Azarnasab
Faisal Ahmed
Zicheng Liu
Ce Liu
Michael Zeng
Lijuan Wang
ReLM
KELM
LRM
128
397
0
20 Mar 2023
OmniForce: On Human-Centered, Large Model Empowered and Cloud-Edge Collaborative AutoML System
Chao Xue
Wen Liu
Shunxing Xie
Zhenfang Wang
Jiaxing Li
...
Shi-Yong Chen
Yibing Zhan
Jing Zhang
Chaoyue Wang
Dacheng Tao
106
2
0
01 Mar 2023
On the Robustness of ChatGPT: An Adversarial and Out-of-distribution Perspective
Jindong Wang
Xixu Hu
Wenxin Hou
Hao Chen
Runkai Zheng
...
Weirong Ye
Xiubo Geng
Binxing Jiao
Yue Zhang
Xingxu Xie
AI4MH
180
241
0
22 Feb 2023
Adaptive Computation with Elastic Input Sequence
Fuzhao Xue
Valerii Likhosherstov
Anurag Arnab
N. Houlsby
Mostafa Dehghani
Yang You
92
21
0
30 Jan 2023
Latent Diffusion for Language Generation
Justin Lovelace
Varsha Kishore
Chao-gang Wan
Eliot Shekhtman
Kilian Q. Weinberger
DiffM
134
82
0
19 Dec 2022
Deep Incubation: Training Large Models by Divide-and-Conquering
Zanlin Ni
Yulin Wang
Jiangwei Yu
Haojun Jiang
Yu Cao
Gao Huang
VLM
98
11
0
08 Dec 2022
Location-Aware Self-Supervised Transformers for Semantic Segmentation
Mathilde Caron
N. Houlsby
Cordelia Schmid
ViT
70
14
0
05 Dec 2022
XKD: Cross-modal Knowledge Distillation with Domain Alignment for Video Representation Learning
Pritam Sarkar
Ali Etemad
112
23
0
25 Nov 2022
Accelerating Transfer Learning with Near-Data Computation on Cloud Object Stores
Arsany Guirguis
Diana Petrescu
Florin Dinu
D. Quoc
Javier Picorel
R. Guerraoui
75
0
0
16 Oct 2022
Towards a Unified View on Visual Parameter-Efficient Transfer Learning
Bruce X. B. Yu
Jianlong Chang
Lin Liu
Qi Tian
Changan Chen
VPVLM
VLM
115
36
0
03 Oct 2022
Self-Supervised and Interpretable Anomaly Detection using Network Transformers
Daniel L. Marino
Chathurika S. Wickramasinghe
C. Rieger
Milos Manic
71
8
0
25 Feb 2022
NeuroBack: Improving CDCL SAT Solving using Graph Neural Networks
Wenxi Wang
Yang Hu
Mohit Tiwari
S. Khurshid
K. McMillan
Risto Miikkulainen
GNN
NAI
80
8
0
26 Oct 2021
Contextualizing Enhances Gradient Based Meta Learning
Evan Vogelbaum
Rumen Dangovski
L. Jing
Marin Soljacic
126
3
0
17 Jul 2020
On the Relationship between Self-Attention and Convolutional Layers
Jean-Baptiste Cordonnier
Andreas Loukas
Martin Jaggi
184
535
0
08 Nov 2019