Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2302.05442
Cited By
Scaling Vision Transformers to 22 Billion Parameters
10 February 2023
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
Justin Gilmer
Andreas Steiner
Mathilde Caron
Robert Geirhos
Ibrahim Alabdulmohsin
Rodolphe Jenatton
Lucas Beyer
Michael Tschannen
Anurag Arnab
Tianlin Li
C. Riquelme
Matthias Minderer
J. Puigcerver
Utku Evci
Manoj Kumar
Sjoerd van Steenkiste
Gamaleldin F. Elsayed
Aravindh Mahendran
Feng Yu
Avital Oliver
Fantine Huot
Jasmijn Bastings
Mark Collier
A. Gritsenko
Vighnesh Birodkar
C. N. Vasconcelos
Yi Tay
Thomas Mensink
Alexander Kolesnikov
Filip Pavetić
Dustin Tran
Thomas Kipf
Mario Luvcić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
MLLM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Scaling Vision Transformers to 22 Billion Parameters"
50 / 138 papers shown
Title
FLUX.1 Kontext: Flow Matching for In-Context Image Generation and Editing in Latent Space
Black Forest Labs
Stephen Batifol
A. Blattmann
Frederic Boesel
Saksham Consul
...
Dustin Podell
Robin Rombach
Harry Saini
Axel Sauer
Luke Smith
DiffM
31
0
0
17 Jun 2025
Load Balancing Mixture of Experts with Similarity Preserving Routers
Nabil Omi
S. Sen
Ali Farhadi
MoE
47
0
0
16 Jun 2025
Can Hessian-Based Insights Support Fault Diagnosis in Attention-based Models?
Sigma Jahan
Mohammad Masudur Rahman
22
0
0
09 Jun 2025
dots.llm1 Technical Report
Bi Huo
Bin Tu
Cheng Qin
Da Zheng
Debing Zhang
...
Yuqiu Ji
Ze Wen
Zhenhai Liu
Zichao Li
Zilong Liao
MoE
63
0
0
06 Jun 2025
ContentV: Efficient Training of Video Generation Models with Limited Compute
Wenfeng Lin
Renjie Chen
Boyuan Liu
Shiyue Yan
Ruoyu Feng
...
Chao Feng
Jiao Ran
Qi Wu
Zuotao Liu
Mingyu Guo
VGen
129
0
0
05 Jun 2025
A Trustworthiness-based Metaphysics of Artificial Intelligence Systems
Andrea Ferrario
44
0
0
03 Jun 2025
Many-for-Many: Unify the Training of Multiple Video and Image Generation and Manipulation Tasks
Tao Yang
Ruibin Li
Yangming Shi
Yuqi Zhang
Qide Dong
Haoran Cheng
Weiguo Feng
Shilei Wen
Bingyue Peng
Lei Zhang
DiffM
VGen
73
0
0
02 Jun 2025
PDE-Transformer: Efficient and Versatile Transformers for Physics Simulations
Benjamin Holzschuh
Qiang Liu
Georg Kohl
Nils Thuerey
AI4CE
54
1
0
30 May 2025
Matryoshka Model Learning for Improved Elastic Student Models
Chetan Verma
Aditya Srinivas Timmaraju
Cho-Jui Hsieh
Suyash Damle
Ngot Bui
Y. Zhang
Wen Chen
Xin Liu
Prateek Jain
Inderjit S Dhillon
120
0
0
29 May 2025
ContextQFormer: A New Context Modeling Method for Multi-Turn Multi-Modal Conversations
Yiming Lei
Zhizheng Yang
Zeming Liu
Haitao Leng
Shaoguo Liu
Tingting Gao
Qingjie Liu
Yunhong Wang
38
0
0
29 May 2025
Progressive Scaling Visual Object Tracking
Jack Hong
Shilin Yan
Zehao Xiao
Jiayin Cai
Xiaolong Jiang
Yao Hu
Henghui Ding
83
0
0
26 May 2025
Asymmetric Duos: Sidekicks Improve Uncertainty
Tim G. Zhou
Evan Shelhamer
Geoff Pleiss
UQCV
58
0
0
24 May 2025
LookWhere? Efficient Visual Recognition by Learning Where to Look and What to See from Self-Supervision
A. Fuller
Yousef Yassin
Junfeng Wen
Daniel G. Kyrollos
Tarek Ibrahim
James R. Green
Evan Shelhamer
ViT
189
0
0
23 May 2025
Stronger ViTs With Octic Equivariance
David Nordström
Johan Edstedt
Fredrik Kahl
Georg Bökman
ViT
227
0
0
21 May 2025
A Unified and Scalable Membership Inference Method for Visual Self-supervised Encoder via Part-aware Capability
Jie Zhu
Jirong Zha
Ding Li
Leye Wang
147
1
0
15 May 2025
Hierarchical Compact Clustering Attention (COCA) for Unsupervised Object-Centric Learning
Can Küçüksözen
Yücel Yemez
OCL
175
0
0
04 May 2025
Accelerating Deep Neural Network Training via Distributed Hybrid Order Optimization
Shunxian Gu
Chaoqun You
Bangbang Ren
Lailong Luo
Junxu Xia
Deke Guo
76
0
0
02 May 2025
JointDiT: Enhancing RGB-Depth Joint Modeling with Diffusion Transformers
Kwon Byung-Ki
Qi Dai
Lee Hyoseok
Chong Luo
Tae-Hyun Oh
183
0
0
01 May 2025
Direct Motion Models for Assessing Generated Videos
Kelsey R. Allen
Carl Doersch
Guangyao Zhou
Mohammed Suhail
Danny Driess
...
Thomas Kipf
Mehdi S. M. Sajjadi
Kevin P. Murphy
João Carreira
Sjoerd van Steenkiste
EGVM
DiffM
VGen
177
0
0
30 Apr 2025
Perception Encoder: The best visual embeddings are not at the output of the network
Daniel Bolya
Po-Yao (Bernie) Huang
Peize Sun
Jang Hyun Cho
Andrea Madotto
...
Shiyu Dong
Nikhila Ravi
Daniel Li
Piotr Dollár
Christoph Feichtenhofer
ObjD
VOS
329
9
0
17 Apr 2025
UniViTAR: Unified Vision Transformer with Native Resolution
Limeng Qiao
Yiyang Gan
Bairui Wang
Jie Qin
Shuang Xu
Siqi Yang
Lin Ma
185
0
0
02 Apr 2025
ImF: Implicit Fingerprint for Large Language Models
Wu jiaxuan
Peng Wanli
Fu hang
Xue Yiming
Wen juan
163
0
0
25 Mar 2025
Your ViT is Secretly an Image Segmentation Model
Tommie Kerssies
Niccolò Cavagnero
Alexander Hermans
Narges Norouzi
Giuseppe Averta
Bastian Leibe
Gijs Dubbelman
Daan de Geus
ViT
VLM
123
5
0
24 Mar 2025
Exploring Training and Inference Scaling Laws in Generative Retrieval
Hongru Cai
Yongqi Li
Ruifeng Yuan
Wenjie Wang
Zhen Zhang
Wenjie Li
Tat-Seng Chua
85
1
0
24 Mar 2025
1000 Layer Networks for Self-Supervised RL: Scaling Depth Can Enable New Goal-Reaching Capabilities
Kevin Wang
Ishaan Javali
Michał Bortkiewicz
Tomasz Trzciñski
Benjamin Eysenbach
SSL
OffRL
124
2
0
19 Mar 2025
Historic Scripts to Modern Vision: A Novel Dataset and A VLM Framework for Transliteration of Modi Script to Devanagari
Harshal Kausadikar
Tanvi Kale
Onkar Susladkar
Sparsh Mittal
94
0
0
17 Mar 2025
APLA: A Simple Adaptation Method for Vision Transformers
Moein Sorkhei
Emir Konuk
Kevin Smith
Christos Matsoukas
146
0
0
14 Mar 2025
DiT-Air: Revisiting the Efficiency of Diffusion Model Architecture Design in Text to Image Generation
Chen Chen
Rui Qian
Wenze Hu
Tsu-Jui Fu
Jialing Tong
...
Lezhi Li
Bowen Zhang
Alex Schwing
Wei Liu
Yue Yang
147
0
0
13 Mar 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo
Yutao Zeng
Ya Wang
Sijun Zhang
Jian Yang
Xiaoqing Li
Xun Zhou
Jinwen Ma
133
0
0
06 Mar 2025
Neural ODE Transformers: Analyzing Internal Dynamics and Adaptive Fine-tuning
Anh Tong
Thanh Nguyen-Tang
Dongeun Lee
Duc Nguyen
Toan M. Tran
David Hall
Cheongwoong Kang
Jaesik Choi
152
1
0
03 Mar 2025
BAnG: Bidirectional Anchored Generation for Conditional RNA Design
Roman Klypa
Alberto Bietti
Sergei Grudinin
77
0
0
28 Feb 2025
Bayesian Computation in Deep Learning
Wenlong Chen
Bolian Li
Ruqi Zhang
Yingzhen Li
BDL
116
0
0
25 Feb 2025
Function-Space Learning Rates
Edward Milsom
Ben Anson
Laurence Aitchison
157
1
0
24 Feb 2025
Vision-LSTM: xLSTM as Generic Vision Backbone
Benedikt Alkin
M. Beck
Korbinian Poppel
Sepp Hochreiter
Johannes Brandstetter
VLM
235
49
0
24 Feb 2025
Optimizing Estimators of Squared Calibration Errors in Classification
Sebastian G. Gruber
Francis Bach
238
2
0
24 Feb 2025
Parameter Efficient Merging for Multimodal Large Language Models with Complementary Parameter Adaptation
Fanhu Zeng
Haiyang Guo
Fei Zhu
Li Shen
Hao Tang
MoMe
235
4
0
24 Feb 2025
Hyperspherical Normalization for Scalable Deep Reinforcement Learning
Hojoon Lee
Youngdo Lee
Takuma Seno
Donghu Kim
Peter Stone
Jaegul Choo
183
4
0
21 Feb 2025
Simpler Fast Vision Transformers with a Jumbo CLS Token
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
208
0
0
20 Feb 2025
One Model for All: Large Language Models are Domain-Agnostic Recommendation Systems
Zuoli Tang
Zhaoxin Huan
Zihao Li
Xiaolu Zhang
Jun Hu
Chilin Fu
Jun Zhou
Lixin Zou
Chenliang Li
151
20
0
20 Feb 2025
Object-Centric Latent Action Learning
Albina Klepach
Alexander Nikulin
Ilya Zisman
Denis Tarasov
Alexander Derevyagin
Andrei Polubarov
Nikita Lyubaykin
Vladislav Kurenkov
134
0
0
13 Feb 2025
UniMoD: Efficient Unified Multimodal Transformers with Mixture-of-Depths
Weijia Mao
Zhiyong Yang
Mike Zheng Shou
MoE
208
1
0
10 Feb 2025
Scaling Laws in Patchification: An Image Is Worth 50,176 Tokens And More
Feng Wang
Yaodong Yu
Guoyizhe Wei
Wei Shao
Yuyin Zhou
Alan Yuille
Cihang Xie
ViT
149
7
0
06 Feb 2025
FuXi-
α
\alpha
α
: Scaling Recommendation Model with Feature Interaction Enhanced Transformer
Yufei Ye
Wei Guo
Jin Yao Chin
Hao Wang
Hong Zhu
...
Yuyang Ye
Yixiao Liu
Ruiming Tang
Defu Lian
Enhong Chen
152
2
0
05 Feb 2025
Point-PRC: A Prompt Learning Based Regulation Framework for Generalizable Point Cloud Analysis
Hongyu Sun
Qiuhong Ke
Yanjie Wang
Wang Chen
Kang Yang
Deying Li
Jianfei Cai
3DPC
211
3
0
17 Jan 2025
Protego: Detecting Adversarial Examples for Vision Transformers via Intrinsic Capabilities
Jialin Wu
Kaikai Pan
Yanjiao Chen
Jiangyi Deng
Shengyuan Pang
Wei Dong
ViT
AAML
125
0
0
13 Jan 2025
A Comprehensive Survey of Large Language Models and Multimodal Large Language Models in Medicine
Hanguang Xiao
Feizhong Zhou
Xianglong Liu
Tianqi Liu
Zhipeng Li
Xin Liu
Xiaoxuan Huang
AILaw
LM&MA
LRM
166
30
0
31 Dec 2024
Scaling 4D Representations
João Carreira
Dilara Gokay
Michael King
Chuhan Zhang
Ignacio Rocco
...
Viorica Patraucean
Dima Damen
Pauline Luc
Mehdi S. M. Sajjadi
Andrew Zisserman
143
5
0
19 Dec 2024
Hardware Scaling Trends and Diminishing Returns in Large-Scale Distributed Training
Jared Fernandez
Luca Wehrstedt
Leonid Shamis
Mostafa Elhoushi
Kalyan Saladi
Yonatan Bisk
Emma Strubell
Jacob Kahn
557
4
0
20 Nov 2024
Diffusion Beats Autoregressive: An Evaluation of Compositional Generation in Text-to-Image Models
Arash Marioriyad
Parham Rezaei
M. Baghshah
M. Rohban
CoGe
469
0
0
30 Oct 2024
How Does Critical Batch Size Scale in Pre-training?
Hanlin Zhang
Depen Morwani
Nikhil Vyas
Jingfeng Wu
Difan Zou
Udaya Ghai
Dean Phillips Foster
Sham Kakade
203
18
0
29 Oct 2024
1
2
3
Next