ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2302.05442
  4. Cited By
Scaling Vision Transformers to 22 Billion Parameters

Scaling Vision Transformers to 22 Billion Parameters

10 February 2023
Mostafa Dehghani
Josip Djolonga
Basil Mustafa
Piotr Padlewski
Jonathan Heek
Justin Gilmer
Andreas Steiner
Mathilde Caron
Robert Geirhos
Ibrahim M. Alabdulmohsin
Rodolphe Jenatton
Lucas Beyer
Michael Tschannen
Anurag Arnab
Tianlin Li
C. Riquelme
Matthias Minderer
J. Puigcerver
Utku Evci
Manoj Kumar
Sjoerd van Steenkiste
Gamaleldin F. Elsayed
Aravindh Mahendran
Feng Yu
Avital Oliver
Fantine Huot
Jasmijn Bastings
Mark Collier
A. Gritsenko
Vighnesh Birodkar
C. N. Vasconcelos
Yi Tay
Thomas Mensink
Alexander Kolesnikov
Filip Pavetić
Dustin Tran
Thomas Kipf
Mario Luvcić
Xiaohua Zhai
Daniel Keysers
Jeremiah Harmsen
N. Houlsby
    MLLM
ArXivPDFHTML

Papers citing "Scaling Vision Transformers to 22 Billion Parameters"

50 / 418 papers shown
Title
NVLM: Open Frontier-Class Multimodal LLMs
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Wei Ping
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
M. Shoeybi
Bryan Catanzaro
Ming-Yu Liu
MLLM
VLM
LRM
45
54
0
17 Sep 2024
SOAP: Improving and Stabilizing Shampoo using Adam
SOAP: Improving and Stabilizing Shampoo using Adam
Nikhil Vyas
Depen Morwani
Rosie Zhao
Itai Shapira
David Brandfonbrener
Lucas Janson
Sham Kakade
Sham Kakade
75
24
0
17 Sep 2024
Self-Masking Networks for Unsupervised Adaptation
Self-Masking Networks for Unsupervised Adaptation
Alfonso Taboada Warmerdam
Mathilde Caron
Yuki M. Asano
49
1
0
11 Sep 2024
How Molecules Impact Cells: Unlocking Contrastive PhenoMolecular
  Retrieval
How Molecules Impact Cells: Unlocking Contrastive PhenoMolecular Retrieval
Philip Fradkin
Puria Azadi
Karush Suri
Frederik Wenkel
A. Bashashati
Maciej Sypetkowski
Dominique Beaini
33
1
0
10 Sep 2024
The AdEMAMix Optimizer: Better, Faster, Older
The AdEMAMix Optimizer: Better, Faster, Older
Matteo Pagliardini
Pierre Ablin
David Grangier
ODL
28
8
0
05 Sep 2024
EMP: Enhance Memory in Data Pruning
EMP: Enhance Memory in Data Pruning
Jinying Xiao
Ping Li
Jie Nie
Zhe Tang
VLM
57
0
0
28 Aug 2024
A Statistical Framework for Data-dependent Retrieval-Augmented Models
A Statistical Framework for Data-dependent Retrieval-Augmented Models
Soumya Basu
A. S. Rawat
Manzil Zaheer
RALM
49
0
0
27 Aug 2024
Sapiens: Foundation for Human Vision Models
Sapiens: Foundation for Human Vision Models
Rawal Khirodkar
Timur M. Bagautdinov
Julieta Martinez
Su Zhaoen
Austin James
Peter Selednik
Stuart Anderson
Shunsuke Saito
VLM
43
63
0
22 Aug 2024
Show-o: One Single Transformer to Unify Multimodal Understanding and
  Generation
Show-o: One Single Transformer to Unify Multimodal Understanding and Generation
Jinheng Xie
Weijia Mao
Zechen Bai
David Junhao Zhang
Weihao Wang
Kevin Qinghong Lin
Yuchao Gu
Zhijie Chen
Zhenheng Yang
Mike Zheng Shou
57
165
0
22 Aug 2024
D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal
  models
D-RMGPT: Robot-assisted collaborative tasks driven by large multimodal models
Matteo Forlini
Mihail Babcinschi
Giacomo Palmieri
Pedro Neto
52
1
0
21 Aug 2024
Zero-Shot Object-Centric Representation Learning
Zero-Shot Object-Centric Representation Learning
Aniket Didolkar
Andrii Zadaianchuk
Anirudh Goyal
Mike Mozer
Yoshua Bengio
Georg Martius
Maximilian Seitzer
VLM
OCL
37
4
0
17 Aug 2024
Towards flexible perception with visual memory
Towards flexible perception with visual memory
Robert Geirhos
P. Jaini
Austin Stone
Sourabh Medapati
Xi Yi
G. Toderici
Abhijit Ogale
Jonathon Shlens
42
1
0
15 Aug 2024
Multi-scale Contrastive Adaptor Learning for Segmenting Anything in
  Underperformed Scenes
Multi-scale Contrastive Adaptor Learning for Segmenting Anything in Underperformed Scenes
Ke Zhou
Zhongwei Qiu
Dongmei Fu
VLM
43
1
0
12 Aug 2024
Diffusion Guided Language Modeling
Diffusion Guided Language Modeling
Justin Lovelace
Varsha Kishore
Yiwei Chen
Kilian Q. Weinberger
44
6
0
08 Aug 2024
Body of Her: A Preliminary Study on End-to-End Humanoid Agent
Body of Her: A Preliminary Study on End-to-End Humanoid Agent
Tenglong Ao
LM&Ro
31
1
0
06 Aug 2024
Evaluating Posterior Probabilities: Decision Theory, Proper Scoring
  Rules, and Calibration
Evaluating Posterior Probabilities: Decision Theory, Proper Scoring Rules, and Calibration
Luciana Ferrer
Daniel Ramos
UQCV
33
4
0
05 Aug 2024
Virchow2: Scaling Self-Supervised Mixed Magnification Models in
  Pathology
Virchow2: Scaling Self-Supervised Mixed Magnification Models in Pathology
Eric Zimmermann
Eugene Vorontsov
Julian Viret
Adam Casson
Michal Zelechowski
...
Razik Yousfi
Thomas J. Fuchs
Nicolò Fusi
Siqi Liu
Kristen Severson
MedIm
41
30
0
01 Aug 2024
Scaling Backwards: Minimal Synthetic Pre-training?
Scaling Backwards: Minimal Synthetic Pre-training?
Ryo Nakamura
Ryu Tadokoro
Ryosuke Yamada
Tim Puhlfürß
Iro Laina
Christian Rupprecht
Walid Maalej
Rio Yokota
Hirokatsu Kataoka
DD
22
2
0
01 Aug 2024
Are Bigger Encoders Always Better in Vision Large Models?
Are Bigger Encoders Always Better in Vision Large Models?
Bozhou Li
Hao Liang
Zimo Meng
Wentao Zhang
VLM
40
3
0
01 Aug 2024
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Mixture of Nested Experts: Adaptive Processing of Visual Tokens
Gagan Jain
Nidhi Hegde
Aditya Kusupati
Arsha Nagrani
Shyamal Buch
Prateek Jain
Anurag Arnab
Sujoy Paul
MoE
48
7
0
29 Jul 2024
u-$\mu$P: The Unit-Scaled Maximal Update Parametrization
u-μ\muμP: The Unit-Scaled Maximal Update Parametrization
Charlie Blake
C. Eichenberg
Josef Dean
Lukas Balles
Luke Y. Prince
Bjorn Deiseroth
Andres Felipe Cruz Salinas
Carlo Luschi
Samuel Weinbach
Douglas Orr
61
9
0
24 Jul 2024
Stretching Each Dollar: Diffusion Training from Scratch on a
  Micro-Budget
Stretching Each Dollar: Diffusion Training from Scratch on a Micro-Budget
Vikash Sehwag
Xianghao Kong
Jingtao Li
Michael Spranger
Lingjuan Lyu
DiffM
47
9
0
22 Jul 2024
SIGMA:Sinkhorn-Guided Masked Video Modeling
SIGMA:Sinkhorn-Guided Masked Video Modeling
Mohammadreza Salehi
Michael Dorkenwald
Fida Mohammad Thoker
E. Gavves
Cees G. M. Snoek
Yuki M. Asano
55
3
0
22 Jul 2024
dMel: Speech Tokenization made Simple
dMel: Speech Tokenization made Simple
Richard He Bai
Tatiana Likhomanenko
Ruixiang Zhang
Zijin Gu
Zakaria Aldeneh
Navdeep Jaitly
43
4
0
22 Jul 2024
Foundation Models for Autonomous Robots in Unstructured Environments
Foundation Models for Autonomous Robots in Unstructured Environments
Hossein Naderi
Alireza Shojaei
Lifu Huang
LM&Ro
47
0
0
19 Jul 2024
Scaling Sign Language Translation
Scaling Sign Language Translation
Biao Zhang
Garrett Tanzer
Orhan Firat
LRM
VLM
SLR
40
1
0
16 Jul 2024
Deconstructing What Makes a Good Optimizer for Language Models
Deconstructing What Makes a Good Optimizer for Language Models
Rosie Zhao
Depen Morwani
David Brandfonbrener
Nikhil Vyas
Sham Kakade
50
17
0
10 Jul 2024
Controlling Space and Time with Diffusion Models
Controlling Space and Time with Diffusion Models
Daniel Watson
Saurabh Saxena
Lala Li
Andrea Tagliasacchi
David J. Fleet
VGen
71
27
0
10 Jul 2024
Precision at Scale: Domain-Specific Datasets On-Demand
Precision at Scale: Domain-Specific Datasets On-Demand
Jesús M. Rodríguez-de-Vera
Imanol G. Estepa
Ignacio Sarasúa
Bhalaji Nagarajan
Petia Radeva
40
2
0
03 Jul 2024
On the Performance and Memory Footprint of Distributed Training: An
  Empirical Study on Transformers
On the Performance and Memory Footprint of Distributed Training: An Empirical Study on Transformers
Zhengxian Lu
Fangyu Wang
Zhiwei Xu
Fei Yang
Tao Li
37
1
0
02 Jul 2024
Overcoming Common Flaws in the Evaluation of Selective Classification
  Systems
Overcoming Common Flaws in the Evaluation of Selective Classification Systems
Jeremias Traub
Till J. Bungert
Carsten T. Lüth
Michael Baumgartner
Klaus H. Maier-Hein
Lena Maier-Hein
Paul F. Jaeger
44
3
0
01 Jul 2024
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Resolving Discrepancies in Compute-Optimal Scaling of Language Models
Tomer Porian
Mitchell Wortsman
J. Jitsev
Ludwig Schmidt
Y. Carmon
60
20
0
27 Jun 2024
Transformer Normalisation Layers and the Independence of Semantic
  Subspaces
Transformer Normalisation Layers and the Independence of Semantic Subspaces
S. Menary
Samuel Kaski
Andre Freitas
44
2
0
25 Jun 2024
EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned
  Data for Evaluating Text-to-Image Models
EVALALIGN: Supervised Fine-Tuning Multimodal LLMs with Human-Aligned Data for Evaluating Text-to-Image Models
Zhiyu Tan
Xiaomeng Yang
Luozheng Qin
Mengping Yang
Cheng Zhang
Hao Li
50
7
0
24 Jun 2024
Uni-Mol2: Exploring Molecular Pretraining Model at Scale
Uni-Mol2: Exploring Molecular Pretraining Model at Scale
Xiaohong Ji
Zhen Wang
Zhifeng Gao
Hang Zheng
Linfeng Zhang
Guolin Ke
Weinan E
AI4CE
50
6
0
21 Jun 2024
Is AI fun? HumorDB: a curated dataset and benchmark to investigate
  graphical humor
Is AI fun? HumorDB: a curated dataset and benchmark to investigate graphical humor
Veedant Jain
Felipe dos Santos Alves Feitosa
Gabriel Kreiman
VLM
45
2
0
19 Jun 2024
Unveiling Encoder-Free Vision-Language Models
Unveiling Encoder-Free Vision-Language Models
Haiwen Diao
Yufeng Cui
Xiaotong Li
Yueze Wang
Huchuan Lu
Xinlong Wang
VLM
59
29
0
17 Jun 2024
Exploring the Role of Large Language Models in Prompt Encoding for
  Diffusion Models
Exploring the Role of Large Language Models in Prompt Encoding for Diffusion Models
Bingqi Ma
Zhuofan Zong
Guanglu Song
Hongsheng Li
Yu Liu
38
21
0
17 Jun 2024
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for
  Low-Resource Languages with Automated Crawling, Transcription and Refinement
GigaSpeech 2: An Evolving, Large-Scale and Multi-domain ASR Corpus for Low-Resource Languages with Automated Crawling, Transcription and Refinement
Yifan Yang
Zheshu Song
Jianheng Zhuo
Mingyu Cui
Jinpeng Li
...
Shuai Fan
Kai Yu
Wei-Qiang Zhang
Guoguo Chen
Xie Chen
35
8
0
17 Jun 2024
LRM-Zero: Training Large Reconstruction Models with Synthesized Data
LRM-Zero: Training Large Reconstruction Models with Synthesized Data
Desai Xie
Sai Bi
Zhixin Shu
Kai Zhang
Zexiang Xu
Yi Zhou
Soren Pirk
Arie E. Kaufman
Xin Sun
Hao Tan
SyDa
56
14
0
13 Jun 2024
ToSA: Token Selective Attention for Efficient Vision Transformers
ToSA: Token Selective Attention for Efficient Vision Transformers
Manish Kumar Singh
R. Yasarla
Hong Cai
Mingu Lee
Fatih Porikli
62
0
0
13 Jun 2024
AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision
  Transformer
AdaNCA: Neural Cellular Automata As Adaptors For More Robust Vision Transformer
Yitao Xu
Tong Zhang
Sabine Süsstrunk
ViT
47
0
0
12 Jun 2024
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel
  Fusion
FLUX: Fast Software-based Communication Overlap On GPUs Through Kernel Fusion
Li-Wen Chang
Yiyuan Ma
Qi Hou
Chengquan Jiang
Ningxin Zheng
...
Zuquan Song
Ziheng Jiang
Yanghua Peng
Xuanzhe Liu
Xin Liu
41
22
0
11 Jun 2024
Adapters Strike Back
Adapters Strike Back
Jan-Martin O. Steitz
Stefan Roth
35
5
0
10 Jun 2024
Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor
  Control
Adapting Pretrained ViTs with Convolution Injector for Visuo-Motor Control
Dongyoon Hwang
ByungKun Lee
Hojoon Lee
Hyunseung Kim
Jaegul Choo
53
0
0
10 Jun 2024
ProMotion: Prototypes As Motion Learners
ProMotion: Prototypes As Motion Learners
Yawen Lu
Dongfang Liu
Qifan Wang
Cheng Han
Yiming Cui
Zhiwen Cao
Xueling Zhang
Yingjie Victor Chen
Heng Fan
DiffM
46
2
0
07 Jun 2024
Labeled Data Selection for Category Discovery
Labeled Data Selection for Category Discovery
Bingchen Zhao
Nico Lang
Serge Belongie
Oisin Mac Aodha
34
3
0
07 Jun 2024
Interpretable Lightweight Transformer via Unrolling of Learned Graph
  Smoothness Priors
Interpretable Lightweight Transformer via Unrolling of Learned Graph Smoothness Priors
Tam Thuc Do
Parham Eftekhar
Seyed Alireza Hosseini
Gene Cheung
Philip A. Chou
31
0
0
06 Jun 2024
Enhancing 2D Representation Learning with a 3D Prior
Enhancing 2D Representation Learning with a 3D Prior
Mehmet Aygun
Prithviraj Dhar
Zhicheng Yan
Oisin Mac Aodha
Rakesh Ranjan
SSL
61
1
0
04 Jun 2024
Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits
  Multimodal Reasoning
Retrieval Meets Reasoning: Even High-school Textbook Knowledge Benefits Multimodal Reasoning
Cheng Tan
Jingxuan Wei
Linzhuang Sun
Zhangyang Gao
Siyuan Li
Bihui Yu
Ruifeng Guo
Stan Z. Li
ReLM
LRM
3DV
72
6
0
31 May 2024
Previous
123456789
Next