Scaling Vision Transformers

8 June 2021
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
    ViT

Papers citing "Scaling Vision Transformers"

50 / 751 papers shown
DeltaNN: Assessing the Impact of Computational Environment Parameters on the Performance of Image Recognition Models
Nikolaos Louloudakis
Perry Gibson
José Cano
A. Rajan
17
8
0
05 Jun 2023
Towards Sustainable Learning: Coresets for Data-efficient Deep Learning
Yu Yang
Hao Kang
Baharan Mirzasoleiman
41
34
0
02 Jun 2023
Hiera: A Hierarchical Vision Transformer without the Bells-and-Whistles
Chaitanya K. Ryali
Yuan-Ting Hu
Daniel Bolya
Chen Wei
Haoqi Fan
...
Omid Poursaeed
Judy Hoffman
Jitendra Malik
Yanghao Li
Christoph Feichtenhofer
3DH
45
160
0
01 Jun 2023
MetaDiffuser: Diffusion Model as Conditional Planner for Offline Meta-RL
Fei Ni
Jianye Hao
Yao Mu
Yifu Yuan
Yan Zheng
Bin Wang
Zhixuan Liang
DiffM
OffRL
64
42
0
31 May 2023
LMCap: Few-shot Multilingual Image Captioning by Retrieval Augmented Language Model Prompting
R. Ramos
Bruno Martins
Desmond Elliott
VLM
13
16
0
31 May 2023
AMatFormer: Efficient Feature Matching via Anchor Matching Transformer
Bo Jiang
S. Luo
Tianlin Li
Chuanfu Li
Jin Tang
38
8
0
30 May 2023
Analyzing the Sample Complexity of Self-Supervised Image Reconstruction Methods
Tobit Klug
Dogukan Atik
Reinhard Heckel
39
8
0
30 May 2023
Zero-TPrune: Zero-Shot Token Pruning through Leveraging of the Attention Graph in Pre-Trained Transformers
Hongjie Wang
Bhishma Dedhia
N. Jha
ViT
VLM
46
26
0
27 May 2023
Image Quality Is Not All You Want: Task-Driven Lens Design for Image Classification
Xinge Yang
Qiang Fu
Yunfeng Nie
Wolfgang Heidrich
VLM
29
7
0
26 May 2023
Three Towers: Flexible Contrastive Learning with Pretrained Image Models
Jannik Kossen
Mark Collier
Basil Mustafa
Tianlin Li
Xiaohua Zhai
Lucas Beyer
Andreas Steiner
Jesse Berent
Rodolphe Jenatton
Efi Kokiopoulou
VLM
45
11
0
26 May 2023
VanillaKD: Revisit the Power of Vanilla Knowledge Distillation from Small Scale to Large Scale
Zhiwei Hao
Jianyuan Guo
Kai Han
Han Hu
Chang Xu
Yunhe Wang
38
16
0
25 May 2023
Delving Deeper into Data Scaling in Masked Image Modeling
Cheng Lu
Xiaojie Jin
Qibin Hou
Jun Hao Liew
Ming-Ming Cheng
Jiashi Feng
38
4
0
24 May 2023
Getting ViT in Shape: Scaling Laws for Compute-Optimal Model Design
Ibrahim M. Alabdulmohsin
Xiaohua Zhai
Alexander Kolesnikov
Lucas Beyer
VLM
42
58
0
22 May 2023
VanillaNet: the Power of Minimalism in Deep Learning
Hanting Chen
Yunhe Wang
Jianyuan Guo
Dacheng Tao
VLM
34
85
0
22 May 2023
From Patches to Objects: Exploiting Spatial Reasoning for Better Visual Representations
Toni Albert
Bjoern M. Eskofier
Dario Zanca
SSL
14
0
0
21 May 2023
Multimodal Web Navigation with Instruction-Finetuned Foundation Models
Hiroki Furuta
Kuang-Huei Lee
Ofir Nachum
Yutaka Matsuo
Aleksandra Faust
S. Gu
Izzeddin Gur
LM&Ro
36
93
0
19 May 2023
PointGPT: Auto-regressively Generative Pre-training from Point Clouds
Guang-Sheng Chen
Meiling Wang
Yi Yang
Kai Yu
Li-xin Yuan
Yufeng Yue
3DPC
24
80
0
19 May 2023
ONE-PEACE: Exploring One General Representation Model Toward Unlimited Modalities
Peng Wang
Shijie Wang
Junyang Lin
Shuai Bai
Xiaohuan Zhou
Jingren Zhou
Xinggang Wang
Chang Zhou
VLM
MLLM
ObjD
48
115
0
18 May 2023
A Comprehensive Survey on Segment Anything Model for Vision and Beyond
Chunhui Zhang
Li Liu
Yawen Cui
Guanjie Huang
Weilin Lin
Yiqian Yang
Yuehong Hu
VLM
43
90
0
14 May 2023
An Inverse Scaling Law for CLIP Training
Xianhang Li
Zeyu Wang
Cihang Xie
VLM
CLIP
48
55
0
11 May 2023
Alternating Gradient Descent and Mixture-of-Experts for Integrated Multimodal Perception
Hassan Akbari
Dan Kondratyuk
Huayu Chen
Rachel Hornung
Haoran Wang
Hartwig Adam
VLM
MoE
30
11
0
10 May 2023
Finding Meaningful Distributions of ML Black-boxes under Forensic Investigation
Jiyi Zhang
Hansheng Fang
Hwee Kuan Lee
E. Chang
18
1
0
10 May 2023
Less is More: Removing Text-regions Improves CLIP Training Efficiency and Robustness
Liangliang Cao
Bowen Zhang
Chen Chen
Yinfei Yang
Xianzhi Du
Wen‐Cheng Zhang
Zhiyun Lu
Yantao Zheng
CLIP
VLM
27
15
0
08 May 2023
X-LLM: Bootstrapping Advanced Large Language Models by Treating Multi-Modalities as Foreign Languages
Feilong Chen
Minglun Han
Haozhi Zhao
Qingyang Zhang
Jing Shi
Shuang Xu
Bo Xu
MLLM
46
115
0
07 May 2023
ZipIt! Merging Models from Different Tasks without Training
George Stoica
Daniel Bolya
J. Bjorner
Pratik Ramesh
Taylor N. Hearn
Judy Hoffman
VLM
MoMe
52
111
0
04 May 2023
DPSeq: A Novel and Efficient Digital Pathology Classifier for Predicting Cancer Biomarkers using Sequencer Architecture
M. Cen
Xingyu Li
Bangwei Guo
J. Jonnagaddala
Hong Zhang
Xuesong Xu
MedIm
22
0
0
03 May 2023
Modality-invariant Visual Odometry for Embodied Vision
Marius Memmel
Roman Bachmann
Amir Zamir
54
8
0
29 Apr 2023
Are Emergent Abilities of Large Language Models a Mirage?
Rylan Schaeffer
Brando Miranda
Oluwasanmi Koyejo
LRM
50
396
0
28 Apr 2023
Vision Conformer: Incorporating Convolutions into Vision Transformer Layers
Brian Kenji Iwana
Akihiro Kusuda
ViT
45
2
0
27 Apr 2023
A Strong and Reproducible Object Detector with Only Public Datasets
Tianhe Ren
Jianwei Yang
Siyi Liu
Ailing Zeng
Feng Li
Hao Zhang
Hongyang Li
Zhaoyang Zeng
Lei Zhang
ObjD
41
11
0
25 Apr 2023
Stable and low-precision training for large-scale vision-language models
Mitchell Wortsman
Tim Dettmers
Luke Zettlemoyer
Ari S. Morcos
Ali Farhadi
Ludwig Schmidt
MQ
MLLM
VLM
24
39
0
25 Apr 2023
Hint-Aug: Drawing Hints from Foundation Vision Transformers Towards Boosted Few-Shot Parameter-Efficient Tuning
Zhongzhi Yu
Shang Wu
Y. Fu
Shunyao Zhang
Yingyan Lin
33
6
0
25 Apr 2023
Distilling from Similar Tasks for Transfer Learning on a Budget
Kenneth Borup
Cheng Perng Phoo
Bharath Hariharan
30
2
0
24 Apr 2023
A Cookbook of Self-Supervised Learning
Randall Balestriero
Mark Ibrahim
Vlad Sobal
Ari S. Morcos
Shashank Shekhar
...
Pierre Fernandez
Amir Bar
Hamed Pirsiavash
Yann LeCun
Micah Goldblum
SyDa
FedML
SSL
50
274
0
24 Apr 2023
Synthetic Data from Diffusion Models Improves ImageNet Classification
Shekoofeh Azizi
Simon Kornblith
Chitwan Saharia
Mohammad Norouzi
David J. Fleet
VLM
DiffM
40
292
0
17 Apr 2023
A Randomized Approach for Tight Privacy Accounting
Jiachen T. Wang
Saeed Mahloujifar
Tong Wu
R. Jia
Prateek Mittal
36
9
0
17 Apr 2023
DINOv2: Learning Robust Visual Features without Supervision
Maxime Oquab
Timothée Darcet
Théo Moutakanni
Huy Q. Vo
Marc Szafraniec
...
Hervé Jégou
Julien Mairal
Patrick Labatut
Armand Joulin
Piotr Bojanowski
VLM
CLIP
SSL
131
3,055
0
14 Apr 2023
On the Opportunities and Challenges of Foundation Models for Geospatial Artificial Intelligence
Gengchen Mai
Weiming Huang
Jin Sun
Suhang Song
Deepak Mishra
...
Yingjie Hu
Chris Cundy
Ziyuan Li
Rui Zhu
Ni Lao
AI4CE
35
123
0
13 Apr 2023
STU-Net: Scalable and Transferable Medical Image Segmentation Models Empowered by Large-Scale Supervised Pre-training
Ziyan Huang
Hao Wang
Zhongying Deng
Jin Ye
Yanzhou Su
...
Junjun He
Yun Gu
Lixu Gu
Shaoting Zhang
Yu Qiao
21
74
0
13 Apr 2023
Unicom: Universal and Compact Representation Learning for Image Retrieval
Xiang An
Jiankang Deng
Kaicheng Yang
Jaiwei Li
Ziyong Feng
Jia Guo
Jing Yang
Tongliang Liu
VLM
SSL
39
26
0
12 Apr 2023
A Billion-scale Foundation Model for Remote Sensing Images
Keumgang Cha
Junghoon Seo
Taekyung Lee
35
64
0
11 Apr 2023
Improving Image Recognition by Retrieving from Web-Scale Image-Text Data
Ahmet Iscen
Alireza Fathi
Cordelia Schmid
VLM
3DV
33
25
0
11 Apr 2023
SATR: Zero-Shot Semantic Segmentation of 3D Shapes
Ahmed Abdelreheem
Ivan Skorokhodov
M. Ovsjanikov
Peter Wonka
3DPC
35
38
0
11 Apr 2023
On Efficient Training of Large-Scale Deep Learning Models: A Literature Review
Li Shen
Yan Sun
Zhiyuan Yu
Liang Ding
Xinmei Tian
Dacheng Tao
VLM
30
41
0
07 Apr 2023
Visual Dependency Transformers: Dependency Tree Emerges from Reversed Attention
Mingyu Ding
Songlin Yang
Lijie Fan
Zhenfang Chen
Z. Chen
Ping Luo
J. Tenenbaum
Chuang Gan
ViT
84
14
0
06 Apr 2023
Training Strategies for Vision Transformers for Object Detection
Apoorv Singh
31
4
0
05 Apr 2023
Effective Theory of Transformers at Initialization
Emily Dinan
Sho Yaida
Susan Zhang
30
14
0
04 Apr 2023
Vision-Language Models for Vision Tasks: A Survey
Jingyi Zhang
Jiaxing Huang
Sheng Jin
Shijian Lu
VLM
43
486
0
03 Apr 2023
Where are we in the search for an Artificial Visual Cortex for Embodied Intelligence?
Arjun Majumdar
Karmesh Yadav
Sergio Arnaud
Yecheng Jason Ma
Claire Chen
...
Dhruv Batra
Yixin Lin
Oleksandr Maksymets
Aravind Rajeswaran
Franziska Meier
LM&Ro
27
173
0
31 Mar 2023
A Study of Autoregressive Decoders for Multi-Tasking in Computer Vision
Lucas Beyer
Bo Wan
Gagan Madan
Filip Pavetić
Andreas Steiner
...
Emanuele Bugliarello
Tianlin Li
Qihang Yu
Liang-Chieh Chen
Xiaohua Zhai
57
8
0
30 Mar 2023