ResearchTrend.AI
arXiv: 2106.04560
Scaling Vision Transformers

8 June 2021
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
    ViT

Papers citing "Scaling Vision Transformers"

50 / 751 papers shown
SWARM Parallelism: Training Large Models Can Be Surprisingly Communication-Efficient
Max Ryabinin
Tim Dettmers
Michael Diskin
Alexander Borzunov
MoE
30
31
0
27 Jan 2023
ClimaX: A foundation model for weather and climate
Tung Nguyen
Johannes Brandstetter
Ashish Kapoor
Jayesh K. Gupta
Aditya Grover
AI4Cl
AI4CE
11
245
0
24 Jan 2023
Masked Autoencoding Does Not Help Natural Language Supervision at Scale
Floris Weers
Vaishaal Shankar
Angelos Katharopoulos
Yinfei Yang
Tom Gunter
CLIP
23
4
0
19 Jan 2023
Human-Timescale Adaptation in an Open-Ended Task Space
Adaptive Agent Team
Jakob Bauer
Kate Baumli
Satinder Baveja
Feryal M. P. Behbahani
...
Jakub Sygnowski
K. Tuyls
Sarah York
Alexander Zacherl
Lei Zhang
LM&Ro
OffRL
AI4CE
LRM
38
109
0
18 Jan 2023
Enhancing Self-Training Methods
Aswathnarayan Radhakrishnan
Jim Davis
Zachary Rabin
Benjamin Lewis
Matthew Scherreik
R. Ilin
26
1
0
18 Jan 2023
GOHSP: A Unified Framework of Graph and Optimization-based Heterogeneous Structured Pruning for Vision Transformer
Miao Yin
Burak Uzkent
Yilin Shen
Hongxia Jin
Bo Yuan
ViT
32
13
0
13 Jan 2023
Does progress on ImageNet transfer to real-world datasets?
Alex Fang
Simon Kornblith
Ludwig Schmidt
VLM
29
34
0
11 Jan 2023
Exploring the Approximation Capabilities of Multiplicative Neural Networks for Smooth Functions
Ido Ben-Shaul
Tomer Galanti
S. Dekel
31
3
0
11 Jan 2023
Differentiable modeling to unify machine learning and physical models and advance Geosciences
Chaopeng Shen
A. Appling
Pierre Gentine
Toshiyuki Bandai
H. Gupta
...
Chris Rackauckas
Tirthankar Roy
Chonggang Xu
Binayak Mohanty
K. Lawson
AI4CE
42
14
0
10 Jan 2023
On the Convergence of Stochastic Gradient Descent in Low-precision Number Formats
M. Cacciola
A. Frangioni
M. Asgharian
Alireza Ghaffari
V. Nia
47
4
0
04 Jan 2023
Bidirectional Cross-Modal Knowledge Exploration for Video Recognition with Pre-trained Vision-Language Models
Wenhao Wu
Xiaohan Wang
Haipeng Luo
Jingdong Wang
Yi Yang
Wanli Ouyang
100
48
0
31 Dec 2022
Principled and Efficient Transfer Learning of Deep Models via Neural Collapse
Xiao Li
Sheng Liu
Jin-li Zhou
Xin Lu
C. Fernandez‐Granda
Zhihui Zhu
Q. Qu
AAML
28
19
0
23 Dec 2022
RangeAugment: Efficient Online Augmentation with Range Learning
Sachin Mehta
Saeid Naderiparizi
Fartash Faghri
Maxwell Horton
Lailin Chen
Ali Farhadi
Oncel Tuzel
Mohammad Rastegari
26
6
0
20 Dec 2022
Masked Event Modeling: Self-Supervised Pretraining for Event Cameras
Simone Klenk
David Bonello
Lukas Koestler
Nikita Araslanov
Daniel Cremers
29
23
0
20 Dec 2022
Scalable Diffusion Models with Transformers
William S. Peebles
Saining Xie
GNN
40
2,024
0
19 Dec 2022
FlexiViT: One Model for All Patch Sizes
Lucas Beyer
Pavel Izmailov
Alexander Kolesnikov
Mathilde Caron
Simon Kornblith
Xiaohua Zhai
Matthias Minderer
Michael Tschannen
Ibrahim M. Alabdulmohsin
Filip Pavetić
VLM
45
90
0
15 Dec 2022
Reproducible scaling laws for contrastive language-image learning
Mehdi Cherti
Romain Beaumont
Ross Wightman
Mitchell Wortsman
Gabriel Ilharco
Cade Gordon
Christoph Schuhmann
Ludwig Schmidt
J. Jitsev
VLM
CLIP
59
743
0
14 Dec 2022
What do Vision Transformers Learn? A Visual Exploration
Amin Ghiasi
Hamid Kazemi
Eitan Borgnia
Steven Reich
Manli Shu
Micah Goldblum
A. Wilson
Tom Goldstein
ViT
34
60
0
13 Dec 2022
OAMixer: Object-aware Mixing Layer for Vision Transformers
H. Kang
Sangwoo Mo
Jinwoo Shin
VLM
39
4
0
13 Dec 2022
Position: Considerations for Differentially Private Learning with Large-Scale Public Pretraining
Florian Tramèr
Gautam Kamath
Nicholas Carlini
SILM
49
67
0
13 Dec 2022
CLIP Itself is a Strong Fine-tuner: Achieving 85.7% and 88.0% Top-1 Accuracy with ViT-B and ViT-L on ImageNet
Xiaoyi Dong
Jianmin Bao
Ting Zhang
Dongdong Chen
Shuyang Gu
Weiming Zhang
Lu Yuan
Dong Chen
Fang Wen
Nenghai Yu
CLIP
22
35
0
12 Dec 2022
REVEAL: Retrieval-Augmented Visual-Language Pre-Training with Multi-Source Multimodal Knowledge Memory
Ziniu Hu
Ahmet Iscen
Chen Sun
Zirui Wang
Kai-Wei Chang
Yizhou Sun
Cordelia Schmid
David A. Ross
Alireza Fathi
RALM
VLM
40
89
0
10 Dec 2022
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints
Aran Komatsuzaki
J. Puigcerver
James Lee-Thorp
Carlos Riquelme Ruiz
Basil Mustafa
Joshua Ainslie
Yi Tay
Mostafa Dehghani
N. Houlsby
MoMe
MoE
19
109
0
09 Dec 2022
Learning Video Representations from Large Language Models
Yue Zhao
Ishan Misra
Philipp Krahenbuhl
Rohit Girdhar
VLM
AI4TS
28
167
0
08 Dec 2022
Deep Incubation: Training Large Models by Divide-and-Conquering
Zanlin Ni
Yulin Wang
Jiangwei Yu
Haojun Jiang
Yu Cao
Gao Huang
VLM
18
11
0
08 Dec 2022
Pivotal Role of Language Modeling in Recommender Systems: Enriching Task-specific and Task-agnostic Representation Learning
Kyuyong Shin
Hanock Kwak
Wonjae Kim
Jisu Jeong
Seungjae Jung
KyungHyun Kim
Jung-Woo Ha
Sang-Woo Lee
27
4
0
07 Dec 2022
Rethinking Video ViTs: Sparse Video Tubes for Joint Image and Video Learning
A. Piergiovanni
Weicheng Kuo
A. Angelova
ViT
36
54
0
06 Dec 2022
Visual Query Tuning: Towards Effective Usage of Intermediate Representations for Parameter and Memory Efficient Transfer Learning
Cheng-Hao Tu
Zheda Mai
Wei-Lun Chao
32
44
0
06 Dec 2022
Location-Aware Self-Supervised Transformers for Semantic Segmentation
Mathilde Caron
N. Houlsby
Cordelia Schmid
ViT
24
12
0
05 Dec 2022
ResFormer: Scaling ViTs with Multi-Resolution Training
Rui Tian
Zuxuan Wu
Qiuju Dai
Hang-Rui Hu
Yu Qiao
Yu-Gang Jiang
ViT
24
33
0
01 Dec 2022
DSNet: a simple yet efficient network with dual-stream attention for lesion segmentation
Yunxiao Liu
11
0
0
30 Nov 2022
Minimal Width for Universal Property of Deep RNN
Changhoon Song
Geonho Hwang
Jun ho Lee
Myung-joo Kang
25
9
0
25 Nov 2022
Differentially Private Image Classification from Features
Harsh Mehta
Walid Krichene
Abhradeep Thakurta
Alexey Kurakin
Ashok Cutkosky
52
7
0
24 Nov 2022
Multi-Environment Pretraining Enables Transfer to Action Limited Datasets
David Venuto
Sherry Yang
Pieter Abbeel
Doina Precup
Igor Mordatch
Ofir Nachum
OffRL
25
5
0
23 Nov 2022
Powderworld: A Platform for Understanding Generalization via Rich Task Distributions
Kevin Frans
Phillip Isola
OffRL
47
9
0
23 Nov 2022
Conv2Former: A Simple Transformer-Style ConvNet for Visual Recognition
Qibin Hou
Cheng Lu
Ming-Ming Cheng
Jiashi Feng
ViT
34
129
0
22 Nov 2022
Uni-Perceiver v2: A Generalist Model for Large-Scale Vision and Vision-Language Tasks
Hao Li
Jinguo Zhu
Xiaohu Jiang
Xizhou Zhu
Hongsheng Li
...
Xiaohua Wang
Yu Qiao
Xiaogang Wang
Wenhai Wang
Jifeng Dai
MLLM
26
55
0
17 Nov 2022
Towards All-in-one Pre-training via Maximizing Multi-modal Mutual Information
Weijie Su
Xizhou Zhu
Chenxin Tao
Lewei Lu
Bin Li
Gao Huang
Yu Qiao
Xiaogang Wang
Jie Zhou
Jifeng Dai
42
41
0
17 Nov 2022
EfficientTrain: Exploring Generalized Curriculum Learning for Training Visual Backbones
Yulin Wang
Yang Yue
Rui Lu
Tian-De Liu
Zhaobai Zhong
S. Song
Gao Huang
37
28
0
17 Nov 2022
GLAMI-1M: A Multilingual Image-Text Fashion Dataset
Vaclav Kosar
A. Hoskovec
Milan Šulc
Radek Bartyzal
VLM
32
3
0
17 Nov 2022
Prompt Tuning for Parameter-efficient Medical Image Segmentation
Marc Fischer
Alexander Bartler
Bin Yang
SSeg
24
18
0
16 Nov 2022
Contextual Transformer for Offline Meta Reinforcement Learning
Runji Lin
Ye Li
Xidong Feng
Zhaowei Zhang
Xian Hong Wu Fung
Haifeng Zhang
Jun Wang
Yali Du
Yaodong Yang
OffRL
26
6
0
15 Nov 2022
EVA: Exploring the Limits of Masked Visual Representation Learning at Scale
Yuxin Fang
Wen Wang
Binhui Xie
Quan-Sen Sun
Ledell Yu Wu
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
CLIP
87
679
0
14 Nov 2022
Language models are good pathologists: using attention-based sequence reduction and text-pretrained transformers for efficient WSI classification
Juan Pisula
Katarzyna Bozek
VLM
MedIm
36
3
0
14 Nov 2022
InternImage: Exploring Large-Scale Vision Foundation Models with Deformable Convolutions
Wenhai Wang
Jifeng Dai
Zhe Chen
Zhenhang Huang
Zhiqi Li
...
Tong Lu
Lewei Lu
Hongsheng Li
Xiaogang Wang
Yu Qiao
VLM
38
660
0
10 Nov 2022
Harmonizing the object recognition strategies of deep neural networks with humans
Thomas Fel
Ivan Felipe
Drew Linsley
Thomas Serre
36
71
0
08 Nov 2022
Group DETR v2: Strong Object Detector with Encoder-Decoder Pretraining
Qiang Chen
Jian Wang
Chuchu Han
Shangang Zhang
Zexian Li
...
Haocheng Feng
Kun Yao
Junyu Han
Errui Ding
Jingdong Wang
ViT
VLM
42
45
0
07 Nov 2022
eDiff-I: Text-to-Image Diffusion Models with an Ensemble of Expert Denoisers
Yogesh Balaji
Seungjun Nah
Xun Huang
Arash Vahdat
Jiaming Song
...
Timo Aila
S. Laine
Bryan Catanzaro
Tero Karras
Xuan Li
VLM
MoE
64
804
0
02 Nov 2022
Exploring Effects of Computational Parameter Changes to Image Recognition Systems
Nikolaos Louloudakis
Perry Gibson
José Cano
A. Rajan
19
6
0
01 Nov 2022
Broken Neural Scaling Laws
Ethan Caballero
Kshitij Gupta
Irina Rish
David M. Krueger
30
74
0
26 Oct 2022