2106.04560
Scaling Vision Transformers
8 June 2021
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
ViT
Papers citing "Scaling Vision Transformers"
50 / 751 papers shown
Tell, don't show: Declarative facts influence how LLMs generalize
Alexander Meinke
Owain Evans
21
7
0
12 Dec 2023
CLIP in Medical Imaging: A Comprehensive Survey
Zihao Zhao
Yuxiao Liu
Han Wu
Yonghao Li
Sheng Wang
L. Teng
Disheng Liu
Zhiming Cui
Qian Wang
Dinggang Shen
CLIP
MedIm
LM&MA
VLM
28
2
0
12 Dec 2023
Photorealistic Video Generation with Diffusion Models
Agrim Gupta
Lijun Yu
Kihyuk Sohn
Xiuye Gu
Meera Hahn
Fei-Fei Li
Irfan Essa
Lu Jiang
José Lezama
VGen
57
174
0
11 Dec 2023
Bad Students Make Great Teachers: Active Learning Accelerates Large-Scale Visual Understanding
Talfan Evans
Shreya Pathak
Hamza Merzic
Jonathan Schwarz
Ryutaro Tanno
Olivier J. Hénaff
18
16
0
08 Dec 2023
Scaling Laws of Synthetic Images for Model Training ... for Now
Lijie Fan
Kaifeng Chen
Dilip Krishnan
Dina Katabi
Phillip Isola
Yonglong Tian
CLIP
VLM
41
61
0
07 Dec 2023
GenTron: Diffusion Transformers for Image and Video Generation
Shoufa Chen
Mengmeng Xu
Jiawei Ren
Yuren Cong
Sen He
Yanping Xie
Animesh Sinha
Ping Luo
Tao Xiang
Juan-Manuel Perez-Rua
VGen
36
38
0
07 Dec 2023
Transformer-Powered Surrogates Close the ICF Simulation-Experiment Gap with Extremely Limited Data
M. Olson
Shusen Liu
Jayaraman J. Thiagarajan
B. Kustowski
Weng-Keen Wong
Rushil Anirudh
AI4CE
30
1
0
06 Dec 2023
MoSA: Mixture of Sparse Adapters for Visual Efficient Tuning
Qizhe Zhang
Bocheng Zou
Ruichuan An
Jiaming Liu
Shanghang Zhang
MoE
27
2
0
05 Dec 2023
SEVA: Leveraging sketches to evaluate alignment between human and machine visual abstraction
Kushin Mukherjee
Holly Huey
Xuanchen Lu
Yael Vinker
Rio Aguina-Kang
Ariel Shamir
Judith E. Fan
27
11
0
05 Dec 2023
Rejuvenating image-GPT as Strong Visual Representation Learners
Sucheng Ren
Zeyu Wang
Hongru Zhu
Junfei Xiao
Alan L. Yuille
Cihang Xie
VLM
57
7
0
04 Dec 2023
GIVT: Generative Infinite-Vocabulary Transformers
Michael Tschannen
Cian Eastwood
Fabian Mentzer
31
34
0
04 Dec 2023
Bootstrapping SparseFormers from Vision Foundation Models
Ziteng Gao
Zhan Tong
K. Lin
Joya Chen
Mike Zheng Shou
38
0
0
04 Dec 2023
InstructTA: Instruction-Tuned Targeted Attack for Large Vision-Language Models
Xunguang Wang
Zhenlan Ji
Pingchuan Ma
Zongjie Li
Shuai Wang
MLLM
43
11
0
04 Dec 2023
Improve Supervised Representation Learning with Masked Image Modeling
Kaifeng Chen
Daniel M. Salz
Huiwen Chang
Kihyuk Sohn
Dilip Krishnan
Mojtaba Seyedhosseini
SSL
ViT
39
3
0
01 Dec 2023
Brainformer: Mimic Human Visual Brain Functions to Machine Vision Models via fMRI
Xuan-Bac Nguyen
Xin Li
Pawan Sinha
Samee U. Khan
Khoa Luu
ViT
MedIm
29
0
0
30 Nov 2023
Leveraging VLM-Based Pipelines to Annotate 3D Objects
Rishabh Kabra
Loic Matthey
Alexander Lerchner
Niloy J. Mitra
24
6
0
29 Nov 2023
Side4Video: Spatial-Temporal Side Network for Memory-Efficient Image-to-Video Transfer Learning
Huanjin Yao
Wenhao Wu
Zhiheng Li
VLM
95
9
0
27 Nov 2023
Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding
Hoang-Quan Nguyen
Thanh-Dat Truong
Xuan-Bac Nguyen
Ashley Dowling
Xin Li
Khoa Luu
VLM
24
19
0
26 Nov 2023
Zero redundancy distributed learning with differential privacy
Zhiqi Bu
Justin Chiu
Ruixuan Liu
Sheng Zha
George Karypis
45
8
0
20 Nov 2023
Security Fence Inspection at Airports Using Object Detection
Nils Friederich
Andreas Specker
Jürgen Beyerer
17
2
0
18 Nov 2023
Challenges in data-based geospatial modeling for environmental research and practice
Diana Koldasbayeva
P. Tregubova
M. Gasanov
Alexey Zaytsev
Anna Petrovskaia
E. Burnaev
AI4CE
32
1
0
18 Nov 2023
Deep Tensor Network
Yifan Zhang
29
0
0
18 Nov 2023
Florence-2: Advancing a Unified Representation for a Variety of Vision Tasks
Bin Xiao
Haiping Wu
Weijian Xu
Xiyang Dai
Houdong Hu
Yumao Lu
Michael Zeng
Ce Liu
Lu Yuan
VLM
45
143
0
10 Nov 2023
PolyMaX: General Dense Prediction with Mask Transformer
Xuan S. Yang
Liangzhe Yuan
Kimberly Wilber
Astuti Sharma
Xiuye Gu
...
Stephanie Debats
Huisheng Wang
Hartwig Adam
Mikhail Sirotenko
Liang-Chieh Chen
28
14
0
09 Nov 2023
Window Attention is Bugged: How not to Interpolate Position Embeddings
Daniel Bolya
Chaitanya K. Ryali
Judy Hoffman
Christoph Feichtenhofer
43
10
0
09 Nov 2023
Univariate Radial Basis Function Layers: Brain-inspired Deep Neural Layers for Low-Dimensional Inputs
Daniel Jost
Basavasagar Patil
Xavier Alameda-Pineda
Chris Reinke
16
0
0
07 Nov 2023
FLORA: Fine-grained Low-Rank Architecture Search for Vision Transformer
Chi-Chih Chang
Yuan-Yao Sung
Shixing Yu
N. Huang
Diana Marculescu
Kai-Chiang Wu
ViT
28
1
0
07 Nov 2023
Navigating Scaling Laws: Compute Optimality in Adaptive Model Training
Sotiris Anagnostidis
Gregor Bachmann
Imanol Schlag
Thomas Hofmann
33
2
0
06 Nov 2023
GTP-ViT: Efficient Vision Transformers via Graph-based Token Propagation
Xuwei Xu
Sen Wang
Yudong Chen
Yanping Zheng
Zhewei Wei
Jiajun Liu
ViT
27
8
0
06 Nov 2023
Deep Double Descent for Time Series Forecasting: Avoiding Undertrained Models
Valentino Assandri
Sam Heshmati
Burhaneddin Yaman
Anton Iakovlev
Ariel Emiliano Repetur
45
0
0
02 Nov 2023
AiluRus: A Scalable ViT Framework for Dense Prediction
Jin Li
Yaoming Wang
Xiaopeng Zhang
Bowen Shi
Dongsheng Jiang
Chenglin Li
Wenrui Dai
Hongkai Xiong
Qi Tian
59
5
0
02 Nov 2023
Generating QM1B with PySCF$_{\text{IPU}}$
Alexander Mathiasen
Hatem Helal
Kerstin Klaser
Paul Balanca
Josef Dean
Carlo Luschi
Dominique Beaini
Andrew Fitzgibbon
Dominic Masters
25
1
0
02 Nov 2023
Text Rendering Strategies for Pixel Language Models
Jonas F. Lotz
Elizabeth Salesky
Phillip Rust
Desmond Elliott
VLM
29
11
0
01 Nov 2023
Res-Tuning: A Flexible and Efficient Tuning Paradigm via Unbinding Tuner from Backbone
Zeyinzi Jiang
Chaojie Mao
Ziyuan Huang
Ao Ma
Yiliang Lv
Yujun Shen
Deli Zhao
Jingren Zhou
30
15
0
30 Oct 2023
Analyzing Vision Transformers for Image Classification in Class Embedding Space
Martina G. Vilas
Timothy Schaumlöffel
Gemma Roig
ViT
21
23
0
29 Oct 2023
Multi-scale Diffusion Denoised Smoothing
Jongheon Jeong
Jinwoo Shin
DiffM
26
8
0
25 Oct 2023
ConvNets Match Vision Transformers at Scale
Samuel L. Smith
Andrew Brock
Leonard Berrada
Soham De
13
23
0
25 Oct 2023
Gramian Attention Heads are Strong yet Efficient Vision Learners
Jongbin Ryu
Dongyoon Han
J. Lim
32
1
0
25 Oct 2023
Pre-Training on Large-Scale Generated Docking Conformations with HelixDock to Unlock the Potential of Protein-ligand Structure Prediction Models
Lihang Liu
Shanzhuo Zhang
Donglong He
Xianbin Ye
Jingbo Zhou
...
Fan Wang
Jingzhou He
Liang Zheng
Yonghui Li
Xiaomin Fang
AI4CE
22
8
0
21 Oct 2023
SILC: Improving Vision Language Pretraining with Self-Distillation
Muhammad Ferjad Naeem
Yongqin Xian
Xiaohua Zhai
Lukas Hoyer
Luc Van Gool
F. Tombari
VLM
26
33
0
20 Oct 2023
KAKURENBO: Adaptively Hiding Samples in Deep Neural Network Training
Truong Thao Nguyen
Balazs Gerofi
Edgar Josafat Martinez-Noriega
François Trahay
M. Wahib
32
1
0
16 Oct 2023
Farzi Data: Autoregressive Data Distillation
Noveen Sachdeva
Zexue He
Wang-Cheng Kang
Jianmo Ni
D. Cheng
Julian McAuley
DD
21
3
0
15 Oct 2023
An Unbiased Look at Datasets for Visuo-Motor Pre-Training
Sudeep Dasari
Mohan Kumar Srirama
Unnat Jain
Abhinav Gupta
SSL
34
34
0
13 Oct 2023
PaLI-3 Vision Language Models: Smaller, Faster, Stronger
Xi Chen
Xiao Wang
Lucas Beyer
Alexander Kolesnikov
Jialin Wu
...
Keran Rong
Tianli Yu
Daniel Keysers
Xiaohua Zhai
Radu Soricut
MLLM
VLM
38
94
0
13 Oct 2023
Efficient Adaptation of Large Vision Transformer via Adapter Re-Composing
Wei Dong
Dawei Yan
Zhijun Lin
Peng Wang
19
21
0
10 Oct 2023
Learning Interactive Real-World Simulators
Mengjiao Yang
Yilun Du
Kamyar Ghasemipour
Jonathan Tompson
Leslie Kaelbling
Dale Schuurmans
Pieter Abbeel
LM&Ro
PINN
30
180
0
09 Oct 2023
Transformer Fusion with Optimal Transport
Moritz Imfeld
Jacopo Graldi
Marco Giordano
Thomas Hofmann
Sotiris Anagnostidis
Sidak Pal Singh
ViT
MoMe
29
16
0
09 Oct 2023
No Token Left Behind: Efficient Vision Transformer via Dynamic Token Idling
Xuwei Xu
Changlin Li
Yudong Chen
Xiaojun Chang
Jiajun Liu
Sen Wang
ViT
21
5
0
09 Oct 2023
Plug n' Play: Channel Shuffle Module for Enhancing Tiny Vision Transformers
Xuwei Xu
Sen Wang
Yudong Chen
Jiajun Liu
ViT
21
1
0
09 Oct 2023
Module-wise Adaptive Distillation for Multimodality Foundation Models
Chen Liang
Jiahui Yu
Ming-Hsuan Yang
Matthew A. Brown
Huayu Chen
Tuo Zhao
Boqing Gong
Tianyi Zhou
11
10
0
06 Oct 2023