Scaling Vision Transformers

8 June 2021
Xiaohua Zhai
Alexander Kolesnikov
N. Houlsby
Lucas Beyer
    ViT

Papers citing "Scaling Vision Transformers"

50 / 751 papers shown
Transferring Knowledge from Large Foundation Models to Small Downstream Models
Shikai Qiu
Boran Han
Danielle C. Maddix
Shuai Zhang
Yuyang Wang
Andrew Gordon Wilson
38
1
0
11 Jun 2024
MoE Jetpack: From Dense Checkpoints to Adaptive Mixture of Experts for Vision Tasks
Xingkui Zhu
Yiran Guan
Dingkang Liang
Yuchao Chen
Yuliang Liu
Xiang Bai
MoE
40
5
0
07 Jun 2024
Beyond Performance Plateaus: A Comprehensive Study on Scalability in Speech Enhancement
Wangyou Zhang
Kohei Saijo
Jee-weon Jung
Chenda Li
Shinji Watanabe
Yanmin Qian
32
4
0
06 Jun 2024
Convolutional Neural Networks and Vision Transformers for Fashion MNIST Classification: A Literature Review
Sonia Bouzidi
Ghazala Hcini
Imen Jdey
Fadoua Drira
29
4
0
05 Jun 2024
Does your data spark joy? Performance gains from domain upsampling at the end of training
Cody Blakeney
Mansheej Paul
Brett W. Larsen
Sean Owen
Jonathan Frankle
29
19
0
05 Jun 2024
LW-DETR: A Transformer Replacement to YOLO for Real-Time Detection
Qiang Chen
Xiangbo Su
Xinyu Zhang
Jian Wang
Jiahui Chen
...
Shan Zhang
Kun Yao
Errui Ding
Gang Zhang
Jingdong Wang
ViT
55
13
0
05 Jun 2024
Tiny models from tiny data: Textual and null-text inversion for few-shot distillation
Erik Landolsi
Fredrik Kahl
DiffM
58
1
0
05 Jun 2024
Cross-Domain Graph Data Scaling: A Showcase with Diffusion Models
Wenzhuo Tang
Haitao Mao
Danial Dervovic
Ivan Brugere
Saumitra Mishra
Yuying Xie
Jiliang Tang
50
3
0
04 Jun 2024
SAVA: Scalable Learning-Agnostic Data Valuation
Samuel Kessler
Tam Le
Vu Nguyen
TDI
61
0
0
03 Jun 2024
Scaling White-Box Transformers for Vision
Jinrui Yang
Xianhang Li
Druv Pai
Yuyin Zhou
Yi Ma
Yaodong Yu
Cihang Xie
ViT
44
9
0
30 May 2024
Federated and Transfer Learning for Cancer Detection Based on Image Analysis
Amine Bechar
Y. Elmir
Yassine Himeur
Rafik Medjoudj
Abbes Amira
MedIm
41
4
0
30 May 2024
Learning Robust Correlation with Foundation Model for Weakly-Supervised Few-Shot Segmentation
Xinyang Huang
Chuanglu Zhu
Kebin Liu
Ruiying Ren
Shengjie Liu
43
2
0
30 May 2024
MLAE: Masked LoRA Experts for Parameter-Efficient Fine-Tuning
Junjie Wang
Guangjing Yang
Wentao Chen
Huahui Yi
Xiaohu Wu
Qicheng Lao
MoE
ALM
44
0
0
29 May 2024
MSPE: Multi-Scale Patch Embedding Prompts Vision Transformers to Any Resolution
Wenzhuo Liu
Fei Zhu
Shijie Ma
Cheng-Lin Liu
30
4
0
28 May 2024
Transformers Can Do Arithmetic with the Right Embeddings
Sean McLeish
Arpit Bansal
Alex Stein
Neel Jain
John Kirchenbauer
...
B. Kailkhura
A. Bhatele
Jonas Geiping
Avi Schwarzschild
Tom Goldstein
53
28
0
27 May 2024
Phase Transitions in the Output Distribution of Large Language Models
Julian Arnold
Flemming Holtorf
Frank Schäfer
Niels Lörch
41
1
0
27 May 2024
Disentangling and Integrating Relational and Sensory Information in Transformer Architectures
Awni Altabaa
John Lafferty
37
3
0
26 May 2024
Amortized Active Causal Induction with Deep Reinforcement Learning
Yashas Annadani
P. Tigas
Stefan Bauer
Adam Foster
42
0
0
26 May 2024
Scaling Law for Time Series Forecasting
Jingzhe Shi
Qinwei Ma
Huan Ma
Lei Li
AI4TS
31
8
0
24 May 2024
Sparse-Tuning: Adapting Vision Transformers with Efficient Fine-tuning and Inference
Ting Liu
Xuyang Liu
Liangtao Shi
Zunnan Xu
Siteng Huang
Yi Xin
Quanjun Yin
43
5
0
23 May 2024
A Survey on Vision-Language-Action Models for Embodied AI
Yueen Ma
Zixing Song
Yuzheng Zhuang
Jianye Hao
Irwin King
LM&Ro
82
42
0
23 May 2024
No Filter: Cultural and Socioeconomic Diversity in Contrastive Vision-Language Models
Angeline Pouget
Lucas Beyer
Emanuele Bugliarello
Xiao Wang
Andreas Steiner
Xiaohua Zhai
Ibrahim M. Alabdulmohsin
VLM
33
7
0
22 May 2024
OpenCarbonEval: A Unified Carbon Emission Estimation Framework in Large-Scale AI Models
Zhaojian Yu
Yinghao Wu
Zhuotao Deng
Yansong Tang
Xiao-Ping Zhang
49
2
0
21 May 2024
Octo: An Open-Source Generalist Robot Policy
Octo Model Team
Dibya Ghosh
Homer Walke
Karl Pertsch
Kevin Black
...
Quan Vuong
Ted Xiao
Dorsa Sadigh
Chelsea Finn
Sergey Levine
66
356
0
20 May 2024
CLIP with Quality Captions: A Strong Pretraining for Vision Tasks
Pavan Kumar Anasosalu Vasu
Hadi Pouransari
Fartash Faghri
Oncel Tuzel
VLM
CLIP
35
6
0
14 May 2024
EfficientTrain++: Generalized Curriculum Learning for Efficient Visual Backbone Training
Yulin Wang
Yang Yue
Rui Lu
Yizeng Han
Shiji Song
Gao Huang
VLM
64
12
0
14 May 2024
DEPTH: Discourse Education through Pre-Training Hierarchically
Zachary Bamberger
Ofek Glick
Chaim Baskin
Yonatan Belinkov
67
0
0
13 May 2024
Memory-Space Visual Prompting for Efficient Vision-Language Fine-Tuning
Shibo Jie
Yehui Tang
Ning Ding
Zhi-Hong Deng
Kai Han
Yunhe Wang
VLM
33
6
0
09 May 2024
ChuXin: 1.6B Technical Report
Xiaomin Zhuang
Yufan Jiang
Qiaozhi He
Zhihua Wu
ALM
43
0
0
08 May 2024
THRONE: An Object-based Hallucination Benchmark for the Free-form Generations of Large Vision-Language Models
Prannay Kaul
Zhizhong Li
Hao Yang
Yonatan Dukler
Ashwin Swaminathan
C. Taylor
Stefano Soatto
HILM
60
16
0
08 May 2024
HYPE: Hyperbolic Entailment Filtering for Underspecified Images and Texts
Wonjae Kim
Sanghyuk Chun
Taekyung Kim
Dongyoon Han
Sangdoo Yun
44
7
0
26 Apr 2024
Decentralized Personalized Federated Learning based on a Conditional Sparse-to-Sparser Scheme
Qianyu Long
Qiyuan Wang
Christos Anagnostopoulos
Daning Bi
FedML
28
0
0
24 Apr 2024
CatLIP: CLIP-level Visual Recognition Accuracy with 2.7x Faster Pre-training on Web-scale Image-Text Data
Sachin Mehta
Maxwell Horton
Fartash Faghri
Mohammad Hossein Sekhavat
Mahyar Najibi
Mehrdad Farajtabar
Oncel Tuzel
Mohammad Rastegari
VLM
CLIP
44
6
0
24 Apr 2024
Pretraining Billion-scale Geospatial Foundational Models on Frontier
A. Tsaris
P. Dias
Abhishek Potnis
Junqi Yin
Feiyi Wang
D. Lunga
AI4CE
38
4
0
17 Apr 2024
Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology
Oren Z. Kraus
Kian Kenyon-Dean
Saber Saberian
Maryam Fallah
Peter McLean
...
Chi Vicky Cheng
Kristen Morse
Maureen Makes
Ben Mabey
Berton A. Earnshaw
37
26
0
16 Apr 2024
BRAVE: Broadening the visual encoding of vision-language models
Oğuzhan Fatih Kar
A. Tonioni
Petra Poklukar
Achin Kulshrestha
Amir Zamir
Federico Tombari
MLLM
VLM
50
25
0
10 Apr 2024
Scaling Laws for Galaxy Images
Mike Walmsley
Micah Bowles
Anna M. M. Scaife
Jason Shingirai Makechemu
Alexander J. Gordon
...
Chris J. Lintott
K. Mantha
Devina Mohan
David O’Ryan
Inigo V. Slijepcevic
23
4
0
03 Apr 2024
ViTamin: Designing Scalable Vision Models in the Vision-Language Era
Jieneng Chen
Qihang Yu
Xiaohui Shen
Alan L. Yuille
Liang-Chieh Chen
3DV
VLM
36
24
0
02 Apr 2024
Samba: Semantic Segmentation of Remotely Sensed Images with State Space Model
Qinfeng Zhu
Yuanzhi Cai
Yuan-Sheng Fang
Yihan Yang
Cheng Chen
Lei Fan
Anh Nguyen
Mamba
43
55
0
02 Apr 2024
On Train-Test Class Overlap and Detection for Image Retrieval
Chull Hwan Song
Jooyoung Yoon
Taebaek Hwang
Shunghyun Choi
Yeong Hyeon Gu
Yannis Avrithis
37
2
0
01 Apr 2024
NeRF-MAE: Masked AutoEncoders for Self-Supervised 3D Representation Learning for Neural Radiance Fields
Muhammad Zubair Irshad
Sergey Zakharov
Vitor Campagnolo Guizilini
Adrien Gaidon
Z. Kira
Rares Ambrus
ViT
45
12
0
01 Apr 2024
LoSA: Long-Short-range Adapter for Scaling End-to-End Temporal Action Localization
Akshita Gupta
Gaurav Mittal
Ahmed Magooda
Ye Yu
Graham W. Taylor
Mei Chen
51
2
0
01 Apr 2024
MagicLens: Self-Supervised Image Retrieval with Open-Ended Instructions
Kai Zhang
Yi Luan
Hexiang Hu
Kenton Lee
Siyuan Qiao
Wenhu Chen
Yu-Chuan Su
Ming-Wei Chang
VLM
LRM
39
34
0
28 Mar 2024
LocCa: Visual Pretraining with Location-aware Captioners
Bo Wan
Michael Tschannen
Yongqin Xian
Filip Pavetić
Ibrahim M. Alabdulmohsin
Xiao Wang
André Susano Pinto
Andreas Steiner
Lucas Beyer
Xiaohua Zhai
VLM
51
6
0
28 Mar 2024
PlainMamba: Improving Non-Hierarchical Mamba in Visual Recognition
Chenhongyi Yang
Zehui Chen
Miguel Espinosa
Linus Ericsson
Zhenyu Wang
Jiaming Liu
Elliot J. Crowley
Mamba
39
88
0
26 Mar 2024
TRAM: Global Trajectory and Motion of 3D Humans from in-the-wild Videos
Yufu Wang
ZiYun Wang
Lingjie Liu
Kostas Daniilidis
48
25
0
26 Mar 2024
Benchmarks and Challenges in Pose Estimation for Egocentric Hand Interactions with Objects
Zicong Fan
Takehiko Ohkawa
Linlin Yang
Nie Lin
Zhishan Zhou
...
Kun He
Yoichi Sato
Otmar Hilliges
Hyung Jin Chang
Angela Yao
49
14
0
25 Mar 2024
CLIP-VQDiffusion : Langauge Free Training of Text To Image generation using CLIP and vector quantized diffusion model
S. Han
Joohee Kim
DiffM
CLIP
34
1
0
22 Mar 2024
Metric3Dv2: A Versatile Monocular Geometric Foundation Model for Zero-shot Metric Depth and Surface Normal Estimation
Mu Hu
Wei Yin
C. Zhang
Zhipeng Cai
Xiaoxiao Long
Kaixuan Wang
Kaixuan Wang
Gang Yu
Chunhua Shen
Shaojie Shen
3DGS
54
116
0
22 Mar 2024
On Pretraining Data Diversity for Self-Supervised Learning
Hasan Hammoud
Tuhin Das
Fabio Pizzati
Philip H. S. Torr
Adel Bibi
Guohao Li
103
2
0
20 Mar 2024