arXiv: 2212.08059
Rethinking Vision Transformers for MobileNet Size and Speed


15 December 2022
Yanyu Li
Ju Hu
Yang Wen
Georgios Evangelidis
Kamyar Salahi
Yanzhi Wang
Sergey Tulyakov
Jian Ren
    ViT

Papers citing "Rethinking Vision Transformers for MobileNet Size and Speed"

50 / 79 papers shown
ORXE: Orchestrating Experts for Dynamically Configurable Efficiency
Qingyuan Wang
Guoxin Wang
B. Cardiff
Deepu John
38
0
0
07 May 2025
SO-DETR: Leveraging Dual-Domain Features and Knowledge Distillation for Small Object Detection
Huaxiang Zhang
Hao Zhang
Aoran Mei
Zhongxue Gan
Guo-Niu Zhu
30
0
0
11 Apr 2025
LSNet: See Large, Focus Small
Ao Wang
Hui Chen
Zijia Lin
J. Han
Guiguang Ding
42
0
0
29 Mar 2025
GmNet: Revisiting Gating Mechanisms From A Frequency View
Yifan Wang
Xu Ma
Yitian Zhang
Zhongruo Wang
Sung-Cheol Kim
Vahid Mirjalili
Vidya Renganathan
Y. Fu
38
0
0
28 Mar 2025
Fraesormer: Learning Adaptive Sparse Transformer for Efficient Food Recognition
Shun Zou
Yi Zou
Mingya Zhang
Shipeng Luo
Zhihao Chen
Guangwei Gao
ViT
51
0
0
15 Mar 2025
Simpler Fast Vision Transformers with a Jumbo CLS Token
A. Fuller
Yousef Yassin
Daniel G. Kyrollos
Evan Shelhamer
James R. Green
69
0
0
24 Feb 2025
MicroViT: A Vision Transformer with Low Complexity Self Attention for Edge Device
Novendra Setyawan
Chi-Chia Sun
Mao-Hsiu Hsu
W. Kuo
Jun-Wei Hsieh
ViT
49
2
0
09 Feb 2025
iFormer: Integrating ConvNet and Transformer for Mobile Application
Chuanyang Zheng
ViT
72
0
0
26 Jan 2025
RecConv: Efficient Recursive Convolutions for Multi-Frequency Representations
Mingshu Zhao
Yi Luo
Yong Ouyang
38
0
0
27 Dec 2024
Efficient Oriented Object Detection with Enhanced Small Object Recognition in Aerial Images
Zhifei Shi
Zongyao Yin
Sheng Chang
Xiao Yi
Xianchuan Yu
87
0
0
17 Dec 2024
SegMAN: Omni-scale Context Modeling with State Space Models and Local Attention for Semantic Segmentation
Yunxiang Fu
Meng Lou
Yizhou Yu
112
1
0
16 Dec 2024
RapidNet: Multi-Level Dilated Convolution Based Mobile Backbone
Mustafa Munir
Md Mostafijur Rahman
R. Marculescu
MedIm
ViT
74
0
0
14 Dec 2024
Cascaded Multi-Scale Attention for Enhanced Multi-Scale Feature Extraction and Interaction with Low-Resolution Images
Xiangyong Lu
Masanori Suganuma
Takayuki Okatani
72
0
0
03 Dec 2024
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
Haoyang He
J. Zhang
Yuxuan Cai
Hongxu Chen
Xiaobin Hu
Zhenye Gan
Y. Wang
Chengjie Wang
Yunsheng Wu
Lei Xie
Mamba
88
3
0
24 Nov 2024
EfficientViM: Efficient Vision Mamba with Hidden State Mixer based State Space Duality
Sanghyeok Lee
Joonmyung Choi
Hyunwoo J. Kim
110
3
0
22 Nov 2024
AsCAN: Asymmetric Convolution-Attention Networks for Efficient Recognition and Generation
Anil Kag
Huseyin Coskun
Jierun Chen
Junli Cao
Willi Menapace
Aliaksandr Siarohin
Sergey Tulyakov
Jian Ren
46
3
0
07 Nov 2024
PETAH: Parameter Efficient Task Adaptation for Hybrid Transformers in a resource-limited Context
Maximilian Augustin
Syed Shakib Sarwar
Mostafa Elhoushi
Sai Qian Zhang
Yuecheng Li
B. D. Salvo
25
0
0
23 Oct 2024
EViT-Unet: U-Net Like Efficient Vision Transformer for Medical Image Segmentation on Mobile and Edge Devices
Xin Li
Wenhui Zhu
Xuanzhao Dong
Oana Dumitrascu
Yalin Wang
ViT
MedIm
28
0
0
19 Oct 2024
On Efficient Variants of Segment Anything Model: A Survey
Xiaorui Sun
J. Liu
H. Shen
Xiaofeng Zhu
Ping Hu
VLM
45
4
0
07 Oct 2024
Attention Down-Sampling Transformer, Relative Ranking and Self-Consistency for Blind Image Quality Assessment
Mohammed Alsaafin
Musab Alsheikh
Saeed Anwar
Muhammad Usman
ViT
25
1
0
11 Sep 2024
MpoxMamba: A Grouped Mamba-based Lightweight Hybrid Network for Mpox Detection
Yubiao Yue
Jun Xue
Haihuang Liang
Zhenzhang Li
Yufeng Wang
Mamba
33
0
0
06 Sep 2024
LowFormer: Hardware Efficient Design for Convolutional Transformer Backbones
Moritz Nottebaum
Matteo Dunnhofer
C. Micheloni
ViT
31
1
0
05 Sep 2024
SCAN-Edge: Finding MobileNet-speed Hybrid Networks for Diverse Edge Devices via Hardware-Aware Evolutionary Search
Hung-Yueh Chiang
Diana Marculescu
29
0
0
27 Aug 2024
CAS-ViT: Convolutional Additive Self-attention Vision Transformers for Efficient Mobile Applications
Tianfang Zhang
Lei Li
Yang Zhou
Wentao Liu
Chen Qian
Xiangyang Ji
ViT
28
11
0
07 Aug 2024
Many Perception Tasks are Highly Redundant Functions of their Input Data
Rahul Ramesh
Anthony Bisulco
Ronald W. DiTullio
Linran Wei
Vijay Balasubramanian
Kostas Daniilidis
Pratik Chaudhari
38
2
0
18 Jul 2024
AFIDAF: Alternating Fourier and Image Domain Adaptive Filters as an Efficient Alternative to Attention in ViTs
Yunling Zheng
Zeyi Xu
Fanghui Xue
Biao Yang
Jiancheng Lyu
Shuai Zhang
Y. Qi
Jack Xin
48
0
0
16 Jul 2024
Scalp Diagnostic System With Label-Free Segmentation and Training-Free Image Translation
Youngmin Kim
Saejin Kim
Hoyeon Moon
Youngjae Yu
Junhyug Noh
MedIm
32
0
0
25 Jun 2024
RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization
Mingshu Zhao
Yi Luo
Yong Ouyang
32
2
0
23 Jun 2024
Scaling Graph Convolutions for Mobile Vision
William Avery
Mustafa Munir
R. Marculescu
GNN
34
4
0
09 Jun 2024
Mamba YOLO: SSMs-Based YOLO For Object Detection
Zeyu Wang
Chen Li
Huiying Xu
Xinzhong Zhu
Mamba
47
13
0
09 Jun 2024
Navigating Efficiency in MobileViT through Gaussian Process on Global Architecture Factors
Ke Meng
Kai Chen
27
0
0
07 Jun 2024
The 3D-PC: a benchmark for visual perspective taking in humans and machines
Drew Linsley
Peisen Zhou
A. Ashok
Akash Nagaraj
Gaurav Gaonkar
Francis E Lewis
Zygmunt Pizlo
Thomas Serre
48
6
0
06 Jun 2024
Efficient Multimodal Large Language Models: A Survey
Yizhang Jin
Jian Li
Yexin Liu
Tianjun Gu
Kai Wu
...
Xin Tan
Zhenye Gan
Yabiao Wang
Chengjie Wang
Lizhuang Ma
LRM
41
45
0
17 May 2024
LeMeViT: Efficient Vision Transformer with Learnable Meta Tokens for Remote Sensing Image Interpretation
Wentao Jiang
Jing Zhang
Di Wang
Qiming Zhang
Zengmao Wang
Bo Du
34
5
0
16 May 2024
GreedyViG: Dynamic Axial Graph Construction for Efficient Vision GNNs
Mustafa Munir
William Avery
Md Mostafijur Rahman
R. Marculescu
GNN
53
12
0
10 May 2024
Context-Guided Spatial Feature Reconstruction for Efficient Semantic Segmentation
Zhenliang Ni
Xinghao Chen
Yingjie Zhai
Yehui Tang
Yunhe Wang
39
15
0
10 May 2024
An Experimental Study on Exploring Strong Lightweight Vision Transformers via Masked Image Modeling Pre-Training
Jin Gao
Shubo Lin
Shaoru Wang
Yutong Kou
Zeming Li
Liang Li
Congxuan Zhang
Xiaoqin Zhang
Yizheng Wang
Weiming Hu
39
1
0
18 Apr 2024
MobileNetV4 - Universal Models for the Mobile Ecosystem
Danfeng Qin
Chas Leichner
M. Delakis
Marco Fornoni
Shixin Luo
...
Berkin Akin
Vaibhav Aggarwal
Tenghui Zhu
Daniele Moro
Andrew G. Howard
MQ
28
85
0
16 Apr 2024
Efficient Modulation for Vision Networks
Xu Ma
Xiyang Dai
Jianwei Yang
Bin Xiao
Yinpeng Chen
Yun Fu
Lu Yuan
40
17
0
29 Mar 2024
Tiny Models are the Computational Saver for Large Models
Qingyuan Wang
B. Cardiff
Antoine Frappé
Benoît Larras
Deepu John
29
2
0
26 Mar 2024
PEM: Prototype-based Efficient MaskFormer for Image Segmentation
Niccolò Cavagnero
Gabriele Rosi
Claudia Cuttano
Francesca Pistilli
Marco Ciccone
Giuseppe Averta
Fabio Cermelli
43
21
0
29 Feb 2024
A SAM-guided Two-stream Lightweight Model for Anomaly Detection
Chenghao Li
Lei Qi
Xin Geng
27
4
0
29 Feb 2024
A Comprehensive Survey of Convolutions in Deep Learning: Applications, Challenges, and Future Trends
Abolfazl Younesi
Mohsen Ansari
Mohammadamin Fazli
A. Ejlali
Muhammad Shafique
Joerg Henkel
3DV
44
44
0
23 Feb 2024
SHViT: Single-Head Vision Transformer with Memory Efficient Macro Design
Seokju Yun
Youngmin Ro
ViT
38
29
0
29 Jan 2024
UPDP: A Unified Progressive Depth Pruner for CNN and Vision Transformer
Ji Liu
Dehua Tang
Yuanxian Huang
Li Lyna Zhang
Xiaocheng Zeng
...
Jinzhang Peng
Yu-Chiang Frank Wang
Fan Jiang
Lu Tian
Ashish Sirasao
ViT
24
7
0
12 Jan 2024
Achelous++: Power-Oriented Water-Surface Panoptic Perception Framework on Edge Devices based on Vision-Radar Fusion and Pruning of Heterogeneous Modalities
Runwei Guan
Haocheng Zhao
Shanliang Yao
Ka Lok Man
Xiaohui Zhu
...
Yong Yue
Jeremy S. Smith
Eng Gee Lim
Weiping Ding
Yutao Yue
17
4
0
14 Dec 2023
Building Variable-sized Models via Learngene Pool
Boyu Shi
Shiyu Xia
Xu Yang
Haokun Chen
Zhi Kou
Xin Geng
15
1
0
10 Dec 2023
MobileCLIP: Fast Image-Text Models through Multi-Modal Reinforced Training
Pavan Kumar Anasosalu Vasu
Hadi Pouransari
Fartash Faghri
Raviteja Vemulapalli
Oncel Tuzel
CLIP
VLM
29
43
0
28 Nov 2023
TransNeXt: Robust Foveal Visual Perception for Vision Transformers
Dai Shi
ViT
13
76
0
28 Nov 2023
FMViT: A multiple-frequency mixing Vision Transformer
Wei Tan
Yifeng Geng
Xuansong Xie
ViT
19
3
0
09 Nov 2023