2209.15159
Cited By
MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features
30 September 2022
S. Wadekar
Abhishek Chaurasia
ViT
Papers citing
"MobileViTv3: Mobile-Friendly Vision Transformer with Simple and Effective Fusion of Local, Global and Input Features"
32 / 32 papers shown
Title
RecConv: Efficient Recursive Convolutions for Multi-Frequency Representations
Mingshu Zhao
Yi Luo
Yong Ouyang
38
0
0
27 Dec 2024
CompactFlowNet: Efficient Real-time Optical Flow Estimation on Mobile Devices
Andrei Znobishchev
Valerii Filev
Oleg Kudashev
Nikita Orlov
Humphrey Shi
77
0
0
17 Dec 2024
LUIEO: A Lightweight Model for Integrating Underwater Image Enhancement and Object Detection
Bin Li
Li Li
Zhenwei Zhang
Yuping Duan
80
0
0
01 Dec 2024
Improving Accuracy and Generalization for Efficient Visual Tracking
Ram J. Zaveri
Shivang Patel
Yu Gu
Gianfranco Doretto
VLM
86
0
0
28 Nov 2024
Lightweight Gaze Estimation Model Via Fusion Global Information
Zhang Cheng
Yanxia Wang
98
0
0
27 Nov 2024
Scaling Spike-driven Transformer with Efficient Spike Firing Approximation Training
Man Yao
Xuerui Qiu
Tianxiang Hu
J. Hu
Yuhong Chou
Keyu Tian
Jianxing Liao
Luziwei Leng
Bo Xu
Guoqi Li
76
7
0
25 Nov 2024
MobileMamba: Lightweight Multi-Receptive Visual Mamba Network
Haoyang He
Jun Zhang
Yuxuan Cai
Hongxu Chen
Xiaobin Hu
Zhenye Gan
Yishuo Wang
Chengjie Wang
Yunsheng Wu
Lei Xie
Mamba
88
3
0
24 Nov 2024
COSNet: A Novel Semantic Segmentation Network using Enhanced Boundaries in Cluttered Scenes
Muhammad Ali
Mamoona Javaid
Mubashir Noman
M. Fiaz
Salman Khan
36
0
0
31 Oct 2024
Improving Vision Transformers by Overlapping Heads in Multi-Head Self-Attention
Tianxiao Zhang
Bo Luo
G. Wang
ViT
21
1
0
18 Oct 2024
RepNeXt: A Fast Multi-Scale CNN using Structural Reparameterization
Mingshu Zhao
Yi Luo
Yong Ouyang
40
2
0
23 Jun 2024
LeYOLO, New Scalable and Efficient CNN Architecture for Object Detection
Lilian Hollard
Lucas Mohimont
N. Gaveau
L. Steffenel
ObjD
42
3
0
20 Jun 2024
Two Heads are Better Than One: Neural Networks Quantization with 2D Hilbert Curve-based Output Representation
Mykhail M. Uss
Ruslan Yermolenko
Olena Kolodiazhna
Oleksii Shashko
Ivan Safonov
Volodymyr Savin
Yoonjae Yeo
Seowon Ji
Jaeyun Jeong
MQ
27
0
0
22 May 2024
MambaOut: Do We Really Need Mamba for Vision?
Weihao Yu
Xinchao Wang
Mamba
50
48
0
13 May 2024
EffLoc: Lightweight Vision Transformer for Efficient 6-DOF Camera Relocalization
Zhendong Xiao
Changhao Chen
Shan Yang
Wu Wei
35
1
0
21 Feb 2024
A Lightweight Feature Fusion Architecture For Resource-Constrained Crowd Counting
Yashwardhan Chaudhuri
Ankit Kumar
Orchid Chetia Phukan
Arun Balaji Buduru
32
1
0
11 Jan 2024
MCAD: Multi-teacher Cross-modal Alignment Distillation for efficient image-text retrieval
Youbo Lei
Feifei He
Chen Chen
Yingbin Mo
Sijia Li
Defeng Xie
H. Lu
VLM
59
0
0
30 Oct 2023
RingMo-lite: A Remote Sensing Multi-task Lightweight Network with CNN-Transformer Hybrid Framework
Yuelei Wang
Ting Zhang
Liangjin Zhao
Lin Hu
Zhechao Wang
...
Kaiqiang Chen
Xuan Zeng
Zhirui Wang
Hongqi Wang
Xian Sun
24
4
0
16 Sep 2023
RepViT: Revisiting Mobile CNN From ViT Perspective
Ao Wang
Hui Chen
Zijia Lin
Hengjun Pu
Guiguang Ding
34
178
0
18 Jul 2023
Light-Weight Vision Transformer with Parallel Local and Global Self-Attention
Nikolas Ebert
Laurenz Reichardt
D. Stricker
Oliver Wasenmüller
ViT
16
2
0
18 Jul 2023
FR-Net: A Light-weight FFT Residual Net For Gaze Estimation
Tao Xu
Borimandafu Wu
Ruilong Fan
Yun Zhou
Di Huang
35
2
0
04 May 2023
EVA-02: A Visual Representation for Neon Genesis
Yuxin Fang
Quan-Sen Sun
Xinggang Wang
Tiejun Huang
Xinlong Wang
Yue Cao
VLM
ViT
CLIP
40
259
0
20 Mar 2023
ElasticViT: Conflict-aware Supernet Training for Deploying Fast Vision Transformer on Diverse Mobile Devices
Chen Tang
Li Zhang
Huiqiang Jiang
Jiahang Xu
Ting Cao
Quanlu Zhang
Yuqing Yang
Zhi Wang
Mao Yang
28
11
0
17 Mar 2023
Run, Don't Walk: Chasing Higher FLOPS for Faster Neural Networks
Jierun Chen
Shiu-hong Kao
Hao He
Weipeng Zhuo
Song Wen
Chul-Ho Lee
Shueng-Han Gary Chan
OOD
35
782
0
07 Mar 2023
TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models
Sucheng Ren
Fangyun Wei
Zheng-Wei Zhang
Han Hu
40
34
0
03 Jan 2023
Rethinking Mobile Block for Efficient Attention-based Models
Jiangning Zhang
Xiangtai Li
Jian Li
Liang Liu
Zhucun Xue
Boshen Zhang
Zhe Jiang
Tianxin Huang
Yabiao Wang
Chengjie Wang
MQ
44
90
0
03 Jan 2023
MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer
Sachin Mehta
Mohammad Rastegari
ViT
218
1,213
0
05 Oct 2021
Mobile-Former: Bridging MobileNet and Transformer
Yinpeng Chen
Xiyang Dai
Dongdong Chen
Mengchen Liu
Xiaoyi Dong
Lu Yuan
Zicheng Liu
ViT
183
476
0
12 Aug 2021
CMT: Convolutional Neural Networks Meet Vision Transformers
Jianyuan Guo
Kai Han
Han Wu
Yehui Tang
Chunjing Xu
Yunhe Wang
Chang Xu
ViT
351
633
0
13 Jul 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
313
3,625
0
24 Feb 2021
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
L. van der Maaten
Kilian Q. Weinberger
PINN
3DV
315
36,381
0
25 Aug 2016
Semantic Understanding of Scenes through the ADE20K Dataset
Bolei Zhou
Hang Zhao
Xavier Puig
Tete Xiao
Sanja Fidler
Adela Barriuso
Antonio Torralba
SSeg
253
1,829
0
18 Aug 2016
ImageNet Large Scale Visual Recognition Challenge
Olga Russakovsky
Jia Deng
Hao Su
J. Krause
S. Satheesh
...
A. Karpathy
A. Khosla
Michael S. Bernstein
Alexander C. Berg
Li Fei-Fei
VLM
ObjD
296
39,217
0
01 Sep 2014