Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2112.11010
Cited By
MPViT: Multi-Path Vision Transformer for Dense Prediction
21 December 2021
Youngwan Lee
Jonghee Kim
Jeffrey Willette
Sung Ju Hwang
ViT
Re-assign community
ArXiv
PDF
HTML
Papers citing
"MPViT: Multi-Path Vision Transformer for Dense Prediction"
50 / 67 papers shown
Title
Breaking the Low-Rank Dilemma of Linear Attention
Qihang Fan
Huaibo Huang
Ran He
89
1
0
12 Nov 2024
Depth Estimation Based on 3D Gaussian Splatting Siamese Defocus
Jinchang Zhang
Ningning Xu
Hao Zhang
Guoyu Lu
MDE
58
0
0
18 Sep 2024
A Novel Spike Transformer Network for Depth Estimation from Event Cameras via Cross-modality Knowledge Distillation
Xin Zhang
Liangxiu Han
Tam Sobeih
Lianghao Han
Darren Dancey
160
2
0
26 Apr 2024
TransXNet: Learning Both Global and Local Dynamics with a Dual Dynamic Token Mixer for Visual Recognition
Meng Lou
Hong-Yu Zhou
Sibei Yang
Yizhou Yu
Chuan Wu
Yizhou Yu
ViT
78
40
0
30 Oct 2023
Lightweight Vision Transformer with Bidirectional Interaction
Qihang Fan
Huaibo Huang
Xiaoqiang Zhou
Ran He
ViT
103
29
0
01 Jun 2023
HRFormer: High-Resolution Transformer for Dense Prediction
Yuhui Yuan
Rao Fu
Lang Huang
Weihong Lin
Chao Zhang
Xilin Chen
Jingdong Wang
ViT
73
233
0
18 Oct 2021
Focal Self-attention for Local-Global Interactions in Vision Transformers
Jianwei Yang
Chunyuan Li
Pengchuan Zhang
Xiyang Dai
Bin Xiao
Lu Yuan
Jianfeng Gao
ViT
75
435
0
01 Jul 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
AI4TS
89
1,661
0
25 Jun 2021
XCiT: Cross-Covariance Image Transformers
Alaaeldin El-Nouby
Hugo Touvron
Mathilde Caron
Piotr Bojanowski
Matthijs Douze
...
Ivan Laptev
Natalia Neverova
Gabriel Synnaeve
Jakob Verbeek
Hervé Jégou
ViT
134
512
0
17 Jun 2021
ViTAE: Vision Transformer Advanced by Exploring Intrinsic Inductive Bias
Yufei Xu
Qiming Zhang
Jing Zhang
Dacheng Tao
ViT
138
338
0
07 Jun 2021
Glance-and-Gaze Vision Transformer
Qihang Yu
Yingda Xia
Yutong Bai
Yongyi Lu
Alan Yuille
Wei Shen
ViT
52
76
0
04 Jun 2021
SegFormer: Simple and Efficient Design for Semantic Segmentation with Transformers
Enze Xie
Wenhai Wang
Zhiding Yu
Anima Anandkumar
J. Álvarez
Ping Luo
ViT
256
5,017
0
31 May 2021
Are Convolutional Neural Networks or Transformers more like human vision?
Shikhar Tuli
Ishita Dasgupta
Erin Grant
Thomas Griffiths
ViT
FaML
54
185
0
15 May 2021
Twins: Revisiting the Design of Spatial Attention in Vision Transformers
Xiangxiang Chu
Zhi Tian
Yuqing Wang
Bo Zhang
Haibing Ren
Xiaolin K. Wei
Huaxia Xia
Chunhua Shen
ViT
79
1,017
0
28 Apr 2021
Visformer: The Vision-friendly Transformer
Zhengsu Chen
Lingxi Xie
Jianwei Niu
Xuefeng Liu
Longhui Wei
Qi Tian
ViT
161
221
0
26 Apr 2021
Multiscale Vision Transformers
Haoqi Fan
Bo Xiong
K. Mangalam
Yanghao Li
Zhicheng Yan
Jitendra Malik
Christoph Feichtenhofer
ViT
127
1,257
0
22 Apr 2021
Co-Scale Conv-Attentional Image Transformers
Weijian Xu
Yifan Xu
Tyler A. Chang
Zhuowen Tu
ViT
52
375
0
13 Apr 2021
LeViT: a Vision Transformer in ConvNet's Clothing for Faster Inference
Ben Graham
Alaaeldin El-Nouby
Hugo Touvron
Pierre Stock
Armand Joulin
Hervé Jégou
Matthijs Douze
ViT
64
788
0
02 Apr 2021
Going deeper with Image Transformers
Hugo Touvron
Matthieu Cord
Alexandre Sablayrolles
Gabriel Synnaeve
Hervé Jégou
ViT
133
1,014
0
31 Mar 2021
CvT: Introducing Convolutions to Vision Transformers
Haiping Wu
Bin Xiao
Noel Codella
Mengchen Liu
Xiyang Dai
Lu Yuan
Lei Zhang
ViT
136
1,907
0
29 Mar 2021
ViViT: A Video Vision Transformer
Anurag Arnab
Mostafa Dehghani
G. Heigold
Chen Sun
Mario Lucic
Cordelia Schmid
ViT
209
2,147
0
29 Mar 2021
Transformer Tracking
Xin Chen
Bin Yan
Jiawen Zhu
Dong Wang
Xiaoyun Yang
Huchuan Lu
ViT
59
957
0
29 Mar 2021
Multi-Scale Vision Longformer: A New Vision Transformer for High-Resolution Image Encoding
Pengchuan Zhang
Xiyang Dai
Jianwei Yang
Bin Xiao
Lu Yuan
Lei Zhang
Jianfeng Gao
ViT
76
335
0
29 Mar 2021
CrossViT: Cross-Attention Multi-Scale Vision Transformer for Image Classification
Chun-Fu Chen
Quanfu Fan
Yikang Shen
ViT
68
1,477
0
27 Mar 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
425
21,347
0
25 Mar 2021
Transformer Meets Tracker: Exploiting Temporal Context for Robust Visual Tracking
Ning Wang
Wen-gang Zhou
Jie Wang
Houqiang Li
ViT
69
532
0
22 Mar 2021
Fast and Accurate Model Scaling
Piotr Dollár
Mannat Singh
Ross B. Girshick
58
98
0
11 Mar 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
377
1,561
0
27 Feb 2021
Pyramid Vision Transformer: A Versatile Backbone for Dense Prediction without Convolutions
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
513
3,709
0
24 Feb 2021
Is Space-Time Attention All You Need for Video Understanding?
Gedas Bertasius
Heng Wang
Lorenzo Torresani
ViT
365
2,045
0
09 Feb 2021
TrackFormer: Multi-Object Tracking with Transformers
Tim Meinhardt
A. Kirillov
Laura Leal-Taixe
Christoph Feichtenhofer
VOT
263
764
0
07 Jan 2021
Training data-efficient image transformers & distillation through attention
Hugo Touvron
Matthieu Cord
Matthijs Douze
Francisco Massa
Alexandre Sablayrolles
Hervé Jégou
ViT
365
6,757
0
23 Dec 2020
MaX-DeepLab: End-to-End Panoptic Segmentation with Mask Transformers
Huiyu Wang
Yukun Zhu
Hartwig Adam
Alan Yuille
Liang-Chieh Chen
ViT
101
530
0
01 Dec 2020
Sparse R-CNN: End-to-End Object Detection with Learnable Proposals
Pei Sun
Rufeng Zhang
Yi Jiang
Tao Kong
Chenfeng Xu
...
Masayoshi Tomizuka
Lei Li
Zehuan Yuan
Changhu Wang
Ping Luo
ObjD
93
1,094
0
25 Nov 2020
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
559
40,961
0
22 Oct 2020
Deformable DETR: Deformable Transformers for End-to-End Object Detection
Xizhou Zhu
Weijie Su
Lewei Lu
Bin Li
Xiaogang Wang
Jifeng Dai
ViT
200
5,068
0
08 Oct 2020
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
716
41,894
0
28 May 2020
End-to-End Object Detection with Transformers
Nicolas Carion
Francisco Massa
Gabriel Synnaeve
Nicolas Usunier
Alexander Kirillov
Sergey Zagoruyko
ViT
3DV
PINN
365
13,025
0
26 May 2020
On Translation Invariance in CNNs: Convolutional Layers can Exploit Absolute Spatial Location
O. Kayhan
Jan van Gemert
288
236
0
16 Mar 2020
How Much Position Information Do Convolutional Neural Networks Encode?
Md. Amirul Islam
Sen Jia
Neil D. B. Bruce
SSL
244
348
0
22 Jan 2020
Deep High-Resolution Representation Learning for Visual Recognition
Jingdong Wang
Ke Sun
Tianheng Cheng
Borui Jiang
Chaorui Deng
...
Yadong Mu
Mingkui Tan
Xinggang Wang
Wenyu Liu
Bin Xiao
390
3,611
0
20 Aug 2019
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan
Quoc V. Le
3DV
MedIm
131
18,106
0
28 May 2019
CutMix: Regularization Strategy to Train Strong Classifiers with Localizable Features
Sangdoo Yun
Dongyoon Han
Seong Joon Oh
Sanghyuk Chun
Junsuk Choe
Y. Yoo
OOD
604
4,777
0
13 May 2019
Searching for MobileNetV3
Andrew G. Howard
Mark Sandler
Grace Chu
Liang-Chieh Chen
Bo Chen
...
Yukun Zhu
Ruoming Pang
Vijay Vasudevan
Quoc V. Le
Hartwig Adam
339
6,772
0
06 May 2019
An Energy and GPU-Computation Efficient Backbone Network for Real-Time Object Detection
Youngwan Lee
Joong-won Hwang
Sangrok Lee
Yuseok Bae
Jongyoul Park
PINN
ObjD
60
369
0
22 Apr 2019
Res2Net: A New Multi-scale Backbone Architecture
Shanghua Gao
Ming-Ming Cheng
Kai Zhao
Xinyu Zhang
Ming-Hsuan Yang
Philip Torr
100
2,389
0
02 Apr 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.7K
94,729
0
11 Oct 2018
ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design
Ningning Ma
Xiangyu Zhang
Haitao Zheng
Jian Sun
168
4,983
0
30 Jul 2018
Unified Perceptual Parsing for Scene Understanding
Tete Xiao
Yingcheng Liu
Bolei Zhou
Yuning Jiang
Jian Sun
OCL
VOS
181
1,883
0
26 Jul 2018
Data Augmentation by Pairing Samples for Images Classification
H. Inoue
160
422
0
09 Jan 2018
1
2
Next