Transformer Vs. MLP-Mixer: Exponential Expressive Gap For NLP Problems
arXiv:2208.08191 · 17 August 2022
D. Navon · A. Bronstein
MoE

Cited By

Papers citing "Transformer Vs. MLP-Mixer: Exponential Expressive Gap For NLP Problems" (30 papers shown)
DaViT: Dual Attention Vision Transformers
Mingyu Ding
Bin Xiao
Noel Codella
Ping Luo
Jingdong Wang
Lu Yuan
ViT
90
248
0
07 Apr 2022
MaxViT: Multi-Axis Vision Transformer
Zhengzhong Tu
Hossein Talebi
Han Zhang
Feng Yang
P. Milanfar
A. Bovik
Yinxiao Li
ViT
97
649
0
04 Apr 2022
Model soups: averaging weights of multiple fine-tuned models improves accuracy without increasing inference time
Mitchell Wortsman
Gabriel Ilharco
S. Gadre
Rebecca Roelofs
Raphael Gontijo-Lopes
...
Hongseok Namkoong
Ali Farhadi
Y. Carmon
Simon Kornblith
Ludwig Schmidt
MoMe
110
953
1
10 Mar 2022
The Inductive Bias of In-Context Learning: Rethinking Pretraining Example Design
Yoav Levine
Noam Wies
Daniel Jannai
D. Navon
Yedid Hoshen
Amnon Shashua
AI4CE
55
36
0
09 Oct 2021
Hire-MLP: Vision MLP via Hierarchical Rearrangement
Jianyuan Guo
Yehui Tang
Kai Han
Xinghao Chen
Han Wu
Chao Xu
Chang Xu
Yunhe Wang
70
105
0
30 Aug 2021
AS-MLP: An Axial Shifted MLP Architecture for Vision
Dongze Lian
Zehao Yu
Xing Sun
Shenghua Gao
104
189
0
18 Jul 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
Wenhai Wang
Enze Xie
Xiang Li
Deng-Ping Fan
Kaitao Song
Ding Liang
Tong Lu
Ping Luo
Ling Shao
ViT
AI4TS
83
1,634
0
25 Jun 2021
Vision Permutator: A Permutable MLP-Like Architecture for Visual Recognition
Qibin Hou
Zihang Jiang
Li Yuan
Ming-Ming Cheng
Shuicheng Yan
Jiashi Feng
ViT
MLLM
97
206
0
23 Jun 2021
S²-MLP: Spatial-Shift MLP Architecture for Vision
Tan Yu
Xu Li
Yunfeng Cai
Mingming Sun
Ping Li
59
187
0
14 Jun 2021
Which transformer architecture fits my data? A vocabulary bottleneck in self-attention
Noam Wies
Yoav Levine
Daniel Jannai
Amnon Shashua
65
20
0
09 May 2021
MLP-Mixer: An all-MLP Architecture for Vision
Ilya O. Tolstikhin
N. Houlsby
Alexander Kolesnikov
Lucas Beyer
Xiaohua Zhai
...
Andreas Steiner
Daniel Keysers
Jakob Uszkoreit
Mario Lucic
Alexey Dosovitskiy
381
2,638
0
04 May 2021
Twins: Revisiting the Design of Spatial Attention in Vision Transformers
Xiangxiang Chu
Zhi Tian
Yuqing Wang
Bo Zhang
Haibing Ren
Xiaolin K. Wei
Huaxia Xia
Chunhua Shen
ViT
48
1,006
0
28 Apr 2021
Rethinking Spatial Dimensions of Vision Transformers
Byeongho Heo
Sangdoo Yun
Dongyoon Han
Sanghyuk Chun
Junsuk Choe
Seong Joon Oh
ViT
475
573
0
30 Mar 2021
Swin Transformer: Hierarchical Vision Transformer using Shifted Windows
Ze Liu
Yutong Lin
Yue Cao
Han Hu
Yixuan Wei
Zheng Zhang
Stephen Lin
B. Guo
ViT
336
21,175
0
25 Mar 2021
Learning Transferable Visual Models From Natural Language Supervision
Alec Radford
Jong Wook Kim
Chris Hallacy
Aditya A. Ramesh
Gabriel Goh
...
Amanda Askell
Pamela Mishkin
Jack Clark
Gretchen Krueger
Ilya Sutskever
CLIP
VLM
686
28,659
0
26 Feb 2021
An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale
Alexey Dosovitskiy
Lucas Beyer
Alexander Kolesnikov
Dirk Weissenborn
Xiaohua Zhai
...
Matthias Minderer
G. Heigold
Sylvain Gelly
Jakob Uszkoreit
N. Houlsby
ViT
408
40,217
0
22 Oct 2020
The Depth-to-Width Interplay in Self-Attention
Yoav Levine
Noam Wies
Or Sharir
Hofit Bata
Amnon Shashua
49
46
0
22 Jun 2020
Train Large, Then Compress: Rethinking Model Size for Efficient Training and Inference of Transformers
Zhuohan Li
Eric Wallace
Sheng Shen
Kevin Lin
Kurt Keutzer
Dan Klein
Joseph E. Gonzalez
83
150
0
26 Feb 2020
Depth-Width Trade-offs for ReLU Networks via Sharkovsky's Theorem
Vaggos Chatziafratis
Sai Ganesh Nagarajan
Ioannis Panageas
Tianlin Li
34
21
0
09 Dec 2019
Deformable ConvNets v2: More Deformable, Better Results
Xizhou Zhu
Han Hu
Stephen Lin
Jifeng Dai
ObjD
83
1,998
0
27 Nov 2018
On the Long-Term Memory of Deep Recurrent Networks
Yoav Levine
Or Sharir
Alon Ziv
Amnon Shashua
36
24
0
25 Oct 2017
Revisiting Unreasonable Effectiveness of Data in Deep Learning Era
Chen Sun
Abhinav Shrivastava
Saurabh Singh
Abhinav Gupta
VLM
110
2,386
0
10 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
484
129,831
0
12 Jun 2017
Analysis and Design of Convolutional Networks via Hierarchical Tensor Decompositions
Nadav Cohen
Or Sharir
Yoav Levine
Ronen Tamari
David Yakira
Amnon Shashua
93
38
0
05 May 2017
Deformable Convolutional Networks
Jifeng Dai
Haozhi Qi
Yuwen Xiong
Yi Li
Guodong Zhang
Han Hu
Yichen Wei
188
5,291
0
17 Mar 2017
Aggregated Residual Transformations for Deep Neural Networks
Saining Xie
Ross B. Girshick
Piotr Dollár
Zhuowen Tu
Kaiming He
437
10,281
0
16 Nov 2016
Densely Connected Convolutional Networks
Gao Huang
Zhuang Liu
Laurens van der Maaten
Kilian Q. Weinberger
PINN
3DV
639
36,599
0
25 Aug 2016
Wide Residual Networks
Sergey Zagoruyko
N. Komodakis
268
7,951
0
23 May 2016
Inductive Bias of Deep Convolutional Networks through Pooling Geometry
Nadav Cohen
Amnon Shashua
36
132
0
22 May 2016
Deep Residual Learning for Image Recognition
Kaiming He
Xiangyu Zhang
Shaoqing Ren
Jian Sun
MedIm
1.4K
192,638
0
10 Dec 2015