arXiv:2202.04350 · Cited By · v2 (latest)
pNLP-Mixer: an Efficient all-MLP Architecture for Language
9 February 2022
Francesco Fusco, Damian Pascual, Peter W. J. Staar, Diego Antognini
Papers citing "pNLP-Mixer: an Efficient all-MLP Architecture for Language" (21 of 21 papers shown):
1. Masked Mixers for Language Generation and Retrieval (02 Sep 2024)
   Benjamin L. Badger · 153 · 0 · 0

2. Efficient Language Modeling with Sparse all-MLP (14 Mar 2022) [MoE]
   Ping Yu, Mikel Artetxe, Myle Ott, Sam Shleifer, Hongyu Gong, Ves Stoyanov, Xian Li · 68 · 11 · 0

3. Towards Efficient NLP: A Standard Evaluation and A Strong Baseline (13 Oct 2021) [ELM]
   Xiangyang Liu, Tianxiang Sun, Junliang He, Jiawen Wu, Lingling Wu, Xinyu Zhang, Hao Jiang, Bo Zhao, Xuanjing Huang, Xipeng Qiu · 72 · 47 · 0

4. MLP-Mixer: An all-MLP Architecture for Vision (04 May 2021)
   Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy · 441 · 2,689 · 0

5. Distilling Large Language Models into Tiny and Effective Students using pQRNN (21 Jan 2021) [MQ]
   P. Kaliamoorthi, Aditya Siddhant, Edward Li, Melvin Johnson · 47 · 17 · 0

6. Efficient Transformers: A Survey (14 Sep 2020) [VLM]
   Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler · 170 · 1,128 · 0

7. MTOP: A Comprehensive Multilingual Task-Oriented Semantic Parsing Benchmark (21 Aug 2020)
   Haoran Li, Abhinav Arora, Shuohui Chen, Anchit Gupta, Sonal Gupta, Yashar Mehdad · 118 · 180 · 0

8. End-to-End Slot Alignment and Recognition for Cross-Lingual NLU (29 Apr 2020)
   Weijia Xu, Batool Haider, Saab Mansour · 59 · 155 · 0

9. ProFormer: Towards On-Device LSH Projection Based Transformers (13 Apr 2020)
   Chinnadhurai Sankar, Sujith Ravi, Zornitsa Kozareva · 53 · 9 · 0

10. MobileBERT: a Compact Task-Agnostic BERT for Resource-Limited Devices (06 Apr 2020) [MQ]
    Zhiqing Sun, Hongkun Yu, Xiaodan Song, Renjie Liu, Yiming Yang, Denny Zhou · 118 · 817 · 0

11. Unsupervised Cross-lingual Representation Learning at Scale (05 Nov 2019)
    Alexis Conneau, Kartikay Khandelwal, Naman Goyal, Vishrav Chaudhary, Guillaume Wenzek, Francisco Guzmán, Edouard Grave, Myle Ott, Luke Zettlemoyer, Veselin Stoyanov · 228 · 6,593 · 0

12. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter (02 Oct 2019)
    Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf · 264 · 7,554 · 0

13. TinyBERT: Distilling BERT for Natural Language Understanding (23 Sep 2019) [VLM]
    Xiaoqi Jiao, Yichun Yin, Lifeng Shang, Xin Jiang, Xiao Chen, Linlin Li, F. Wang, Qun Liu · 113 · 1,872 · 0

14. Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (17 Sep 2019) [MoE]
    Mohammad Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro · 341 · 1,918 · 0

15. GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding (20 Apr 2018) [ELM]
    Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman · 1.1K · 7,201 · 0

16. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference (15 Dec 2017) [MQ]
    Benoit Jacob, S. Kligys, Bo Chen, Menglong Zhu, Matthew Tang, Andrew G. Howard, Hartwig Adam, Dmitry Kalenichenko · 167 · 3,143 · 0

17. Layer Normalization (21 Jul 2016)
    Jimmy Lei Ba, J. Kiros, Geoffrey E. Hinton · 435 · 10,541 · 0

18. Gaussian Error Linear Units (GELUs) (27 Jun 2016)
    Dan Hendrycks, Kevin Gimpel · 174 · 5,049 · 0

19. Neural Machine Translation of Rare Words with Subword Units (31 Aug 2015)
    Rico Sennrich, Barry Haddow, Alexandra Birch · 238 · 7,760 · 0

20. Empirical Evaluation of Rectified Activations in Convolutional Network (05 May 2015)
    Bing Xu, Naiyan Wang, Tianqi Chen, Mu Li · 142 · 2,914 · 0

21. In Defense of MinHash Over SimHash (16 Jul 2014)
    Anshumali Shrivastava, Ping Li · 76 · 112 · 0