arXiv: 1606.08415
Gaussian Error Linear Units (GELUs)
27 June 2016
Dan Hendrycks
Kevin Gimpel
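For reference, the paper defines the GELU as GELU(x) = xΦ(x), where Φ is the standard normal CDF, and also gives a tanh-based approximation. A minimal sketch in plain Python (function names are illustrative, not from the paper's code):

```python
import math

def gelu(x: float) -> float:
    # Exact GELU: x * Phi(x), with Phi the standard normal CDF
    # expressed via the error function.
    return 0.5 * x * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    # Tanh approximation from the paper:
    # 0.5 * x * (1 + tanh(sqrt(2/pi) * (x + 0.044715 * x^3)))
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

The two agree closely near zero, e.g. `gelu(1.0)` is about 0.8413 and `gelu_tanh(1.0)` is within about 1e-3 of it.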
Papers citing "Gaussian Error Linear Units (GELUs)" (showing 50 of 882)
Simple Training Strategies and Model Scaling for Object Detection
  Xianzhi Du, Barret Zoph, Wei-Chih Hung, Nayeon Lee · ObjD · 33 / 40 / 0 · 30 Jun 2021
Rethinking Token-Mixing MLP for MLP-based Vision Backbone
  Tan Yu, Xu Li, Yunfeng Cai, Mingming Sun, Ping Li · 45 / 26 / 0 · 28 Jun 2021
PVT v2: Improved Baselines with Pyramid Vision Transformer
  Wenhai Wang, Enze Xie, Xiang Li, Deng-Ping Fan, Kaitao Song, Ding Liang, Tong Lu, Ping Luo, Ling Shao · ViT, AI4TS · 50 / 1,615 / 0 · 25 Jun 2021
IA-RED²: Interpretability-Aware Redundancy Reduction for Vision Transformers
  Bowen Pan, Yikang Shen, Yi Ding, Zhangyang Wang, Rogerio Feris, A. Oliva · VLM, ViT · 39 / 153 / 0 · 23 Jun 2021
Dealing with training and test segmentation mismatch: FBK@IWSLT2021
  Sara Papi, Marco Gaido, Matteo Negri, Marco Turchi · 39 / 6 / 0 · 23 Jun 2021
P2T: Pyramid Pooling Transformer for Scene Understanding
  Yu-Huan Wu, Yun-Hai Liu, Xin Zhan, Ming-Ming Cheng · ViT · 29 / 219 / 0 · 22 Jun 2021
OadTR: Online Action Detection with Transformers
  Xiang Wang, Shiwei Zhang, Zhiwu Qing, Yuanjie Shao, Zhe Zuo, Changxin Gao, Nong Sang · OffRL, ViT · 34 / 109 / 0 · 21 Jun 2021
Multi-mode Transformer Transducer with Stochastic Future Context
  Kwangyoun Kim, Felix Wu, Prashant Sridhar, Kyu Jeong Han, Shinji Watanabe · 30 / 9 / 0 · 17 Jun 2021
Multi-head or Single-head? An Empirical Comparison for Transformer Training
  Liyuan Liu, Jialu Liu, Jiawei Han · 23 / 32 / 0 · 17 Jun 2021
Reborn Mechanism: Rethinking the Negative Phase Information Flow in Convolutional Neural Network
  Zhicheng Cai, Kaizhu Huang, Chenglei Peng · 6 / 0 / 0 · 13 Jun 2021
Scaling Vision with Sparse Mixture of Experts
  C. Riquelme, J. Puigcerver, Basil Mustafa, Maxim Neumann, Rodolphe Jenatton, André Susano Pinto, Daniel Keysers, N. Houlsby · MoE · 17 / 575 / 0 · 10 Jun 2021
Programming Puzzles
  Tal Schuster, Ashwin Kalyan, Oleksandr Polozov, Adam Tauman Kalai · ELM · 17 / 32 / 0 · 10 Jun 2021
Supervising the Transfer of Reasoning Patterns in VQA
  Corentin Kervadec, Christian Wolf, G. Antipov, M. Baccouche, Madiha Nadri Wolf · 27 / 10 / 0 · 10 Jun 2021
How Robust are Model Rankings: A Leaderboard Customization Approach for Equitable Evaluation
  Swaroop Mishra, Anjana Arunkumar · 34 / 24 / 0 · 10 Jun 2021
CoAtNet: Marrying Convolution and Attention for All Data Sizes
  Zihang Dai, Hanxiao Liu, Quoc V. Le, Mingxing Tan · ViT · 49 / 1,167 / 0 · 09 Jun 2021
Compacter: Efficient Low-Rank Hypercomplex Adapter Layers
  Rabeeh Karimi Mahabadi, James Henderson, Sebastian Ruder · MoE · 67 / 468 / 0 · 08 Jun 2021
A Survey of Transformers
  Tianyang Lin, Yuxin Wang, Xiangyang Liu, Xipeng Qiu · ViT · 53 / 1,088 / 0 · 08 Jun 2021
Graph-MLP: Node Classification without Message Passing in Graph
  Yang Hu, Haoxuan You, Zhecan Wang, Zhicheng Wang, Erjin Zhou, Yue Gao · 27 / 108 / 0 · 08 Jun 2021
Reveal of Vision Transformers Robustness against Adversarial Attacks
  Ahmed Aldahdooh, W. Hamidouche, Olivier Déforges · ViT · 15 / 56 / 0 · 07 Jun 2021
Self-supervised Depth Estimation Leveraging Global Perception and Geometric Smoothness Using On-board Videos
  Shaocheng Jia, Xin Pei, W. Yao, S. Wong · 3DPC, MDE · 43 / 19 / 0 · 07 Jun 2021
Empowering Language Understanding with Counterfactual Reasoning
  Fuli Feng, Jizhi Zhang, Xiangnan He, Hanwang Zhang, Tat-Seng Chua · LRM · 21 / 33 / 0 · 06 Jun 2021
Learning Dynamic Graph Representation of Brain Connectome with Spatio-Temporal Attention
  Byung-Hoon Kim, Jong Chul Ye, Jae-Jin Kim · 34 / 129 / 0 · 27 May 2021
Efficient and Accurate Gradients for Neural SDEs
  Patrick Kidger, James Foster, Xuechen Li, Terry Lyons · DiffM · 24 / 60 / 0 · 27 May 2021
Fixed-Dimensional and Permutation Invariant State Representation of Autonomous Driving
  Jingliang Duan, Dongjie Yu, Shengbo Eben Li, Wenxuan Wang, Yangang Ren, Ziyu Lin, B. Cheng · 24 / 10 / 0 · 24 May 2021
One4all User Representation for Recommender Systems in E-commerce
  Kyuyong Shin, Hanock Kwak, KyungHyun Kim, Minkyu Kim, Young-Jin Park, Jisu Jeong, Seungjae Jung · 28 / 27 / 0 · 24 May 2021
Vision Transformer for Fast and Efficient Scene Text Recognition
  Rowel Atienza · ViT · 25 / 144 / 0 · 18 May 2021
Link Prediction on N-ary Relational Facts: A Graph-based Approach
  Quan Wang, Haifeng Wang, Yajuan Lyu, Yong Zhu · 24 / 46 / 0 · 18 May 2021
Sparta: Spatially Attentive and Adversarially Robust Activation
  Qing Guo, Felix Juefei-Xu, Changqing Zhou, Wei Feng, Yang Liu, Song Wang · AAML · 33 / 4 / 0 · 18 May 2021
Pay Attention to MLPs
  Hanxiao Liu, Zihang Dai, David R. So, Quoc V. Le · AI4CE · 57 / 651 / 0 · 17 May 2021
Vision Transformers are Robust Learners
  Sayak Paul, Pin-Yu Chen · ViT · 28 / 307 / 0 · 17 May 2021
Counterfactual Explanations for Neural Recommenders
  Khanh Tran, Azin Ghazimatin, Rishiraj Saha Roy · AAML, CML · 57 / 65 / 0 · 11 May 2021
ResMLP: Feedforward networks for image classification with data-efficient training
  Hugo Touvron, Piotr Bojanowski, Mathilde Caron, Matthieu Cord, Alaaeldin El-Nouby, ..., Gautier Izacard, Armand Joulin, Gabriel Synnaeve, Jakob Verbeek, Hervé Jégou · VLM · 30 / 656 / 0 · 07 May 2021
MLP-Mixer: An all-MLP Architecture for Vision
  Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy · 277 / 2,606 / 0 · 04 May 2021
SpookyNet: Learning Force Fields with Electronic Degrees of Freedom and Nonlocal Effects
  Oliver T. Unke, Stefan Chmiela, M. Gastegger, Kristof T. Schütt, H. E. Sauceda, K. Müller · 177 / 247 / 0 · 01 May 2021
Reconstructing nodal pressures in water distribution systems with graph neural networks
  Gergely Hajgató, Bálint Gyires-Tóth, Gyorgy Paál · 20 / 14 / 0 · 28 Apr 2021
Rich Semantics Improve Few-shot Learning
  Mohamed Afham, Salman Khan, Muhammad Haris Khan, Muzammal Naseer, Fahad Shahbaz Khan · VLM · 20 / 24 / 0 · 26 Apr 2021
Multiscale Vision Transformers
  Haoqi Fan, Bo Xiong, K. Mangalam, Yanghao Li, Zhicheng Yan, Jitendra Malik, Christoph Feichtenhofer · ViT · 63 / 1,224 / 0 · 22 Apr 2021
A novel time-frequency Transformer based on self-attention mechanism and its application in fault diagnosis of rolling bearings
  Yifei Ding, M. Jia, Qiuhua Miao, Yudong Cao · 16 / 268 / 0 · 19 Apr 2021
Knowledge Neurons in Pretrained Transformers
  Damai Dai, Li Dong, Y. Hao, Zhifang Sui, Baobao Chang, Furu Wei · KELM, MU · 16 / 417 / 0 · 18 Apr 2021
AMMU: A Survey of Transformer-based Biomedical Pretrained Language Models
  Katikapalli Subramanyam Kalyan, A. Rajasekharan, S. Sangeetha · LM&MA, MedIm · 26 / 164 / 0 · 16 Apr 2021
SummScreen: A Dataset for Abstractive Screenplay Summarization
  Mingda Chen, Zewei Chu, Sam Wiseman, Kevin Gimpel · 35 / 94 / 0 · 14 Apr 2021
K-PLUG: Knowledge-injected Pre-trained Language Model for Natural Language Understanding and Generation in E-Commerce
  Song Xu, Haoran Li, Peng Yuan, Yujia Wang, Youzheng Wu, Xiaodong He, Ying Liu, Bowen Zhou · KELM · 35 / 24 / 0 · 14 Apr 2021
Visual Goal-Step Inference using wikiHow
  Yue Yang, Artemis Panagopoulou, Qing Lyu, Li Zhang, Mark Yatskar, Chris Callison-Burch · 34 / 41 / 0 · 12 Apr 2021
SiT: Self-supervised vIsion Transformer
  Sara Atito Ali Ahmed, Muhammad Awais, J. Kittler · ViT · 39 / 139 / 0 · 08 Apr 2021
AAformer: Auto-Aligned Transformer for Person Re-Identification
  Kuan Zhu, Haiyun Guo, Shiliang Zhang, Yaowei Wang, Jing Liu, Jinqiao Wang, Ming Tang · ViT · 35 / 112 / 0 · 02 Apr 2021
Evaluating Neural Word Embeddings for Sanskrit
  Kevin Qinghong Lin, Om Adideva, Digumarthi Komal, Laxmidhar Behera, Pawan Goyal · 29 / 12 / 0 · 01 Apr 2021
ViViT: A Video Vision Transformer
  Anurag Arnab, Mostafa Dehghani, G. Heigold, Chen Sun, Mario Lucic, Cordelia Schmid · ViT · 30 / 2,088 / 0 · 29 Mar 2021
Vision Transformers for Dense Prediction
  René Ranftl, Alexey Bochkovskiy, V. Koltun · ViT, MDE · 45 / 1,662 / 0 · 24 Mar 2021
Scaling Local Self-Attention for Parameter Efficient Visual Backbones
  Ashish Vaswani, Prajit Ramachandran, A. Srinivas, Niki Parmar, Blake A. Hechtman, Jonathon Shlens · 27 / 395 / 0 · 23 Mar 2021
Scalable Vision Transformers with Hierarchical Pooling
  Zizheng Pan, Bohan Zhuang, Jing Liu, Haoyu He, Jianfei Cai · ViT · 27 / 126 / 0 · 19 Mar 2021