1606.08415
Gaussian Error Linear Units (GELUs)
27 June 2016
Dan Hendrycks
Kevin Gimpel
Papers citing
"Gaussian Error Linear Units (GELUs)"
46 / 946 papers shown
Title
Beyond Point Estimate: Inferring Ensemble Prediction Variation from Neuron Activation Strength in Recommender Systems
Zhe Chen
Yuyan Wang
Dong Lin
D. Cheng
Lichan Hong
Ed H. Chi
Claire Cui
31
16
0
17 Aug 2020
Can weight sharing outperform random architecture search? An investigation with TuNAS
Gabriel Bender
Hanxiao Liu
Bo Chen
Grace Chu
Shuyang Cheng
Pieter-Jan Kindermans
Quoc V. Le
OOD
18
121
0
13 Aug 2020
Aligning AI With Shared Human Values
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Jingkai Li
D. Song
Jacob Steinhardt
63
522
0
05 Aug 2020
Combining Deep Reinforcement Learning and Search for Imperfect-Information Games
Noam Brown
A. Bakhtin
Adam Lerer
Qucheng Gong
25
133
0
27 Jul 2020
Counterfactual Data Augmentation using Locally Factored Dynamics
Silviu Pitis
Elliot Creager
Animesh Garg
BDL
OffRL
26
85
0
06 Jul 2020
MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks
Yusi Zhang
Chuanjie Liu
Angen Luo
Hui Xue
Xuan Shan
Y. Luo
Yiqian Xia
Yuanchi Yan
Haidong Wang
13
6
0
03 Jul 2020
Unsupervised Cross-lingual Representation Learning for Speech Recognition
Alexis Conneau
Alexei Baevski
R. Collobert
Abdel-rahman Mohamed
Michael Auli
SSL
70
755
0
24 Jun 2020
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients
Chenfei Zhu
Yu Cheng
Zhe Gan
Furong Huang
Jingjing Liu
Tom Goldstein
ODL
35
2
0
21 Jun 2020
Categorical Normalizing Flows via Continuous Transformations
Phillip Lippe
E. Gavves
BDL
21
43
0
17 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
30
433
0
11 Jun 2020
Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers
Anne Lauscher
Olga Majewska
Leonardo F. R. Ribeiro
Iryna Gurevych
Nikolai Rozanov
Goran Glavaš
KELM
39
79
0
24 May 2020
Normalized Attention Without Probability Cage
Oliver Richter
Roger Wattenhofer
14
21
0
19 May 2020
Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations
Sam Coope
Tyler Farghly
D. Gerz
Ivan Vulić
Matthew Henderson
27
62
0
18 May 2020
Syntax-guided Controlled Generation of Paraphrases
Ashutosh Kumar
Kabir Ahuja
Raghuram Vadapalli
Partha P. Talukdar
29
93
0
18 May 2020
Information Seeking in the Spirit of Learning: a Dataset for Conversational Curiosity
Pedro Rodriguez
Paul A. Crook
Seungwhan Moon
Zhiguang Wang
RALM
30
12
0
01 May 2020
UDapter: Language Adaptation for Truly Universal Dependency Parsing
Ahmet Üstün
Arianna Bisazza
G. Bouma
Gertjan van Noord
27
113
0
29 Apr 2020
Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond
Fanghui Liu
Xiaolin Huang
Yudong Chen
Johan A. K. Suykens
BDL
44
172
0
23 Apr 2020
Knowledge Fusion and Semantic Knowledge Ranking for Open Domain Question Answering
Pratyay Banerjee
Chitta Baral
RALM
22
24
0
07 Apr 2020
Evolving Normalization-Activation Layers
Hanxiao Liu
Andrew Brock
Karen Simonyan
Quoc V. Le
19
79
0
06 Apr 2020
Transfer Learning for Context-Aware Spoken Language Understanding
Qian Chen
Zhu Zhuo
Wen Wang
Qiuyun Xu
16
5
0
03 Mar 2020
Keyphrase Extraction with Span-based Feature Representations
Funan Mu
Zhenting Yu
Lifeng Wang
Yequan Wang
Qingyu Yin
Yibo Sun
Liqun Liu
Teng Ma
Jing Tang
Xing Zhou
32
17
0
13 Feb 2020
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
Max Ryabinin
Anton I. Gusev
FedML
27
48
0
10 Feb 2020
Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors
Jingliang Duan
Yang Guan
Shengbo Eben Li
Yangang Ren
B. Cheng
OffRL
25
174
0
09 Jan 2020
Machine Learning from a Continuous Viewpoint
Weinan E
Chao Ma
Lei Wu
33
102
0
30 Dec 2019
TreeGen: A Tree-Based Transformer Architecture for Code Generation
Zeyu Sun
Qihao Zhu
Yingfei Xiong
Yican Sun
Lili Mou
Lu Zhang
25
173
0
22 Nov 2019
Symmetrical Gaussian Error Linear Units (SGELUs)
Chao Yu
Zhiguo Su
14
11
0
10 Nov 2019
ConveRT: Efficient and Accurate Conversational Representations from Transformers
Matthew Henderson
I. Casanueva
Nikola Mrkšić
Pei-hao Su
Tsung-Hsien
Ivan Vulić
21
196
0
09 Nov 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMat
VLM
41
10,620
0
29 Oct 2019
Generative Pre-Training for Speech with Autoregressive Predictive Coding
Yu-An Chung
James R. Glass
SSL
29
173
0
23 Oct 2019
Transformer-based Acoustic Modeling for Hybrid Speech Recognition
Yongqiang Wang
Abdel-rahman Mohamed
Duc Le
Chunxi Liu
Alex Xiao
...
Xiaohui Zhang
Frank Zhang
Christian Fuegen
Geoffrey Zweig
M. Seltzer
16
248
0
22 Oct 2019
Multi-Task Learning for Conversational Question Answering over a Large-Scale Knowledge Base
Tao Shen
Xiubo Geng
Tao Qin
Daya Guo
Duyu Tang
Nan Duan
Guodong Long
Daxin Jiang
33
81
0
11 Oct 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
112
6,380
0
26 Sep 2019
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Cheolhyoung Lee
Kyunghyun Cho
Wanmo Kang
MoE
249
208
0
25 Sep 2019
Cross-Lingual Natural Language Generation via Pre-Training
Zewen Chi
Li Dong
Furu Wei
Wenhui Wang
Xian-Ling Mao
Heyan Huang
27
136
0
23 Sep 2019
Multi-Task Self-Supervised Learning for Disfluency Detection
Shaolei Wang
Wanxiang Che
Qi Liu
Pengda Qin
Ting Liu
William Yang Wang
SSL
22
56
0
15 Aug 2019
Adversarial Generation and Encoding of Nested Texts
A. Rozental
GAN
19
0
0
01 Jun 2019
A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models
Elman Mansimov
Alex Jinpeng Wang
Sean Welleck
Kyunghyun Cho
AIMat
28
46
0
29 May 2019
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan
Quoc V. Le
3DV
MedIm
11
17,783
0
28 May 2019
Language Modeling with Deep Transformers
Kazuki Irie
Albert Zeyer
Ralf Schlüter
Hermann Ney
KELM
43
171
0
10 May 2019
Unified Language Model Pre-training for Natural Language Understanding and Generation
Li Dong
Nan Yang
Wenhui Wang
Furu Wei
Xiaodong Liu
Yu-Chiang Frank Wang
Jianfeng Gao
M. Zhou
H. Hon
ELM
AI4CE
80
1,551
0
08 May 2019
Generating Long Sequences with Sparse Transformers
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
31
1,854
0
23 Apr 2019
Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT
Shijie Wu
Mark Dredze
VLM
SSeg
27
670
0
19 Apr 2019
Neural Empirical Bayes
Saeed Saremi
Aapo Hyvärinen
12
65
0
06 Mar 2019
Multi-style Generative Reading Comprehension
Kyosuke Nishida
Itsumi Saito
Kosuke Nishida
Kazutoshi Shinoda
Atsushi Otsuka
Hisako Asano
J. Tomita
22
70
0
08 Jan 2019
NEU: A Meta-Algorithm for Universal UAP-Invariant Feature Representation
Anastasis Kratsios
Cody B. Hyndman
OOD
30
17
0
31 Aug 2018
Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise
Dan Hendrycks
Mantas Mazeika
Duncan Wilson
Kevin Gimpel
NoLa
68
547
0
14 Feb 2018