ResearchTrend.AI
Gaussian Error Linear Units (GELUs)

27 June 2016
Dan Hendrycks
Kevin Gimpel

Papers citing "Gaussian Error Linear Units (GELUs)"

46 / 946 papers shown
Title
Beyond Point Estimate: Inferring Ensemble Prediction Variation from Neuron Activation Strength in Recommender Systems
Zhe Chen
Yuyan Wang
Dong Lin
D. Cheng
Lichan Hong
Ed H. Chi
Claire Cui
31
16
0
17 Aug 2020
Can weight sharing outperform random architecture search? An investigation with TuNAS
Gabriel Bender
Hanxiao Liu
Bo Chen
Grace Chu
Shuyang Cheng
Pieter-Jan Kindermans
Quoc V. Le
OOD
18
121
0
13 Aug 2020
Aligning AI With Shared Human Values
Dan Hendrycks
Collin Burns
Steven Basart
Andrew Critch
Jingkai Li
D. Song
Jacob Steinhardt
63
522
0
05 Aug 2020
Combining Deep Reinforcement Learning and Search for Imperfect-Information Games
Noam Brown
A. Bakhtin
Adam Lerer
Qucheng Gong
25
133
0
27 Jul 2020
Counterfactual Data Augmentation using Locally Factored Dynamics
Silviu Pitis
Elliot Creager
Animesh Garg
BDL
OffRL
26
85
0
06 Jul 2020
MIRA: Leveraging Multi-Intention Co-click Information in Web-scale Document Retrieval using Deep Neural Networks
Yusi Zhang
Chuanjie Liu
Angen Luo
Hui Xue
Xuan Shan
Y. Luo
Yiqian Xia
Yuanchi Yan
Haidong Wang
13
6
0
03 Jul 2020
Unsupervised Cross-lingual Representation Learning for Speech Recognition
Alexis Conneau
Alexei Baevski
R. Collobert
Abdel-rahman Mohamed
Michael Auli
SSL
70
755
0
24 Jun 2020
MaxVA: Fast Adaptation of Step Sizes by Maximizing Observed Variance of Gradients
Chenfei Zhu
Yu Cheng
Zhe Gan
Furong Huang
Jingjing Liu
Tom Goldstein
ODL
35
2
0
21 Jun 2020
Categorical Normalizing Flows via Continuous Transformations
Phillip Lippe
E. Gavves
BDL
21
43
0
17 Jun 2020
VirTex: Learning Visual Representations from Textual Annotations
Karan Desai
Justin Johnson
SSL
VLM
30
433
0
11 Jun 2020
Common Sense or World Knowledge? Investigating Adapter-Based Knowledge Injection into Pretrained Transformers
Anne Lauscher
Olga Majewska
Leonardo F. R. Ribeiro
Iryna Gurevych
Nikolai Rozanov
Goran Glavaš
KELM
39
79
0
24 May 2020
Normalized Attention Without Probability Cage
Oliver Richter
Roger Wattenhofer
14
21
0
19 May 2020
Span-ConveRT: Few-shot Span Extraction for Dialog with Pretrained Conversational Representations
Sam Coope
Tyler Farghly
D. Gerz
Ivan Vulić
Matthew Henderson
27
62
0
18 May 2020
Syntax-guided Controlled Generation of Paraphrases
Ashutosh Kumar
Kabir Ahuja
Raghuram Vadapalli
Partha P. Talukdar
29
93
0
18 May 2020
Information Seeking in the Spirit of Learning: a Dataset for Conversational Curiosity
Pedro Rodriguez
Paul A. Crook
Seungwhan Moon
Zhiguang Wang
RALM
30
12
0
01 May 2020
UDapter: Language Adaptation for Truly Universal Dependency Parsing
Ahmet Üstün
Arianna Bisazza
G. Bouma
Gertjan van Noord
27
113
0
29 Apr 2020
Random Features for Kernel Approximation: A Survey on Algorithms, Theory, and Beyond
Fanghui Liu
Xiaolin Huang
Yudong Chen
Johan A. K. Suykens
BDL
44
172
0
23 Apr 2020
Knowledge Fusion and Semantic Knowledge Ranking for Open Domain Question Answering
Pratyay Banerjee
Chitta Baral
RALM
22
24
0
07 Apr 2020
Evolving Normalization-Activation Layers
Hanxiao Liu
Andrew Brock
Karen Simonyan
Quoc V. Le
19
79
0
06 Apr 2020
Transfer Learning for Context-Aware Spoken Language Understanding
Qian Chen
Zhu Zhuo
Wen Wang
Qiuyun Xu
16
5
0
03 Mar 2020
Keyphrase Extraction with Span-based Feature Representations
Funan Mu
Zhenting Yu
Lifeng Wang
Yequan Wang
Qingyu Yin
Yibo Sun
Liqun Liu
Teng Ma
Jing Tang
Xing Zhou
32
17
0
13 Feb 2020
Towards Crowdsourced Training of Large Neural Networks using Decentralized Mixture-of-Experts
Max Ryabinin
Anton I. Gusev
FedML
27
48
0
10 Feb 2020
Distributional Soft Actor-Critic: Off-Policy Reinforcement Learning for Addressing Value Estimation Errors
Jingliang Duan
Yang Guan
Shengbo Eben Li
Yangang Ren
B. Cheng
OffRL
25
174
0
09 Jan 2020
Machine Learning from a Continuous Viewpoint
Weinan E
Chao Ma
Lei Wu
33
102
0
30 Dec 2019
TreeGen: A Tree-Based Transformer Architecture for Code Generation
Zeyu Sun
Qihao Zhu
Yingfei Xiong
Yican Sun
Lili Mou
Lu Zhang
25
173
0
22 Nov 2019
Symmetrical Gaussian Error Linear Units (SGELUs)
Chao Yu
Zhiguo Su
14
11
0
10 Nov 2019
ConveRT: Efficient and Accurate Conversational Representations from Transformers
Matthew Henderson
I. Casanueva
Nikola Mrkšić
Pei-hao Su
Tsung-Hsien
Ivan Vulić
21
196
0
09 Nov 2019
BART: Denoising Sequence-to-Sequence Pre-training for Natural Language Generation, Translation, and Comprehension
M. Lewis
Yinhan Liu
Naman Goyal
Marjan Ghazvininejad
Abdel-rahman Mohamed
Omer Levy
Veselin Stoyanov
Luke Zettlemoyer
AIMat
VLM
41
10,620
0
29 Oct 2019
Generative Pre-Training for Speech with Autoregressive Predictive Coding
Yu-An Chung
James R. Glass
SSL
29
173
0
23 Oct 2019
Transformer-based Acoustic Modeling for Hybrid Speech Recognition
Yongqiang Wang
Abdel-rahman Mohamed
Duc Le
Chunxi Liu
Alex Xiao
...
Xiaohui Zhang
Frank Zhang
Christian Fuegen
Geoffrey Zweig
M. Seltzer
16
248
0
22 Oct 2019
Multi-Task Learning for Conversational Question Answering over a Large-Scale Knowledge Base
Tao Shen
Xiubo Geng
Tao Qin
Daya Guo
Duyu Tang
Nan Duan
Guodong Long
Daxin Jiang
33
81
0
11 Oct 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
112
6,380
0
26 Sep 2019
Mixout: Effective Regularization to Finetune Large-scale Pretrained Language Models
Cheolhyoung Lee
Kyunghyun Cho
Wanmo Kang
MoE
249
208
0
25 Sep 2019
Cross-Lingual Natural Language Generation via Pre-Training
Zewen Chi
Li Dong
Furu Wei
Wenhui Wang
Xian-Ling Mao
Heyan Huang
27
136
0
23 Sep 2019
Multi-Task Self-Supervised Learning for Disfluency Detection
Shaolei Wang
Wanxiang Che
Qi Liu
Pengda Qin
Ting Liu
William Yang Wang
SSL
22
56
0
15 Aug 2019
Adversarial Generation and Encoding of Nested Texts
A. Rozental
GAN
19
0
0
01 Jun 2019
A Generalized Framework of Sequence Generation with Application to Undirected Sequence Models
Elman Mansimov
Alex Jinpeng Wang
Sean Welleck
Kyunghyun Cho
AIMat
28
46
0
29 May 2019
EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks
Mingxing Tan
Quoc V. Le
3DV
MedIm
11
17,783
0
28 May 2019
Language Modeling with Deep Transformers
Kazuki Irie
Albert Zeyer
Ralf Schlüter
Hermann Ney
KELM
43
171
0
10 May 2019
Unified Language Model Pre-training for Natural Language Understanding and Generation
Li Dong
Nan Yang
Wenhui Wang
Furu Wei
Xiaodong Liu
Yu-Chiang Frank Wang
Jianfeng Gao
M. Zhou
H. Hon
ELM
AI4CE
80
1,551
0
08 May 2019
Generating Long Sequences with Sparse Transformers
R. Child
Scott Gray
Alec Radford
Ilya Sutskever
31
1,854
0
23 Apr 2019
Beto, Bentz, Becas: The Surprising Cross-Lingual Effectiveness of BERT
Shijie Wu
Mark Dredze
VLM
SSeg
27
670
0
19 Apr 2019
Neural Empirical Bayes
Saeed Saremi
Aapo Hyvärinen
12
65
0
06 Mar 2019
Multi-style Generative Reading Comprehension
Kyosuke Nishida
Itsumi Saito
Kosuke Nishida
Kazutoshi Shinoda
Atsushi Otsuka
Hisako Asano
J. Tomita
22
70
0
08 Jan 2019
NEU: A Meta-Algorithm for Universal UAP-Invariant Feature Representation
Anastasis Kratsios
Cody B. Hyndman
OOD
30
17
0
31 Aug 2018
Using Trusted Data to Train Deep Networks on Labels Corrupted by Severe Noise
Dan Hendrycks
Mantas Mazeika
Duncan Wilson
Kevin Gimpel
NoLa
68
547
0
14 Feb 2018