ResearchTrend.AI
Gaussian Error Linear Units (GELUs)


27 June 2016
Dan Hendrycks
Kevin Gimpel
arXiv:1606.08415
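For quick reference, the activation introduced in this paper is GELU(x) = x * Phi(x), where Phi is the standard normal CDF. A minimal sketch in Python (the function names here are ours; the tanh form is the fast approximation given in the paper):

```python
import math

def gelu(x: float) -> float:
    """Exact GELU: x * Phi(x), with Phi the standard normal CDF."""
    return x * 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def gelu_tanh(x: float) -> float:
    """Tanh-based approximation from the paper (accurate to ~1e-3)."""
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))
```

Unlike ReLU, which gates inputs by their sign, GELU weights inputs by how likely they are under a standard Gaussian, so it is smooth and non-monotonic near zero.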

Papers citing "Gaussian Error Linear Units (GELUs)"

Showing 50 of 886 citing papers.
Bolt: Bridging the Gap between Auto-tuners and Hardware-native Performance
Jiarong Xing, Leyuan Wang, Shang Zhang, Jack H Chen, Ang Chen, Yibo Zhu
25 Oct 2021

ConformalLayers: A non-linear sequential neural network with associative layers
Zhen Wan, Zhuoyuan Mao, C. N. Vasconcelos
23 Oct 2021

Logical Activation Functions: Logit-space equivalents of Probabilistic Boolean Operators
S. Lowe, Robert C. Earle, Jason d'Eon, Thomas Trappenberg, Sageev Oore
22 Oct 2021

Vis-TOP: Visual Transformer Overlay Processor
Wei Hu, Dian Xu, Zimeng Fan, Fang Liu, Yanxiang He
21 Oct 2021 (BDL, ViT)

Cascaded Cross MLP-Mixer GANs for Cross-View Image Translation
Bin Ren, Hao Tang, N. Sebe
19 Oct 2021

Improving Robustness using Generated Data
Sven Gowal, Sylvestre-Alvise Rebuffi, Olivia Wiles, Florian Stimberg, D. A. Calian, Timothy A. Mann
18 Oct 2021

NormFormer: Improved Transformer Pretraining with Extra Normalization
Sam Shleifer, Jason Weston, Myle Ott
18 Oct 2021 (AI4CE)

Relation-aware Heterogeneous Graph for User Profiling
Qilong Yan, Yufeng Zhang, Qiang Liu, Shu Wu, Liang Wang
14 Oct 2021

bert2BERT: Towards Reusable Pretrained Language Models
Cheng Chen, Yichun Yin, Lifeng Shang, Xin Jiang, Yujia Qin, Fengyu Wang, Zhi Wang, Xiao Chen, Zhiyuan Liu, Qun Liu
14 Oct 2021 (VLM)

Differentially Private Fine-tuning of Language Models
Da Yu, Saurabh Naik, A. Backurs, Sivakanth Gopi, Huseyin A. Inan, ..., Y. Lee, Andre Manoel, Lukas Wutschitz, Sergey Yekhanin, Huishuai Zhang
13 Oct 2021

Dynamic Inference with Neural Interpreters
Nasim Rahaman, Muhammad Waleed Gondal, S. Joshi, Peter V. Gehler, Yoshua Bengio, Francesco Locatello, Bernhard Schölkopf
12 Oct 2021

6D-ViT: Category-Level 6D Object Pose Estimation via Transformer-based Instance Representation Learning
Lu Zou, Zhangjin Huang, Naijie Gu, Guoping Wang
10 Oct 2021 (ViT)

UniNet: Unified Architecture Search with Convolution, Transformer, and MLP
Jihao Liu, Hongsheng Li, Guanglu Song, Xin Huang, Yu Liu
08 Oct 2021 (ViT)

Pathologies in priors and inference for Bayesian transformers
Tristan Cinquin, Alexander Immer, Max Horn, Vincent Fortuin
08 Oct 2021 (UQCV, BDL, MedIm)

Style Equalization: Unsupervised Learning of Controllable Generative Sequence Models
Jen-Hao Rick Chang, A. Shrivastava, H. Koppula, Xiaoshuai Zhang, Oncel Tuzel
06 Oct 2021 (DiffM)

MoEfication: Transformer Feed-forward Layers are Mixtures of Experts
Zhengyan Zhang, Yankai Lin, Zhiyuan Liu, Peng Li, Maosong Sun, Jie Zhou
05 Oct 2021 (MoE)

Fine-tuning wav2vec2 for speaker recognition
Nik Vaessen, David A. van Leeuwen
30 Sep 2021

Introducing the DOME Activation Functions
Mohamed E. Hussein, Wael AbdAlmageed
30 Sep 2021

Activation Functions in Deep Learning: A Comprehensive Survey and Benchmark
S. Dubey, S. Singh, B. B. Chaudhuri
29 Sep 2021

UFO-ViT: High Performance Linear Vision Transformer without Softmax
Jeonggeun Song
29 Sep 2021 (ViT)

IGLU: Efficient GCN Training via Lazy Updates
S. Narayanan, Aditya Sinha, Prateek Jain, Purushottam Kar, Sundararajan Sellamanickam
28 Sep 2021 (BDL)

SAU: Smooth activation function using convolution with approximate identities
Koushik Biswas, Sandeep Kumar, Shilpak Banerjee, A. Pandey
27 Sep 2021

iRNN: Integer-only Recurrent Neural Network
Eyyub Sari, Vanessa Courville, V. Nia
20 Sep 2021 (MQ)

BARTpho: Pre-trained Sequence-to-Sequence Models for Vietnamese
Nguyen Luong Tran, Duong Minh Le, Dat Quoc Nguyen
20 Sep 2021

Fast and Sample-Efficient Interatomic Neural Network Potentials for Molecules and Materials Based on Gaussian Moments
Viktor Zaverkin, David Holzmüller, Ingo Steinwart, Johannes Kastner
20 Sep 2021

Dyadformer: A Multi-modal Transformer for Long-Range Modeling of Dyadic Interactions
D. Curto, Albert Clapés, Javier Selva, Sorina Smeureanu, Julio C. S. Jacques Junior, ..., G. Guilera, D. Leiva, T. Moeslund, Sergio Escalera, Cristina Palmero
20 Sep 2021

Commonsense Knowledge in Word Associations and ConceptNet
Chunhua Liu, Trevor Cohn, Lea Frermann
20 Sep 2021

AutoInit: Analytic Signal-Preserving Weight Initialization for Neural Networks
G. Bingham, Risto Miikkulainen
18 Sep 2021 (ODL)

Encoding Distributional Soft Actor-Critic for Autonomous Driving in Multi-lane Scenarios
Jingliang Duan, Yangang Ren, Fawang Zhang, Yang Guan, Dongjie Yu, Shengbo Eben Li, B. Cheng, Lin Zhao
12 Sep 2021

TEASEL: A Transformer-Based Speech-Prefixed Language Model
Mehdi Arjmand, M. Dousti, H. Moradi
12 Sep 2021

Multilingual Translation via Grafting Pre-trained Language Models
Zewei Sun, Mingxuan Wang, Lei Li
11 Sep 2021 (AI4CE)

ErfAct and Pserf: Non-monotonic Smooth Trainable Activation Functions
Koushik Biswas, Sandeep Kumar, Shilpak Banerjee, A. Pandey
09 Sep 2021

Learning the Physics of Particle Transport via Transformers
O. Pastor-Serrano, Zoltán Perkó
08 Sep 2021 (MedIm)

nnFormer: Interleaved Transformer for Volumetric Segmentation
Hong-Yu Zhou, J. Guo, Yinghao Zhang, Lequan Yu, Liansheng Wang, Yizhou Yu
07 Sep 2021 (ViT, MedIm)

Vision Guided Generative Pre-trained Language Models for Multimodal Abstractive Summarization
Tiezheng Yu, Wenliang Dai, Zihan Liu, Pascale Fung
06 Sep 2021

Learning to Generate Scene Graph from Natural Language Supervision
Yiwu Zhong, Jing Shi, Jianwei Yang, Chenliang Xu, Yin Li
06 Sep 2021 (SSL)

Hire-MLP: Vision MLP via Hierarchical Rearrangement
Jianyuan Guo, Yehui Tang, Kai Han, Xinghao Chen, Han Wu, Chao Xu, Chang Xu, Yunhe Wang
30 Aug 2021

AMMASurv: Asymmetrical Multi-Modal Attention for Accurate Survival Analysis with Whole Slide Images and Gene Expression Data
Ruoqi Wang, Ziwang Huang, Haitao Wang, Hejun Wu
28 Aug 2021

LocTex: Learning Data-Efficient Visual Representations from Localized Textual Supervision
Zhijian Liu, Simon Stent, Jie Li, John Gideon, Song Han
26 Aug 2021 (VLM)

TransFER: Learning Relation-aware Facial Expression Representations with Transformers
Fanglei Xue, Qiangchang Wang, G. Guo
25 Aug 2021 (ViT)

Deep neural networks approach to microbial colony detection -- a comparative analysis
Sylwia Majchrowska, J. Pawlowski, Natalia Czerep, Aleksander Górecki, Jakub Kuciński, Tomasz Golan
23 Aug 2021

SMedBERT: A Knowledge-Enhanced Pre-trained Language Model with Structured Semantics for Medical Text Mining
Taolin Zhang, Zerui Cai, Chengyu Wang, Minghui Qiu, Bite Yang, Xiaofeng He
20 Aug 2021 (AI4MH)

MOI-Mixer: Improving MLP-Mixer with Multi Order Interactions in Sequential Recommendation
Hojoon Lee, Dongyoon Hwang, Sunghwan Hong, Changyeon Kim, Seungryong Kim, Jaegul Choo
17 Aug 2021

RaftMLP: How Much Can Be Done Without Attention and with Less Spatial Locality?
Yuki Tatsunami, Masato Taki
09 Aug 2021

Congested Crowd Instance Localization with Dilated Convolutional Swin Transformer
Junyuan Gao, Maoguo Gong, Xuelong Li
02 Aug 2021 (ViT)

PPT Fusion: Pyramid Patch Transformer for a Case Study in Image Fusion
Yu Fu, Tianyang Xu, Xiaojun Wu, J. Kittler
29 Jul 2021 (ViT)

Multi-Scale Local-Temporal Similarity Fusion for Continuous Sign Language Recognition
Pan Xie, Zhi Cui, Yao Du, Mengyi Zhao, Jianwei Cui, Bin Wang, Xiaohui Hu
27 Jul 2021 (SLR)

CycleMLP: A MLP-like Architecture for Dense Prediction
Shoufa Chen, Enze Xie, Chongjian Ge, Runjian Chen, Ding Liang, Ping Luo
21 Jul 2021

Directly Training Joint Energy-Based Models for Conditional Synthesis and Calibrated Prediction of Multi-Attribute Data
Jacob Kelly, R. Zemel, Will Grathwohl
19 Jul 2021

Simultaneous Speech Translation for Live Subtitling: from Delay to Display
Alina Karakanta, Sara Papi, Matteo Negri, Marco Turchi
19 Jul 2021