ResearchTrend.AI
Gaussian Error Linear Units (GELUs)

27 June 2016
Dan Hendrycks
Kevin Gimpel

Papers citing "Gaussian Error Linear Units (GELUs)"

50 / 892 papers shown
Sequencer: Deep LSTM for Image Classification
Yuki Tatsunami
Masato Taki
VLM
ViT
31
78
0
04 May 2022
Flamingo: a Visual Language Model for Few-Shot Learning
Jean-Baptiste Alayrac
Jeff Donahue
Pauline Luc
Antoine Miech
Iain Barr
...
Mikolaj Binkowski
Ricardo Barreira
Oriol Vinyals
Andrew Zisserman
Karen Simonyan
MLLM
VLM
46
3,349
0
29 Apr 2022
Vision-Language Pre-Training for Boosting Scene Text Detectors
Sibo Song
Jianqiang Wan
Zhibo Yang
Jun Tang
Wenqing Cheng
Xiang Bai
Cong Yao
VLM
44
24
0
29 Apr 2022
CTCNet: A CNN-Transformer Cooperation Network for Face Image Super-Resolution
Guangwei Gao
Zixiang Xu
Juncheng Li
Jian Yang
T. Zeng
Guo-Jun Qi
CVBM
ViT
SupR
42
81
0
19 Apr 2022
MiniViT: Compressing Vision Transformers with Weight Multiplexing
Jinnian Zhang
Houwen Peng
Kan Wu
Mengchen Liu
Bin Xiao
Jianlong Fu
Lu Yuan
ViT
28
124
0
14 Apr 2022
ViTOL: Vision Transformer for Weakly Supervised Object Localization
Saurav Gupta
Sourav Lakhotia
Abhay Rawat
Rahul Tallamraju
WSOL
34
21
0
14 Apr 2022
HFL at SemEval-2022 Task 8: A Linguistics-inspired Regression Model with Data Augmentation for Multilingual News Similarity
Zihang Xu
Ziqing Yang
Yiming Cui
Zhigang Chen
24
6
0
11 Apr 2022
Simple Baselines for Image Restoration
Liangyu Chen
Xiaojie Chu
Xinming Zhang
Jian Sun
53
835
0
10 Apr 2022
Gradient-Based Trajectory Optimization With Learned Dynamics
Bhavya Sukhija
Nathanael Kohler
Miguel Zamora
Simon Zimmermann
Sebastian Curi
Andreas Krause
Stelian Coros
30
9
0
09 Apr 2022
Multichannel Speech Separation with Narrow-band Conformer
Changsheng Quan
Xiaofei Li
31
12
0
09 Apr 2022
Points to Patches: Enabling the Use of Self-Attention for 3D Shape Recognition
Axel Berg
Magnus Oskarsson
Mark O'Connor
3DPC
ViT
29
26
0
08 Apr 2022
BioBART: Pretraining and Evaluation of A Biomedical Generative Language Model
Hongyi Yuan
Zheng Yuan
Ruyi Gan
Jiaxing Zhang
Yutao Xie
Sheng Yu
LM&MA
33
123
0
08 Apr 2022
Unleashing Vanilla Vision Transformer with Masked Image Modeling for Object Detection
Yuxin Fang
Shusheng Yang
Shijie Wang
Yixiao Ge
Ying Shan
Xinggang Wang
31
55
0
06 Apr 2022
MixFormer: Mixing Features across Windows and Dimensions
Qiang Chen
Qiman Wu
Jian Wang
Qinghao Hu
T. Hu
Errui Ding
Jian Cheng
Jingdong Wang
MDE
ViT
31
103
0
06 Apr 2022
TALLFormer: Temporal Action Localization with a Long-memory Transformer
Feng Cheng
Gedas Bertasius
ViT
35
91
0
04 Apr 2022
SPECTRE: Spectral Conditioning Helps to Overcome the Expressivity Limits of One-shot Graph Generators
Karolis Martinkus
Andreas Loukas
Nathanael Perraudin
Roger Wattenhofer
42
67
0
04 Apr 2022
PERFECT: Prompt-free and Efficient Few-shot Learning with Language Models
Rabeeh Karimi Mahabadi
Luke Zettlemoyer
James Henderson
Marzieh Saeidi
Lambert Mathias
Ves Stoyanov
Majid Yazdani
VLM
34
69
0
03 Apr 2022
Introduction to the Artificial Intelligence that can be applied to the Network Automation Journey
Gilbert Moisio
Alexandre Gonzalvez
Noam Zeitoun
11
2
0
02 Apr 2022
Learnable latent embeddings for joint behavioral and neural analysis
Steffen Schneider
Jin Hwa Lee
Mackenzie W. Mathis
19
209
0
01 Apr 2022
How Does Pre-trained Wav2Vec 2.0 Perform on Domain Shifted ASR? An Extensive Benchmark on Air Traffic Control Communications
Juan Pablo Zuluaga
Amrutha Prasad
Iuliia Nigmatulina
Seyyed Saeed Sarfjoo
P. Motlícek
Matthias Kleinert
H. Helmke
Oliver Ohneiser
Qingran Zhan
23
43
0
31 Mar 2022
Forensic Analysis and Localization of Multiply Compressed MP3 Audio Using Transformers
Ziyue Xiang
Paolo Bestagini
Stefano Tubaro
Edward J. Delp
28
10
0
30 Mar 2022
Automatic Identification of Chemical Moieties
Jonas Lederer
M. Gastegger
Kristof T. Schütt
Michael C. Kampffmeyer
Klaus-Robert Muller
Oliver T. Unke
21
5
0
30 Mar 2022
CAT-Net: A Cross-Slice Attention Transformer Model for Prostate Zonal Segmentation in MRI
A. Hung
Haoxin Zheng
Qi Miao
S. Raman
D. Terzopoulos
Kyunghyun Sung
ViT
MedIm
35
44
0
29 Mar 2022
ObjectFormer for Image Manipulation Detection and Localization
Junke Wang
Zuxuan Wu
Jingjing Chen
Xintong Han
Abhinav Shrivastava
Ser-Nam Lim
Yu-Gang Jiang
39
108
0
28 Mar 2022
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
Mor Geva
Avi Caciularu
Ke Wang
Yoav Goldberg
KELM
69
336
0
28 Mar 2022
Diagonal State Spaces are as Effective as Structured State Spaces
Ankit Gupta
Albert Gu
Jonathan Berant
59
292
0
27 Mar 2022
Spatially Multi-conditional Image Generation
Ritika Chakraborty
Nikola Popovic
D. Paudel
Thomas Probst
Luc Van Gool
24
1
0
25 Mar 2022
Disentangleing Content and Fine-grained Prosody Information via Hybrid ASR Bottleneck Features for Voice Conversion
Xintao Zhao
Feng Liu
Changhe Song
Zhiyong Wu
Shiyin Kang
Deyi Tuo
Helen Meng
16
20
0
24 Mar 2022
Exploring Continuous Integrate-and-Fire for Adaptive Simultaneous Speech Translation
Chih-Chiang Chang
Hung-yi Lee
27
13
0
22 Mar 2022
Focal Modulation Networks
Jianwei Yang
Chunyuan Li
Xiyang Dai
Lu Yuan
Jianfeng Gao
3DPC
33
263
0
22 Mar 2022
Meta-attention for ViT-backed Continual Learning
Mengqi Xue
Haofei Zhang
Jie Song
Mingli Song
CLL
32
42
0
22 Mar 2022
Compression of Generative Pre-trained Language Models via Quantization
Chaofan Tao
Lu Hou
Wei Zhang
Lifeng Shang
Xin Jiang
Qun Liu
Ping Luo
Ngai Wong
MQ
38
103
0
21 Mar 2022
Transforming Gait: Video-Based Spatiotemporal Gait Analysis
R. J. Cotton
Emoonah McClerklin
A. Cimorelli
Ankit Patel
T. Karakostas
32
10
0
17 Mar 2022
RoMe: A Robust Metric for Evaluating Natural Language Generation
Md. Rony
Liubov Kovriguina
Debanjan Chaudhuri
Ricardo Usbeck
Jens Lehmann
22
12
0
17 Mar 2022
Semi-Discrete Normalizing Flows through Differentiable Tessellation
Ricky T. Q. Chen
Brandon Amos
Maximilian Nickel
32
10
0
14 Mar 2022
Scaling Up Your Kernels to 31x31: Revisiting Large Kernel Design in CNNs
Xiaohan Ding
Xinming Zhang
Yi Zhou
Jungong Han
Guiguang Ding
Jian Sun
VLM
49
528
0
13 Mar 2022
UNeXt: MLP-based Rapid Medical Image Segmentation Network
Jeya Maria Jose Valanarasu
Vishal M. Patel
SSeg
38
483
0
09 Mar 2022
ChiTransformer:Towards Reliable Stereo from Cues
Qing Su
Shihao Ji
MDE
ViT
18
12
0
09 Mar 2022
VoViT: Low Latency Graph-based Audio-Visual Voice Separation Transformer
Juan F. Montesinos
V. S. Kadandale
G. Haro
ViT
23
19
0
08 Mar 2022
Interpretable part-whole hierarchies and conceptual-semantic relationships in neural networks
Nicola Garau
N. Bisagno
Zeno Sambugaro
Nicola Conci
32
21
0
07 Mar 2022
Aggregated Pyramid Vision Transformer: Split-transform-merge Strategy for Image Recognition without Convolutions
Ruikang Ju
Ting-Yu Lin
Jen-Shiun Chiang
Jia-Hao Jian
Yu-Shian Lin
Liu-Rui-Yi Huang
ViT
16
1
0
02 Mar 2022
DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training
Joya Chen
Kai Xu
Yuhui Wang
Yifei Cheng
Angela Yao
19
7
0
28 Feb 2022
TrimBERT: Tailoring BERT for Trade-offs
S. N. Sridhar
Anthony Sarah
Sairam Sundaresan
MQ
21
4
0
24 Feb 2022
Activation Functions: Dive into an optimal activation function
V. Bansal
FAtt
23
2
0
24 Feb 2022
Transformer Quality in Linear Time
Weizhe Hua
Zihang Dai
Hanxiao Liu
Quoc V. Le
81
222
0
21 Feb 2022
Robustness and Accuracy Could Be Reconcilable by (Proper) Definition
Tianyu Pang
Min-Bin Lin
Xiao Yang
Junyi Zhu
Shuicheng Yan
30
119
0
21 Feb 2022
Visual Attention Network
Meng-Hao Guo
Chengrou Lu
Zheng-Ning Liu
Ming-Ming Cheng
Shiyong Hu
ViT
VLM
24
637
0
20 Feb 2022
Mixture-of-Experts with Expert Choice Routing
Yan-Quan Zhou
Tao Lei
Han-Chu Liu
Nan Du
Yanping Huang
Vincent Zhao
Andrew M. Dai
Zhifeng Chen
Quoc V. Le
James Laudon
MoE
160
329
0
18 Feb 2022
ST-MoE: Designing Stable and Transferable Sparse Expert Models
Barret Zoph
Irwan Bello
Sameer Kumar
Nan Du
Yanping Huang
J. Dean
Noam M. Shazeer
W. Fedus
MoE
24
182
0
17 Feb 2022
General-purpose, long-context autoregressive modeling with Perceiver AR
Curtis Hawthorne
Andrew Jaegle
Cătălina Cangea
Sebastian Borgeaud
C. Nash
...
Hannah R. Sheahan
Neil Zeghidour
Jean-Baptiste Alayrac
João Carreira
Jesse Engel
43
65
0
15 Feb 2022