Gaussian Error Linear Units (GELUs)


27 June 2016
Dan Hendrycks
Kevin Gimpel

Papers citing "Gaussian Error Linear Units (GELUs)"

50 / 843 papers shown
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman
Peter West
Luke Zettlemoyer
AI4CE
30
14
0
31 Jul 2023
Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup
Yan Sun
Li Shen
Hao Sun
Liang Ding
Dacheng Tao
FedML
24
17
0
30 Jul 2023
BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering
Khiem Vinh Tran
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
ViT
31
2
0
28 Jul 2023
Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs
Or Sharir
Anima Anandkumar
32
0
0
27 Jul 2023
Unsupervised Deep Learning-based Pansharpening with Jointly-Enhanced Spectral and Spatial Fidelity
Matteo Ciotola
Giovanni Poggi
G. Scarpa
23
22
0
26 Jul 2023
On the unreasonable vulnerability of transformers for image restoration -- and an easy fix
Shashank Agnihotri
Kanchana Vaishnavi Gandikota
Julia Grabinski
Paramanand Chandramouli
M. Keuper
32
9
0
25 Jul 2023
Simultaneous temperature estimation and nonuniformity correction from multiple frames
N. Oz
O. Berman
N. Sochen
David Mendelovich
I. Klapp
22
1
0
23 Jul 2023
A Stronger Stitching Algorithm for Fisheye Images based on Deblurring and Registration
Jing Hao
Jingming Xie
Jinyuan Zhang
Moyun Liu
28
7
0
22 Jul 2023
PASTA: Pretrained Action-State Transformer Agents
Raphael Boige
Yannis Flet-Berliac
Arthur Flajolet
Guillaume Richard
Thomas Pierrot
LM&Ro
OffRL
37
5
0
20 Jul 2023
PreDiff: Precipitation Nowcasting with Latent Diffusion Models
Zhihan Gao
Xingjian Shi
Boran Han
Hongya Wang
Xiaoyong Jin
Danielle C. Maddix
Yi Zhu
Mu Li
Bernie Wang
BDL
DiffM
40
56
0
19 Jul 2023
Meta-Value Learning: a General Framework for Learning with Learning Awareness
Tim Cooijmans
Milad Aghajohari
Aaron C. Courville
19
6
0
17 Jul 2023
Retentive Network: A Successor to Transformer for Large Language Models
Yutao Sun
Li Dong
Shaohan Huang
Shuming Ma
Yuqing Xia
Jilong Xue
Jianyong Wang
Furu Wei
LRM
78
301
0
17 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
F. Khan
ViT
54
19
0
13 Jul 2023
Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent
Ruichong Zhang
28
0
0
13 Jul 2023
Quantitative CLTs in Deep Neural Networks
Stefano Favaro
Boris Hanin
Domenico Marinucci
I. Nourdin
G. Peccati
BDL
31
11
0
12 Jul 2023
Self-supervised adversarial masking for 3D point cloud representation learning
Michal Szachniewicz
Wojciech Kozlowski
Michal Stypulkowski
Maciej Zięba
3DPC
16
2
0
11 Jul 2023
Hierarchical Autoencoder-based Lossy Compression for Large-scale High-resolution Scientific Data
Hieu Le
Jian Tao
AI4CE
29
2
0
09 Jul 2023
Multi-Scale Prototypical Transformer for Whole Slide Image Classification
Saisai Ding
Jun Wang
Juncheng Li
Jun Shi
MedIm
29
17
0
05 Jul 2023
Relation-aware graph structure embedding with co-contrastive learning for drug-drug interaction prediction
Mengying Jiang
Guizhong Liu
Biao Zhao
Yuanchao Su
Weiqiang Jin
CML
33
7
0
04 Jul 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
30
8
0
26 Jun 2023
Evolving Computation Graphs
Andreea Deac
Jian Tang
22
1
0
22 Jun 2023
Concurrent ischemic lesion age estimation and segmentation of CT brain using a Transformer-based network
A. Marcus
P. Bentley
Daniel Rueckert
MedIm
21
9
0
21 Jun 2023
TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting
Taorong Liu
Liang Liao
Delin Chen
Jing Xiao
Zheng Wang
Chia-Wen Lin
Shin'ichi Satoh
ViT
DiffM
36
6
0
20 Jun 2023
Learn to Enhance the Negative Information in Convolutional Neural Network
Zhicheng Cai
Chenglei Peng
Qiu Shen
16
0
0
18 Jun 2023
Point-Cloud Completion with Pretrained Text-to-image Diffusion Models
Yoni Kasten
Ohad Rahamim
Gal Chechik
30
24
0
18 Jun 2023
A semantically enhanced dual encoder for aspect sentiment triplet extraction
Baoxing Jiang
Shehui Liang
Peiyu Liu
Kaifang Dong
Hongye Li
23
15
0
14 Jun 2023
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
Chris Cundy
Stefano Ermon
16
10
0
08 Jun 2023
Policy-Based Self-Competition for Planning Problems
Jonathan Pirnay
Q. Göttl
Jakob Burger
D. G. Grimm
34
3
0
07 Jun 2023
Cross-LKTCN: Modern Convolution Utilizing Cross-Variable Dependency for Multivariate Time Series Forecasting
Donghao Luo
Xue Wang
BDL
AI4TS
19
2
0
04 Jun 2023
DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model
Xiuye Gu
Huayu Chen
Jonathan Huang
Abdullah M. Rashwan
Boxin Wang
...
Golnaz Ghiasi
Weicheng Kuo
Huizhong Chen
Liang-Chieh Chen
David A. Ross
ISeg
28
26
0
02 Jun 2023
Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning
Xiangzhe Kong
Wen-bing Huang
Yang Liu
22
13
0
02 Jun 2023
A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics
Hong-Yu Zhou
Yizhou Yu
Chengdi Wang
Shu Zhen Zhang
Yuanxu Gao
Jia-Yu Pan
Jun Shao
Guangming Lu
Kang Zhang
Weimin Li
MedIm
19
150
0
01 Jun 2023
Fast Dynamic 1D Simulation of Divertor Plasmas with Neural PDE Surrogates
Y. Poels
G. Derks
E. Westerhof
Koen Minartz
Sven Wiesen
Vlado Menkovski
3DGS
AI4CE
19
16
0
30 May 2023
Prediction Error-based Classification for Class-Incremental Learning
Michał Zając
Tinne Tuytelaars
Gido M. van de Ven
CLL
28
8
0
30 May 2023
Improving Generalization for Multimodal Fake News Detection
Sahar Tahmasebi
Sherzod Hakimov
Ralph Ewerth
Eric Müller-Budack
20
5
0
29 May 2023
Explicit Visual Prompting for Universal Foreground Segmentations
Weihuang Liu
Xi Shen
Chi-Man Pun
Xiaodong Cun
VPVLM
VLM
38
14
0
29 May 2023
A Neural State-Space Model Approach to Efficient Speech Separation
Chen Chen
Chao-Han Huck Yang
Kai Li
Yuchen Hu
Pin-Jui Ku
Chng Eng Siong
34
11
0
26 May 2023
VIP5: Towards Multimodal Foundation Models for Recommendation
Shijie Geng
Juntao Tan
Shuchang Liu
Zuohui Fu
Yongfeng Zhang
29
69
0
23 May 2023
EfficientSpeech: An On-Device Text to Speech Model
Rowel Atienza
31
4
0
23 May 2023
U-TILISE: A Sequence-to-sequence Model for Cloud Removal in Optical Satellite Time Series
Corinne Stucker
Vivien Sainte Fare Garnot
Konrad Schindler
AI4TS
24
13
0
22 May 2023
AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
Guy Yariv
Itai Gat
Lior Wolf
Yossi Adi
Idan Schwartz
DiffM
20
20
0
22 May 2023
Curve Your Enthusiasm: Concurvity Regularization in Differentiable Generalized Additive Models
Julien N. Siems
Konstantin Ditschuneit
Winfried Ripken
Alma Lindborg
Maximilian Schambach
Johannes Otterbach
Martin Genzel
19
6
0
19 May 2023
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization
Chong Yu
Tao Chen
Zhongxue Gan
Jiayuan Fan
MQ
ViT
30
23
0
18 May 2023
Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions
Byung-Doh Oh
William Schuler
29
2
0
17 May 2023
Multi-Level Global Context Cross Consistency Model for Semi-Supervised Ultrasound Image Segmentation with Diffusion Model
Fenghe Tang
Jianrui Ding
Lingtao Wang
Min Xian
C. Ning
DiffM
MedIm
34
12
0
16 May 2023
Evaluation of self-supervised pre-training for automatic infant movement classification using wearable movement sensors
Einari Vaaras
Manu Airaksinen
S. Vanhatalo
Okko Rasanen
25
4
0
16 May 2023
Toward Moiré-Free and Detail-Preserving Demosaicking
Xuan-Yi Li
Y. Niu
Bo-Lu Zhao
Haoyuan Shi
Zitong An
31
1
0
15 May 2023
MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation
Abdul Rehman Khan
Asifullah Khan
ViT
MedIm
41
14
0
15 May 2023
A Multidimensional Graph Fourier Transformation Neural Network for Vehicle Trajectory Prediction
Marion Neumeier
Andreas Tollkühn
M. Botsch
Wolfgang Utschick
22
5
0
12 May 2023
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
Etienne Labbé
J. Pinquier
Thomas Pellegrini
48
5
0
02 May 2023