Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1606.08415
Cited By
Gaussian Error Linear Units (GELUs)
27 June 2016
Dan Hendrycks
Kevin Gimpel
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Gaussian Error Linear Units (GELUs)"
50 / 843 papers shown
Title
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior?
Ari Holtzman
Peter West
Luke Zettlemoyer
AI4CE
30
14
0
31 Jul 2023
Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup
Yan Sun
Li Shen
Hao Sun
Liang Ding
Dacheng Tao
FedML
24
17
0
30 Jul 2023
BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering
Khiem Vinh Tran
Kiet Van Nguyen
Ngan Luu-Thuy Nguyen
ViT
31
2
0
28 Jul 2023
Incrementally-Computable Neural Networks: Efficient Inference for Dynamic Inputs
Or Sharir
Anima Anandkumar
32
0
0
27 Jul 2023
Unsupervised Deep Learning-based Pansharpening with Jointly-Enhanced Spectral and Spatial Fidelity
Matteo Ciotola
Giovanni Poggi
G. Scarpa
23
22
0
26 Jul 2023
On the unreasonable vulnerability of transformers for image restoration -- and an easy fix
Shashank Agnihotri
Kanchana Vaishnavi Gandikota
Julia Grabinski
Paramanand Chandramouli
M. Keuper
32
9
0
25 Jul 2023
Simultaneous temperature estimation and nonuniformity correction from multiple frames
N. Oz
O. Berman
N. Sochen
David Mendelovich
I. Klapp
22
1
0
23 Jul 2023
A Stronger Stitching Algorithm for Fisheye Images based on Deblurring and Registration
Jing Hao
Jingming Xie
Jinyuan Zhang
Moyun Liu
28
7
0
22 Jul 2023
PASTA: Pretrained Action-State Transformer Agents
Raphael Boige
Yannis Flet-Berliac
Arthur Flajolet
Guillaume Richard
Thomas Pierrot
LM&Ro
OffRL
37
5
0
20 Jul 2023
PreDiff: Precipitation Nowcasting with Latent Diffusion Models
Zhihan Gao
Xingjian Shi
Boran Han
Hongya Wang
Xiaoyong Jin
Danielle C. Maddix
Yi Zhu
Mu Li
Bernie Wang
BDL
DiffM
40
56
0
19 Jul 2023
Meta-Value Learning: a General Framework for Learning with Learning Awareness
Tim Cooijmans
Milad Aghajohari
Aaron C. Courville
19
6
0
17 Jul 2023
Retentive Network: A Successor to Transformer for Large Language Models
Yutao Sun
Li Dong
Shaohan Huang
Shuming Ma
Yuqing Xia
Jilong Xue
Jianyong Wang
Furu Wei
LRM
78
301
0
17 Jul 2023
Video-FocalNets: Spatio-Temporal Focal Modulation for Video Action Recognition
Syed Talal Wasim
Muhammad Uzair Khattak
Muzammal Naseer
Salman Khan
M. Shah
F. Khan
ViT
54
19
0
13 Jul 2023
Cramer Type Distances for Learning Gaussian Mixture Models by Gradient Descent
Ruichong Zhang
28
0
0
13 Jul 2023
Quantitative CLTs in Deep Neural Networks
Stefano Favaro
Boris Hanin
Domenico Marinucci
I. Nourdin
G. Peccati
BDL
31
11
0
12 Jul 2023
Self-supervised adversarial masking for 3D point cloud representation learning
Michal Szachniewicz
Wojciech Kozlowski
Michal Stypulkowski
Maciej Ziȩba
3DPC
16
2
0
11 Jul 2023
Hierarchical Autoencoder-based Lossy Compression for Large-scale High-resolution Scientific Data
Hieu Le
Jián Tao
AI4CE
29
2
0
09 Jul 2023
Multi-Scale Prototypical Transformer for Whole Slide Image Classification
Saisai Ding
Jun Wang
Juncheng Li
Jun Shi
MedIm
29
17
0
05 Jul 2023
Relation-aware graph structure embedding with co-contrastive learning for drug-drug interaction prediction
Mengying Jiang
Guizhong Liu
Biao Zhao
Yuanchao Su
Weiqiang Jin
CML
33
7
0
04 Jul 2023
Nonconvex Stochastic Bregman Proximal Gradient Method with Application to Deep Learning
Kuan-Fu Ding
Jingyang Li
Kim-Chuan Toh
30
8
0
26 Jun 2023
Evolving Computation Graphs
Andreea Deac
Jian Tang
22
1
0
22 Jun 2023
Concurrent ischemic lesion age estimation and segmentation of CT brain using a Transformer-based network
A. Marcus
P. Bentley
Daniel Rueckert
MedIm
21
9
0
21 Jun 2023
TransRef: Multi-Scale Reference Embedding Transformer for Reference-Guided Image Inpainting
Taorong Liu
Liang Liao
Delin Chen
Jing Xiao
Zheng Wang
Chia-Wen Lin
Shiníchi Satoh
ViT
DiffM
36
6
0
20 Jun 2023
Learn to Enhance the Negative Information in Convolutional Neural Network
Zhicheng Cai
Chenglei Peng
Qiu Shen
16
0
0
18 Jun 2023
Point-Cloud Completion with Pretrained Text-to-image Diffusion Models
Yoni Kasten
Ohad Rahamim
Gal Chechik
30
24
0
18 Jun 2023
A semantically enhanced dual encoder for aspect sentiment triplet extraction
Baoxing Jiang
Shehui Liang
Peiyu Liu
Kaifang Dong
Hongye Li
23
15
0
14 Jun 2023
SequenceMatch: Imitation Learning for Autoregressive Sequence Modelling with Backtracking
Chris Cundy
Stefano Ermon
16
10
0
08 Jun 2023
Policy-Based Self-Competition for Planning Problems
Jonathan Pirnay
Q. Göttl
Jakob Burger
D. G. Grimm
34
3
0
07 Jun 2023
Cross-LKTCN: Modern Convolution Utilizing Cross-Variable Dependency for Multivariate Time Series Forecasting Dependency for Multivariate Time Series Forecasting
Donghao Luo
Xue Wang
BDL
AI4TS
19
2
0
04 Jun 2023
DaTaSeg: Taming a Universal Multi-Dataset Multi-Task Segmentation Model
Xiuye Gu
Huayu Chen
Jonathan Huang
Abdullah M. Rashwan
Boxin Wang
...
Golnaz Ghiasi
Weicheng Kuo
Huizhong Chen
Liang-Chieh Chen
David A. Ross
ISeg
28
26
0
02 Jun 2023
Generalist Equivariant Transformer Towards 3D Molecular Interaction Learning
Xiangzhe Kong
Wen-bing Huang
Yang Liu
22
13
0
02 Jun 2023
A Transformer-based representation-learning model with unified processing of multimodal input for clinical diagnostics
Hong-Yu Zhou
Yizhou Yu
Chengdi Wang
Shu Zhen Zhang
Yuanxu Gao
Jia-Yu Pan
Jun Shao
Guangming Lu
Kang Zhang
Weimin Li
MedIm
19
150
0
01 Jun 2023
Fast Dynamic 1D Simulation of Divertor Plasmas with Neural PDE Surrogates
Y. Poels
G. Derks
E. Westerhof
Koen Minartz
Sven Wiesen
Vlado Menkovski
3DGS
AI4CE
19
16
0
30 May 2023
Prediction Error-based Classification for Class-Incremental Learning
Michal Zajkac
Tinne Tuytelaars
Gido M. van de Ven
CLL
28
8
0
30 May 2023
Improving Generalization for Multimodal Fake News Detection
Sahar Tahmasebi
Sherzod Hakimov
Ralph Ewerth
Eric Müller-Budack
20
5
0
29 May 2023
Explicit Visual Prompting for Universal Foreground Segmentations
Weihuang Liu
Xi Shen
Chi-Man Pun
Xiaodong Cun
VPVLM
VLM
38
14
0
29 May 2023
A Neural State-Space Model Approach to Efficient Speech Separation
Chen Chen
Chao-Han Huck Yang
Kai Li
Yuchen Hu
Pin-Jui Ku
Chng Eng Siong
34
11
0
26 May 2023
VIP5: Towards Multimodal Foundation Models for Recommendation
Shijie Geng
Juntao Tan
Shuchang Liu
Zuohui Fu
Yongfeng Zhang
29
69
0
23 May 2023
EfficientSpeech: An On-Device Text to Speech Model
Rowel Atienza
31
4
0
23 May 2023
U-TILISE: A Sequence-to-sequence Model for Cloud Removal in Optical Satellite Time Series
Corinne Stucker
Vivien Sainte Fare Garnot
Konrad Schindler
AI4TS
24
13
0
22 May 2023
AudioToken: Adaptation of Text-Conditioned Diffusion Models for Audio-to-Image Generation
Guy Yariv
Itai Gat
Lior Wolf
Yossi Adi
Idan Schwartz
DiffM
20
20
0
22 May 2023
Curve Your Enthusiasm: Concurvity Regularization in Differentiable Generalized Additive Models
Julien N. Siems
Konstantin Ditschuneit
Winfried Ripken
Alma Lindborg
Maximilian Schambach
Johannes Otterbach
Martin Genzel
19
6
0
19 May 2023
Boost Vision Transformer with GPU-Friendly Sparsity and Quantization
Chong Yu
Tao Chen
Zhongxue Gan
Jiayuan Fan
MQ
ViT
30
23
0
18 May 2023
Token-wise Decomposition of Autoregressive Language Model Hidden States for Analyzing Model Predictions
Byung-Doh Oh
William Schuler
29
2
0
17 May 2023
Multi-Level Global Context Cross Consistency Model for Semi-Supervised Ultrasound Image Segmentation with Diffusion Model
Fenghe Tang
Jianrui Ding
Lingtao Wang
Min Xian
C. Ning
DiffM
MedIm
34
12
0
16 May 2023
Evaluation of self-supervised pre-training for automatic infant movement classification using wearable movement sensors
Einari Vaaras
Manu Airaksinen
S. Vanhatalo
Okko Rasanen
25
4
0
16 May 2023
Toward Moiré-Free and Detail-Preserving Demosaicking
Xuan-Yi Li
Y. Niu
Bo-Lu Zhao
Haoyuan Shi
Zitong An
31
1
0
15 May 2023
MaxViT-UNet: Multi-Axis Attention for Medical Image Segmentation
Abdul Rehman Khan
Asifullah Khan
ViT
MedIm
41
14
0
15 May 2023
A Multidimensional Graph Fourier Transformation Neural Network for Vehicle Trajectory Prediction
Marion Neumeier
Andreas Tollkühn
M. Botsch
Wolfgang Utschick
22
5
0
12 May 2023
Multitask learning in Audio Captioning: a sentence embedding regression loss acts as a regularizer
Etienne Labbé
J. Pinquier
Thomas Pellegrini
48
5
0
02 May 2023
Previous
1
2
3
...
5
6
7
...
15
16
17
Next