Gaussian Error Linear Units (GELUs)

27 June 2016

Papers citing "Gaussian Error Linear Units (GELUs)"

50 / 945 papers shown

Title
Improved Feature Distillation via Projector Ensemble Yudong Chen Sen Wang Jiajun Liu Xuwei Xu Frank de Hoog Zi Huang 39 37 0 27 Oct 2022
M$^3$ViT: Mixture-of-Experts Vision Transformer for Efficient Multi-task Learning with Model-Accelerator Co-design Hanxue Liang Zhiwen Fan Rishov Sarkar Ziyu Jiang Tianlong Chen Kai Zou Yu Cheng Cong Hao Zhangyang Wang MoE 42 81 0 26 Oct 2022
PredNAS: A Universal and Sample Efficient Neural Architecture Search Framework Liuchun Yuan Zehao Huang Naiyan Wang 29 0 0 26 Oct 2022
Clinically-Inspired Multi-Agent Transformers for Disease Trajectory Forecasting from Multimodal Data Huy Hoang Nguyen Matthew B. Blaschko S. Saarakkala A. Tiulpin MedIm AI4CE 50 15 0 25 Oct 2022
MetaFormer Baselines for Vision Weihao Yu Chenyang Si Pan Zhou Mi Luo Yichen Zhou Jiashi Feng Shuicheng Yan Xinchao Wang MoE 40 158 0 24 Oct 2022
A Continuous Convolutional Trainable Filter for Modelling Unstructured Data Dario Coscia L. Meneghetti N. Demo G. Stabile G. Rozza 24 8 0 24 Oct 2022
CMU-Net: A Strong ConvMixer-based Medical Ultrasound Image Segmentation Network Fenghe Tang Lingtao Wang C. Ning Min Xian Jianrui Ding 30 60 0 24 Oct 2022
Compressing multidimensional weather and climate data into neural networks Langwen Huang Torsten Hoefler AI4CE 49 31 0 22 Oct 2022
Stochastic Adaptive Activation Function Kyungsu Lee Jaeseung Yang Haeyun Lee J. Y. Hwang 30 3 0 21 Oct 2022
Graphically Structured Diffusion Models Christian D. Weilbach William Harvey Frank Wood DiffM 40 7 0 20 Oct 2022
Coordinates Are NOT Lonely -- Codebook Prior Helps Implicit Neural 3D Representations Fukun Yin Wen Liu Zilong Huang Pei Cheng Tao Chen Gang Yu 22 19 0 20 Oct 2022
Dense but Efficient VideoQA for Intricate Compositional Reasoning Jihyeon Janel Lee Wooyoung Kang Eun-Sol Kim CoGe 24 3 0 19 Oct 2022
Nish: A Novel Negative Stimulated Hybrid Activation Function Yildiray Anagün Ş. Işık 27 2 0 17 Oct 2022
Scratching Visual Transformer's Back with Uniform Attention Nam Hyeon-Woo Kim Yu-Ji Byeongho Heo Dongyoon Han Seong Joon Oh Tae-Hyun Oh 366 23 0 16 Oct 2022
Hierarchical Approach for Joint Semantic, Plant Instance, and Leaf Instance Segmentation in the Agricultural Domain Gianmarco Roggiolani Matteo Sodano Tiziano Guadagnino Federico Magistri Jens Behley C. Stachniss 17 23 0 14 Oct 2022
Experiments on Turkish ASR with Self-Supervised Speech Representation Learning Ali Safaya E. Erzin 21 1 0 13 Oct 2022
CTL++: Evaluating Generalization on Never-Seen Compositional Patterns of Known Functions, and Compatibility of Neural Representations Róbert Csordás Kazuki Irie Jürgen Schmidhuber NAI 19 12 0 12 Oct 2022
Bridging the Gap Between Vision Transformers and Convolutional Neural Networks on Small Datasets Zhiying Lu Hongtao Xie Chuanbin Liu Yongdong Zhang ViT 28 57 0 12 Oct 2022
Investigating the Failure Modes of the AUC metric and Exploring Alternatives for Evaluating Systems in Safety Critical Applications Swaroop Mishra Anjana Arunkumar Chitta Baral 33 0 0 10 Oct 2022
Coded Residual Transform for Generalizable Deep Metric Learning Shichao Kan Yixiong Liang Min Li Yigang Cen Jianxin Wang Z. He 36 3 0 09 Oct 2022
A Transformer-based deep neural network model for SSVEP classification Jianbo Chen Yangsong Zhang Yudong Pan Peng Xu Cuntai Guan 22 50 0 09 Oct 2022
Time-Space Transformers for Video Panoptic Segmentation Andra Petrovai S. Nedevschi ViT 27 3 0 07 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model Aohan Zeng Xiao Liu Zhengxiao Du Zihan Wang Hanyu Lai ... Jidong Zhai Wenguang Chen Peng Zhang Yuxiao Dong Jie Tang BDL LRM 275 1,077 0 05 Oct 2022
Granularity-aware Adaptation for Image Retrieval over Multiple Tasks Jon Almazán ByungSoo Ko Geonmo Gu Diane Larlus Yannis Kalantidis ObjD VLM 48 7 0 05 Oct 2022
Robust Fair Clustering: A Novel Fairness Attack and Defense Framework Anshuman Chhabra Peizhao Li P. Mohapatra Hongfu Liu OOD 34 22 0 04 Oct 2022
MOAT: Alternating Mobile Convolution and Attention Brings Strong Vision Models Chenglin Yang Siyuan Qiao Qihang Yu Xiaoding Yuan Yukun Zhu Alan Yuille Hartwig Adam Liang-Chieh Chen ViT MoE 41 60 0 04 Oct 2022
The Effectiveness of Masked Language Modeling and Adapters for Factual Knowledge Injection Sondre Wold KELM 39 4 0 03 Oct 2022
CRISP: Curriculum based Sequential Neural Decoders for Polar Code Family Ashwin Hebbar Viraj Nadkarni Ashok Vardhan Makkuva S. Bhat Sewoong Oh Pramod Viswanath 30 6 0 01 Oct 2022
E-Branchformer: Branchformer with Enhanced merging for speech recognition Kwangyoun Kim Felix Wu Yifan Peng Jing Pan Prashant Sridhar Kyu Jeong Han Shinji Watanabe 61 105 0 30 Sep 2022
BayesFT: Bayesian Optimization for Fault Tolerant Neural Network Architecture Nanyang Ye Jingbiao Mei Zhicheng Fang Yuwen Zhang Ziqing Zhang Huaying Wu Xiaoyao Liang OOD 33 5 0 30 Sep 2022
Towards Multi-spatiotemporal-scale Generalized PDE Modeling Jayesh K. Gupta Johannes Brandstetter AI4CE 61 120 0 30 Sep 2022
Protein structure generation via folding diffusion Kevin E. Wu Kevin Kaichuang Yang Rianne van den Berg James Zou Alex X. Lu Ava P. Amini DiffM 35 193 0 30 Sep 2022
State-specific protein-ligand complex structure prediction with a multi-scale deep generative model Zhuoran Qiao Weili Nie Arash Vahdat Thomas F. Miller Anima Anandkumar DiffM 39 84 0 30 Sep 2022
DreamFusion: Text-to-3D using 2D Diffusion Ben Poole Ajay Jain Jonathan T. Barron B. Mildenhall 85 2,323 0 29 Sep 2022
Continuous PDE Dynamics Forecasting with Implicit Neural Representations Yuan Yin Matthieu Kirchmeyer Jean-Yves Franceschi A. Rakotomamonjy Patrick Gallinari AI4CE 25 49 0 29 Sep 2022
Transfer Learning with Pretrained Remote Sensing Transformers A. Fuller K. Millard J.R. Green 35 11 0 28 Sep 2022
Evolution TANN and the identification of internal variables and evolution equations in solid mechanics Filippo Masi I. Stefanou AI4CE 31 30 0 27 Sep 2022
Rethinking Performance Gains in Image Dehazing Networks Yuda Song Yang Zhou Hui Qian Xin Du SSeg 36 48 0 23 Sep 2022
Lightweight Transformers for Human Activity Recognition on Mobile Devices Sannara Ek François Portet P. Lalanda 37 28 0 22 Sep 2022
DFX: A Low-latency Multi-FPGA Appliance for Accelerating Transformer-based Text Generation Seongmin Hong Seungjae Moon Junsoo Kim Sungjae Lee Minsub Kim Dongsoo Lee Joo-Young Kim 72 77 0 22 Sep 2022
An Efficient End-to-End Transformer with Progressive Tri-modal Attention for Multi-modal Emotion Recognition Yang Wu Pai Peng Zhenyu Zhang Yanyan Zhao Bing Qin 32 1 0 20 Sep 2022
LogGD: Detecting Anomalies from System Logs by Graph Neural Networks Yongzhen Xie Hongyu Zhang M. Babar AI4TS 23 20 0 16 Sep 2022
A Light Recipe to Train Robust Vision Transformers Edoardo Debenedetti Vikash Sehwag Prateek Mittal ViT 32 69 0 15 Sep 2022
Gromov-Wasserstein Autoencoders Nao Nakagawa Ren Togo Takahiro Ogawa Miki Haseyama GAN DRL 26 11 0 15 Sep 2022
On the interplay of adversarial robustness and architecture components: patches, convolution and attention Francesco Croce Matthias Hein 43 6 0 14 Sep 2022
Characterizing Graph Datasets for Node Classification: Homophily-Heterophily Dichotomy and Beyond Oleg Platonov Denis Kuznedelev Artem Babenko Liudmila Prokhorenkova 59 37 0 13 Sep 2022
On the Factory Floor: ML Engineering for Industrial-Scale Ads Recommendation Models Rohan Anil S. Gadanho Danya Huang Nijith Jacob Zhuoshu Li ... Cristina Pop Kevin Regan G. Shamir Rakesh Shivanna Qiqi Yan 3DV 29 41 0 12 Sep 2022
Spach Transformer: Spatial and Channel-wise Transformer Based on Local and Global Self-attentions for PET Image Denoising Se-In Jang T. Pan Ye Li P. Heidari Junyu Chen Quanzheng Li Kuang Gong ViT MedIm 36 27 0 07 Sep 2022
Bag of Tricks for FGSM Adversarial Training Zichao Li Li Liu Zeyu Wang Yuyin Zhou Cihang Xie AAML 35 6 0 06 Sep 2022
How important are activation functions in regression and classification? A survey, performance comparison, and future directions Ameya Dilip Jagtap George Karniadakis AI4CE 37 71 0 06 Sep 2022