Gaussian Error Linear Units (GELUs)

27 June 2016

Papers citing "Gaussian Error Linear Units (GELUs)"

Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers James Gunn Zygmunt Lenyk Anuj Sharma Andrea Donati Alexandru Buburuzan John Redford Romain Mueller MDE 38 8 0 22 Dec 2023
SCEdit: Efficient and Controllable Image Diffusion Generation via Skip Connection Editing Zeyinzi Jiang Chaojie Mao Yulin Pan Zhen Han Jingfeng Zhang 29 28 0 18 Dec 2023
Guided Image Restoration via Simultaneous Feature and Image Guided Fusion Xinyi Liu Qian Zhao Jie-Kai Liang Huiyu Zeng Deyu Meng Lei Zhang 37 0 0 14 Dec 2023
4M: Massively Multimodal Masked Modeling David Mizrahi Roman Bachmann Oğuzhan Fatih Kar Teresa Yeo Mingfei Gao Afshin Dehghan Amir Zamir MLLM 50 63 0 11 Dec 2023
DreamVideo: Composing Your Dream Videos with Customized Subject and Motion Yujie Wei Shiwei Zhang Zhiwu Qing Hangjie Yuan Zhiheng Liu Yu Liu Yingya Zhang Jingren Zhou Hongming Shan DiffM VGen 17 89 0 07 Dec 2023
Defense Against Adversarial Attacks using Convolutional Auto-Encoders Shreyasi Mandal AAML 23 1 0 06 Dec 2023
C3: High-performance and low-complexity neural compression from a single image or video Hyunjik Kim Matthias Bauer Lucas Theis Jonathan Richard Schwarz Emilien Dupont VGen 22 23 0 05 Dec 2023
Analyzing and Improving the Training Dynamics of Diffusion Models Tero Karras M. Aittala J. Lehtinen Janne Hellsten Timo Aila S. Laine 42 155 0 05 Dec 2023
HUGS: Human Gaussian Splats Muhammed Kocabas Jen-Hao Rick Chang J. Gabriel Oncel Tuzel Anurag Ranjan 3DGS 42 91 0 29 Nov 2023
Improving Feature Stability during Upsampling -- Spectral Artifacts and the Importance of Spatial Context Shashank Agnihotri Julia Grabinski M. Keuper 30 6 0 29 Nov 2023
End-to-End Temporal Action Detection with 1B Parameters Across 1000 Frames Shuming Liu Chen-Da Liu-Zhang Chen Zhao Guohao Li 33 25 0 28 Nov 2023
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks Rahul Ramesh Ekdeep Singh Lubana Mikail Khona Robert P. Dick Hidenori Tanaka CoGe 36 6 0 21 Nov 2023
Deep Learning-Based Real-Time Quality Control of Standard Video Compression for Live Streaming Matin Mortaheb M. A. Khojastepour S. Chakradhar S. Ulukus 13 1 0 21 Nov 2023
GRAM: An Interpretable Approach for Graph Anomaly Detection using Gradient Attention Maps Yifei Yang Peng Wang Xiaofan He Dongmian Zou 14 5 0 10 Nov 2023
Towards a Unified Framework of Contrastive Learning for Disentangled Representations Stefan Matthes Zhiwei Han Hao Shen 34 4 0 08 Nov 2023
OmniVec: Learning robust representations with cross modal sharing Siddharth Srivastava Gaurav Sharma SSL 29 64 0 07 Nov 2023
Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion Lunjun Zhang Yuwen Xiong Ze Yang Sergio Casas Rui Hu R. Urtasun 41 50 0 02 Nov 2023
End-to-End Single-Channel Speaker-Turn Aware Conversational Speech Translation Juan Pablo Zuluaga Zhaocheng Huang Xing Niu Rohit Paturi S. Srinivasan Prashant Mathur Brian Thompson Marcello Federico BDL 35 2 0 01 Nov 2023
Learn to Categorize or Categorize to Learn? Self-Coding for Generalized Category Discovery Sarah Rastegar Hazel Doughty Cees G. M. Snoek 33 15 0 30 Oct 2023
Video Frame Interpolation with Many-to-many Splatting and Spatial Selective Refinement Ping Hu Simon Niklaus Lu Zhang Stan Sclaroff Kate Saenko 25 6 0 29 Oct 2023
TorchDEQ: A Library for Deep Equilibrium Models Zhengyang Geng J. Zico Kolter VLM 56 12 0 28 Oct 2023
Understanding the Effects of Projectors in Knowledge Distillation Yudong Chen Sen Wang Jiajun Liu Xuwei Xu Frank de Hoog Brano Kusy Zi Huang 26 0 0 26 Oct 2023
Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps Sidi Wu Yizi Chen Konrad Schindler L. Hurni 26 2 0 19 Oct 2023
From Alexnet to Transformers: Measuring the Non-linearity of Deep Neural Networks with Affine Optimal Transport Quentin Bouniot I. Redko Anton Mallasto Charlotte Laclau Karol Arndt Oliver Struckmeier Markus Heinonen Ville Kyrki Samuel Kaski 58 2 0 17 Oct 2023
A Non-monotonic Smooth Activation Function Koushik Biswas Meghana Karri Ulaş Bağcı 16 1 0 16 Oct 2023
SeUNet-Trans: A Simple yet Effective UNet-Transformer Model for Medical Image Segmentation Tan-Hanh Pham Xianqi Li Kim-Doang Nguyen MedIm ViT 26 8 0 16 Oct 2023
Homophone Disambiguation Reveals Patterns of Context Mixing in Speech Transformers Hosein Mohebbi Grzegorz Chrupała Willem H. Zuidema A. Alishahi 36 12 0 15 Oct 2023
Temporally Aligning Long Audio Interviews with Questions: A Case Study in Multimodal Data Integration Piyush Singh Pasi Karthikeya Battepati P. Jyothi Ganesh Ramakrishnan T. Mahapatra Manoj Singh 51 0 0 10 Oct 2023
Understanding the Feature Norm for Out-of-Distribution Detection Jaewoo Park Jacky Chen Long Chai Jaeho Yoon Andrew Beng Jin Teoh OODD 24 12 0 09 Oct 2023
Low-Resolution Self-Attention for Semantic Segmentation Yu-Huan Wu Shi-Chen Zhang Yun-Hai Liu Le Zhang Xin Zhan Daquan Zhou Jiashi Feng Ming-Ming Cheng Liangli Zhen ViT 45 3 0 08 Oct 2023
Deep Learning Based Uplink Multi-User SIMO Beamforming Design Cemil Vahapoglu Tim O'Shea Tamoghna Roy S. Ulukus 26 7 0 28 Sep 2023
Deep Learning-Based Real-Time Rate Control for Live Streaming on Wireless Networks Matin Mortaheb M. A. Khojastepour S. Chakradhar S. Ulukus 13 0 0 27 Sep 2023
Rethinking Session Variability: Leveraging Session Embeddings for Session Robustness in Speaker Verification Hee-Soo Heo Ki-hyun Nam Bong-Jin Lee Youngki Kwon Min-Ji Lee You Jin Kim Joon Son Chung 26 1 0 26 Sep 2023
Introducing DictaLM -- A Large Generative Language Model for Modern Hebrew Shaltiel Shmidman Avi Shmidman Amir DN Cohen Moshe Koppel 27 0 0 25 Sep 2023
Small-scale proxies for large-scale Transformer training instabilities Mitchell Wortsman Peter J. Liu Lechao Xiao Katie Everett A. Alemi ... Jascha Narain Sohl-Dickstein Kelvin Xu Jaehoon Lee Justin Gilmer Simon Kornblith 35 81 0 25 Sep 2023
On the Posterior Distribution in Denoising: Application to Uncertainty Quantification Hila Manor T. Michaeli UQCV 23 17 0 24 Sep 2023
Large-scale Pretraining Improves Sample Efficiency of Active Learning based Molecule Virtual Screening Zhonglin Cao Simone Sciabola Ye Wang 35 1 0 20 Sep 2023
PDPCRN: Parallel Dual-Path CRN with Bi-directional Inter-Branch Interactions for Multi-Channel Speech Enhancement Jia-Yu Pan Shulin He Tianci Wu Hui Zhang Xueliang Zhang 19 0 0 19 Sep 2023
Limited-Angle Tomography Reconstruction via Deep End-To-End Learning on Synthetic Data Thomas Germer Jan Robine S. Konietzny Stefan Harmeling Tobias Uelwer MedIm 23 5 0 13 Sep 2023
Advancing Parsimonious Deep Learning Weather Prediction using the HEALPix Mesh Matthias Karlbauer Nathaniel Cresswell-Clay Dale Durran Raul A Moreno Thorsten Kurth Boris Bonev Noah D. Brenowitz Martin Volker Butz MDE 28 20 0 11 Sep 2023
ImageBind-LLM: Multi-modality Instruction Tuning Jiaming Han Renrui Zhang Wenqi Shao Peng Gao Peng-Tao Xu ... Yafei Wen Xiaoxin Chen Xiangyu Yue Hongsheng Li Yu Qiao MLLM 49 116 0 07 Sep 2023
3D Transformer based on deformable patch location for differential diagnosis between Alzheimer's disease and Frontotemporal dementia H. Nguyen Michael Clement Boris Mansencal Pierrick Coupé MedIm 31 0 0 06 Sep 2023
Character Queries: A Transformer-based Approach to On-Line Handwritten Character Segmentation Michael Jungo Beat Wolf Andrii Maksai C. Musat Andreas Fischer 27 2 0 06 Sep 2023
A Unified Masked Autoencoder with Patchified Skeletons for Motion Synthesis Esteve Valls Mascaro Hyemin Ahn Dongheui Lee CVBM 37 4 0 14 Aug 2023
Large-kernel Attention for Efficient and Robust Brain Lesion Segmentation Liam Chalcroft Ruben Lourenço Pereira Mikael Brudfors Andrew S. Kayser M. D’Esposito Cathy J. Price Ioannis Pappas John Ashburner ViT 3DV MedIm 29 8 0 14 Aug 2023
Composable Function-preserving Expansions for Transformer Architectures Andrea Gesmundo Kaitlin Maile AI4CE 40 8 0 11 Aug 2023
Graph Embedding Dynamic Feature-based Supervised Contrastive Learning of Transient Stability for Changing Power Grid Topologies Zijian Lv Xinyu Chen Zijian Feng 22 0 0 01 Aug 2023
Generative Models as a Complex Systems Science: How can we make sense of large language model behavior? Ari Holtzman Peter West Luke Zettlemoyer AI4CE 30 14 0 31 Jul 2023
Efficient Federated Learning via Local Adaptive Amended Optimizer with Linear Speedup Yan Sun Li Shen Hao Sun Liang Ding Dacheng Tao FedML 24 17 0 30 Jul 2023
BARTPhoBEiT: Pre-trained Sequence-to-Sequence and Image Transformers Models for Vietnamese Visual Question Answering Khiem Vinh Tran Kiet Van Nguyen Ngan Luu-Thuy Nguyen ViT 31 2 0 28 Jul 2023