On Layer Normalization in the Transformer Architecture

12 February 2020

Papers citing "On Layer Normalization in the Transformer Architecture"

50 / 566 papers shown

Attending to Topological Spaces: The Cellular Transformer Rubén Ballester Pablo Hernández-García Mathilde Papillon Claudio Battiloro Nina Miolane Tolga Birdal Carles Casacuberta Sergio Escalera Mustafa Hajij 43 4 0 23 May 2024
Challenging Gradient Boosted Decision Trees with Tabular Transformers for Fraud Detection at Booking.com Sergei Krutikov Bulat Khaertdinov Rodion Kiriukhin Shubham Agrawal Kees Jan de Vries LMTD 48 0 0 22 May 2024
A Dual Power Grid Cascading Failure Model for the Vulnerability Analysis Tianxin Zhou Xiang Li Haibing Lu 28 0 0 18 May 2024
Beyond Scaling Laws: Understanding Transformer Performance with Associative Memory Xueyan Niu Bo Bai Lei Deng Wei Han 44 6 0 14 May 2024
Geometry and Dynamics of LayerNorm P. Riechers 19 1 0 07 May 2024
Learning Linear Block Error Correction Codes Yoni Choukroun Lior Wolf 31 6 0 07 May 2024
Position: Understanding LLMs Requires More Than Statistical Generalization Patrik Reizinger Szilvia Ujváry Anna Mészáros A. Kerekes Wieland Brendel Ferenc Huszár 36 12 0 03 May 2024
Nyonic Technical Report Junfeng Tian Rui-cang Wang Cong Li Yudong Zhou Jun Liu Jun Wang 41 0 0 24 Apr 2024
TransformerFAM: Feedback attention is working memory Dongseong Hwang Weiran Wang Zhuoyuan Huo K. Sim P. M. Mengibar 40 12 0 14 Apr 2024
Megalodon: Efficient LLM Pretraining and Inference with Unlimited Context Length Xuezhe Ma Xiaomeng Yang Wenhan Xiong Beidi Chen Lili Yu Hao Zhang Jonathan May Luke Zettlemoyer Omer Levy Chunting Zhou 53 27 0 12 Apr 2024
Generating Synthetic Time Series Data for Cyber-Physical Systems Alexander Sommers Somayeh Bakhtiari Ramezani Logan Cummins Sudip Mittal Shahram Rahimi Maria Seale Joseph Jaboure AI4TS 48 0 0 12 Apr 2024
Shortcut-connected Expert Parallelism for Accelerating Mixture-of-Experts Weilin Cai Juyong Jiang Le Qin Junwei Cui Sunghun Kim Jiayi Huang 62 7 0 07 Apr 2024
Exploring the Efficacy of Group-Normalization in Deep Learning Models for Alzheimer's Disease Classification Gousia Habib Ishfaq Ahmed Malik Jameel Ahmad Imtiaz Ahmed Shaima Qureshi 36 0 0 01 Apr 2024
LayerNorm: A key component in parameter-efficient fine-tuning Taha ValizadehAslani Hualou Liang 51 1 0 29 Mar 2024
Word Order's Impacts: Insights from Reordering and Generation Analysis Qinghua Zhao Jiaang Li Lei Li Zenghui Zhou Junfeng Liu 38 0 0 18 Mar 2024
Simple and Scalable Strategies to Continually Pre-train Large Language Models Adam Ibrahim Benjamin Thérien Kshitij Gupta Mats L. Richter Quentin Anthony Timothée Lesort Eugene Belilovsky Irina Rish KELM CLL 44 54 0 13 Mar 2024
Structural Positional Encoding for knowledge integration in transformer-based medical process monitoring Christopher Irwin Marco Dossena G. Leonardi Stefania Montani MedIm 38 0 0 13 Mar 2024
A Survey of Vision Transformers in Autonomous Driving: Current Trends and Future Directions Quoc-Vinh Lai-Dang ViT 36 2 0 12 Mar 2024
Tractable Joint Prediction and Planning over Discrete Behavior Modes for Urban Driving Adam R. Villaflor Brian Yang Huangyuan Su Katerina Fragkiadaki John M. Dolan Jeff Schneider 59 0 0 12 Mar 2024
Transformer for Times Series: an Application to the S&P500 Pierre Brugiere G. Turinici AI4TS AIFin 18 4 0 04 Mar 2024
ConSep: a Noise- and Reverberation-Robust Speech Separation Framework by Magnitude Conditioning Kuan-Hsun Ho J. Hung Berlin Chen 42 0 0 04 Mar 2024
Revisiting Dynamic Evaluation: Online Adaptation for Large Language Models Amal Rannen-Triki J. Bornschein Razvan Pascanu Marcus Hutter Andras Gyorgy Alexandre Galashov Yee Whye Teh Michalis K. Titsias KELM 28 1 0 03 Mar 2024
EfficientZero V2: Mastering Discrete and Continuous Control with Limited Data Shengjie Wang Shaohuai Liu Weirui Ye Jiacheng You Yang Gao OffRL 29 13 0 01 Mar 2024
Griffin: Mixing Gated Linear Recurrences with Local Attention for Efficient Language Models Soham De Samuel L. Smith Anushan Fernando Aleksandar Botev George-Christian Muraru ... David Budden Yee Whye Teh Razvan Pascanu Nando de Freitas Çağlar Gülçehre Mamba 61 117 0 29 Feb 2024
RiNALMo: General-Purpose RNA Language Models Can Generalize Well on Structure Prediction Tasks Rafael Josip Penić Tin Vlasic Roland G. Huber Yue Wan M. Šikić AI4CE 24 27 0 29 Feb 2024
Towards Optimal Learning of Language Models Yuxian Gu Li Dong Y. Hao Qingxiu Dong Minlie Huang Furu Wei 39 7 0 27 Feb 2024
Actions Speak Louder than Words: Trillion-Parameter Sequential Transducers for Generative Recommendations Jiaqi Zhai Lucy Liao Xing Liu Yueming Wang Rui Li ... Zhaojie Gong Fangda Gu Michael He Yin-Hua Lu Yu Shi OffRL 32 48 0 27 Feb 2024
Why Transformers Need Adam: A Hessian Perspective Yushun Zhang Congliang Chen Tian Ding Ziniu Li Ruoyu Sun Zhimin Luo 40 43 0 26 Feb 2024
Detecting Machine-Generated Texts by Multi-Population Aware Optimization for Maximum Mean Discrepancy Shuhai Zhang Yiliao Song Jiahao Yang Yuanqing Li Bo Han Mingkui Tan DeLMO 39 5 0 25 Feb 2024
Transformers are Expressive, But Are They Expressive Enough for Regression? Swaroop Nath H. Khadilkar Pushpak Bhattacharyya 34 3 0 23 Feb 2024
Transformer tricks: Precomputing the first layer Nils Graef MoE 32 4 0 20 Feb 2024
Chain of Thought Empowers Transformers to Solve Inherently Serial Problems Zhiyuan Li Hong Liu Denny Zhou Tengyu Ma LRM AI4CE 30 101 0 20 Feb 2024
Any2Graph: Deep End-To-End Supervised Graph Prediction With An Optimal Transport Loss Paul Krzakala Junjie Yang Rémi Flamary Florence d'Alché-Buc Charlotte Laclau Matthieu Labeau OT 34 1 0 19 Feb 2024
Synthetic location trajectory generation using categorical diffusion models Simon Dirmeier Ye Hong Fernando Pérez-Cruz 29 0 0 19 Feb 2024
A novel molecule generative model of VAE combined with Transformer for unseen structure generation Yasuhiro Yoshikai T. Mizuno Shumpei Nemoto Hiroyuki Kusuhara 33 3 0 19 Feb 2024
Pushing the Limits of Zero-shot End-to-End Speech Translation Ioannis Tsiamas Gerard I. Gállego José A. R. Fonollosa Marta R. Costa-jussá 43 7 0 16 Feb 2024
Bridging Associative Memory and Probabilistic Modeling Rylan Schaeffer Nika Zahedi Mikail Khona Dhruv Pai Sang T. Truong ... Sarthak Chandra Andres Carranza Ila Rani Fiete Andrey Gromov Oluwasanmi Koyejo DiffM 48 4 0 15 Feb 2024
Graph Structure Inference with BAM: Introducing the Bilinear Attention Mechanism Philipp Froehlich Heinz Koeppl GNN 29 1 0 12 Feb 2024
Unified Training of Universal Time Series Forecasting Transformers Gerald Woo Chenghao Liu Akshat Kumar Caiming Xiong Silvio Savarese Doyen Sahoo AI4TS 120 170 0 04 Feb 2024
DeepLag: Discovering Deep Lagrangian Dynamics for Intuitive Fluid Prediction Qilong Ma Haixu Wu Lanxiang Xing Jianmin Wang Mingsheng Long AI4CE 34 0 0 04 Feb 2024
Self-attention Networks Localize When QK-eigenspectrum Concentrates Han Bao Ryuichiro Hataya Ryo Karakida 18 5 0 03 Feb 2024
BPDec: Unveiling the Potential of Masked Language Modeling Decoder in BERT pretraining Wen-Chieh Liang Youzhi Liang OffRL 30 2 0 29 Jan 2024
FedGT: Federated Node Classification with Scalable Graph Transformer Zaixin Zhang Qingyong Hu Yang Yu Weibo Gao Qi Liu FedML 46 2 0 26 Jan 2024
Accelerating Material Property Prediction using Generically Complete Isometry Invariants Jonathan Balasingham Viktor Zamaraev V. Kurlin 16 5 0 22 Jan 2024
FourCastNeXt: Optimizing FourCastNet Training for Limited Compute Edison Guo Maruf Ahmed Yue Sun Rui Yang Harrison Cook Tennessee Leeuwenburg Ben Evans 26 1 0 10 Jan 2024
Unsupervised Salient Patch Selection for Data-Efficient Reinforcement Learning Zhaohui Jiang Paul Weng OffRL 27 0 0 10 Jan 2024
Setting the Record Straight on Transformer Oversmoothing G. Dovonon M. Bronstein Matt J. Kusner 35 5 0 09 Jan 2024
Spike No More: Stabilizing the Pre-training of Large Language Models Sho Takase Shun Kiyono Sosuke Kobayashi Jun Suzuki 20 14 0 28 Dec 2023
Understanding the Potential of FPGA-Based Spatial Acceleration for Large Language Model Inference Hongzheng Chen Jiahao Zhang Yixiao Du Shaojie Xiang Zichao Yue Niansong Zhang Yaohui Cai Zhiru Zhang 65 35 0 23 Dec 2023
Lift-Attend-Splat: Bird's-eye-view camera-lidar fusion using transformers James Gunn Zygmunt Lenyk Anuj Sharma Andrea Donati Alexandru Buburuzan John Redford Romain Mueller MDE 38 8 0 22 Dec 2023