On Layer Normalization in the Transformer Architecture

12 February 2020

Papers citing "On Layer Normalization in the Transformer Architecture"

50 / 566 papers shown

How Smooth Is Attention? Valérie Castin Pierre Ablin Gabriel Peyré AAML 40 9 0 22 Dec 2023
BrainTalker: Low-Resource Brain-to-Speech Synthesis with Transfer Learning using Wav2Vec 2.0 Miseul Kim Zhenyu Piao Jihyun Lee Hong-Goo Kang 71 3 0 21 Dec 2023
Learning Flexible Body Collision Dynamics with Hierarchical Contact Mesh Transformer Youn-Yeol Yu Jeongwhan Choi Woojin Cho Kookjin Lee Nayong Kim ... Ilho Kim Seok-Woo Lee Joon Young Yang S. Yoon Noseong Park AI4CE 23 7 0 19 Dec 2023
One-Step Diffusion Distillation via Deep Equilibrium Models Zhengyang Geng Ashwini Pokle Trevor Killeen 34 30 0 12 Dec 2023
Why "classic" Transformers are shallow and how to make them go deep Yueyao Yu Yin Zhang ViT 16 0 0 11 Dec 2023
Large-scale Training of Foundation Models for Wearable Biosignals Salar Abbaspourazad Oussama Elachqar Andrew C. Miller S. Emrani Udhyakumar Nallasamy Ian Shapiro 38 32 0 08 Dec 2023
Transformers are uninterpretable with myopic methods: a case study with bounded Dyck grammars Kaiyue Wen Yuchen Li Bing Liu Andrej Risteski 34 22 0 03 Dec 2023
MABViT -- Modified Attention Block Enhances Vision Transformers Mahesh Ramesh Aswinkumar Ramkumar 19 3 0 03 Dec 2023
Probabilistic Transformer: A Probabilistic Dependency Model for Contextual Word Representation Haoyi Wu Kewei Tu 206 3 0 26 Nov 2023
Who is leading in AI? An analysis of industry AI research Ben Cottier T. Besiroglu David Owen 36 7 0 24 Nov 2023
DACBERT: Leveraging Dependency Agreement for Cost-Efficient Bert Pretraining Martin Kuo Jianyi Zhang Yiran Chen 27 2 0 08 Nov 2023
Euclidean, Projective, Conformal: Choosing a Geometric Algebra for Equivariant Transformers P. D. Haan Taco S. Cohen Johann Brehmer 38 9 0 08 Nov 2023
Signal Processing Meets SGD: From Momentum to Filter Zhipeng Yao Guisong Chang Jiaqi Zhang Qi Zhang Dazhou Li Yu Zhang ODL 39 0 0 06 Nov 2023
Yet Another Generative Model For Room Impulse Response Estimation Sungho Lee Hyeong-Seok Choi Kyogu Lee 34 10 0 05 Nov 2023
Simplifying Transformer Blocks Bobby He Thomas Hofmann 27 31 0 03 Nov 2023
ATHENA: Mathematical Reasoning with Thought Expansion JB. Kim Hazel Kim Joonghyuk Hahn Yo-Sub Han ReLM LRM AIMat 50 7 0 02 Nov 2023
Global Transformer Architecture for Indoor Room Temperature Forecasting Alfredo V. Clemente A. Nocente Massimiliano Ruocco AI4CE 18 1 0 31 Oct 2023
TorchDEQ: A Library for Deep Equilibrium Models Zhengyang Geng J. Zico Kolter VLM 62 12 0 28 Oct 2023
ScaleLong: Towards More Stable Training of Diffusion Model via Scaling Network Long Skip Connection Zhongzhan Huang Pan Zhou Shuicheng Yan Liang Lin 24 26 0 20 Oct 2023
Real-Time Motion Prediction via Heterogeneous Polyline Transformer with Relative Pose Encoding Zhejun Zhang Alexander Liniger Daniel Gehrig Fisher Yu Luc Van Gool 66 31 0 19 Oct 2023
Eureka-Moments in Transformers: Multi-Step Tasks Reveal Softmax Induced Optimization Problems David T. Hoffmann Simon Schrodi Jelena Bratulić Nadine Behrmann Volker Fischer Thomas Brox 38 5 0 19 Oct 2023
Cross-attention Spatio-temporal Context Transformer for Semantic Segmentation of Historical Maps Sidi Wu Yizi Chen Konrad Schindler L. Hurni 31 2 0 19 Oct 2023
Enhanced Transformer Architecture for Natural Language Processing Woohyeon Moon Taeyoung Kim Bumgeun Park Dongsoo Har 30 0 0 17 Oct 2023
AMAGO: Scalable In-Context Reinforcement Learning for Adaptive Agents Jake Grigsby Linxi Fan Yuke Zhu OffRL LM&Ro 38 10 0 15 Oct 2023
LEMON: Lossless model expansion Yite Wang Jiahao Su Hanlin Lu Cong Xie Tianyi Liu Jianbo Yuan Yanghua Peng Ruoyu Sun Hongxia Yang 17 12 0 12 Oct 2023
The Expressive Power of Transformers with Chain of Thought William Merrill Ashish Sabharwal LRM AI4CE ReLM 27 0 0 11 Oct 2023
PHYDI: Initializing Parameterized Hypercomplex Neural Networks as Identity Functions Matteo Mancanelli Eleonora Grassucci A. Uncini Danilo Comminiello AI4CE 51 2 0 11 Oct 2023
CodeFuse-13B: A Pretrained Multi-lingual Code Large Language Model Peng Di Jianguo Li Hang Yu Wei Jiang Wenting Cai ... Zelin Zhao Xunjin Zheng Hailian Zhou Lifu Zhu Xianying Zhu ELM ALM AI4CE 35 12 0 10 Oct 2023
Pushing the Limits of Pre-training for Time Series Forecasting in the CloudOps Domain Gerald Woo Chenghao Liu Akshat Kumar Doyen Sahoo AI4TS AI4CE 33 13 0 08 Oct 2023
Multiple Physics Pretraining for Physical Surrogate Models Michael McCabe Bruno Régaldo-Saint Blancard Liam Parker Ruben Ohana M. Cranmer ... Francois Lanusse Mariel Pettee Tiberiu Teşileanu Kyunghyun Cho Shirley Ho PINN AI4CE 40 53 0 04 Oct 2023
Mixture of Quantized Experts (MoQE): Complementary Effect of Low-bit Quantization and Robustness Young Jin Kim Raffy Fahim Hany Awadalla MQ MoE 66 19 0 03 Oct 2023
BTR: Binary Token Representations for Efficient Retrieval Augmented Language Models Qingqing Cao Sewon Min Yizhong Wang Hannaneh Hajishirzi MQ RALM 40 4 0 02 Oct 2023
Evolutionary Neural Architecture Search for Transformer in Knowledge Tracing Shangshang Yang Xiaoshan Yu Ye Tian Xueming Yan Haiping Ma Xingyi Zhang ViT KELM AI4Ed 24 2 0 02 Oct 2023
Depthwise Hyperparameter Transfer in Residual Networks: Dynamics and Scaling Limit Blake Bordelon Lorenzo Noci Mufan Li Boris Hanin Cengiz Pehlevan 35 22 0 28 Sep 2023
Graph-level Representation Learning with Joint-Embedding Predictive Architectures Geri Skenderi Hang Li Jiliang Tang Marco Cristani AI4TS GNN 54 3 0 27 Sep 2023
On Separate Normalization in Self-supervised Transformers Xiaohui Chen Yinkai Wang Yuanqi Du S. Hassoun Liping Liu ViT 27 1 0 22 Sep 2023
A Diffusion-Model of Joint Interactive Navigation Matthew Niedoba J. Lavington Yunpeng Liu Vasileios Lioutas Justice Sefas ... Dylan Green Setareh Dabiri Berend Zwartsenberg Adam Scibior Frank Wood DiffM 24 14 0 21 Sep 2023
SkeleTR: Towards Skeleton-based Action Recognition in the Wild Haodong Duan Mingze Xu Bing Shuai Davide Modolo Zhuowen Tu Joseph Tighe Alessandro Bergamo ViT 35 1 0 20 Sep 2023
Baichuan 2: Open Large-scale Language Models Ai Ming Yang Bin Xiao Bingning Wang Borong Zhang Ce Bian ... Youxin Jiang Yuchen Gao Yupeng Zhang Zenan Zhou Zhiying Wu ELM LRM 77 710 0 19 Sep 2023
Traveling Words: A Geometric Interpretation of Transformers Raul Molina 27 4 0 13 Sep 2023
Revisiting Energy Based Models as Policies: Ranking Noise Contrastive Estimation and Interpolating Energy Models Sumeet Singh Stephen Tu Vikas Sindhwani DiffM 20 8 0 11 Sep 2023
Enhance Multi-domain Sentiment Analysis of Review Texts through Prompting Strategies Yajing Wang Zongwei Luo LRM 19 5 0 05 Sep 2023
Learning multi-modal generative models with permutation-invariant encoders and tighter variational bounds Marcel Hirt Domenico Campolo Victoria Leong Juan-Pablo Ortega DRL 15 0 0 01 Sep 2023
Internal Cross-layer Gradients for Extending Homogeneity to Heterogeneity in Federated Learning Yun-Hin Chan Rui Zhou Running Zhao Zhihan Jiang Edith C.H. Ngai FedML 38 8 0 22 Aug 2023
Video OWL-ViT: Temporally-consistent open-world localization in video G. Heigold Matthias Minderer A. Gritsenko Alex Bewley Daniel Keysers Mario Lučić Feng Yu Thomas Kipf VLM 24 14 0 22 Aug 2023
FineQuant: Unlocking Efficiency with Fine-Grained Weight-Only Quantization for LLMs Young Jin Kim Rawn Henry Raffy Fahim Hany Awadalla MQ 42 19 0 16 Aug 2023
Attention Is Not All You Need Anymore Zhe Chen 32 3 0 15 Aug 2023
3DMOTFormer: Graph Transformer for Online 3D Multi-Object Tracking Shuxiao Ding Eike Rehder Lukas Schneider Marius Cordts Juergen Gall 3DPC 33 17 0 12 Aug 2023
MAP: A Model-agnostic Pretraining Framework for Click-through Rate Prediction Jianghao Lin Yanru Qu Wei Guo Xinyi Dai Ruiming Tang Yong Yu Weinan Zhang 30 21 0 03 Aug 2023
From Sparse to Soft Mixtures of Experts J. Puigcerver C. Riquelme Basil Mustafa N. Houlsby MoE 121 114 0 02 Aug 2023