ResearchTrend.AI
On Layer Normalization in the Transformer Architecture

arXiv 2002.04745 · 12 February 2020
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu
AI4CE

Papers citing "On Layer Normalization in the Transformer Architecture"

50 of 566 citing papers shown
Foundations of GenIR
Qingyao Ai, Jingtao Zhan, Yu Liu
06 Jan 2025

Registering Source Tokens to Target Language Spaces in Multilingual Neural Machine Translation
Zhi Qu, Yiran Wang, Jiannan Mao, Chenchen Ding, Hideki Tanaka, Masao Utiyama, Taro Watanabe
LRM
06 Jan 2025

Generative Pretrained Embedding and Hierarchical Irregular Time Series Representation for Daily Living Activity Recognition
Damien Bouchabou, S. Nguyen
AI4TS
27 Dec 2024

Unity is Strength: Unifying Convolutional and Transformeral Features for Better Person Re-Identification
Yuhao Wang, Pingping Zhang, Xuehu Liu, Zhengzheng Tu, Huchuan Lu
23 Dec 2024

Compositional Generalization Across Distributional Shifts with Sparse Tree Operations
Paul Soulos, Henry Conklin, Mattia Opper, P. Smolensky, Jianfeng Gao, Roland Fernandez
18 Dec 2024

Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Pengxiang Li, Lu Yin, Shiwei Liu
18 Dec 2024

Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Benjamin Warner, Antoine Chaffin, Benjamin Clavié, Orion Weller, Oskar Hallström, ..., Tom Aarsen, Nathan Cooper, Griffin Adams, Jeremy Howard, Iacopo Poli
18 Dec 2024

VaeDiff-DocRE: End-to-end Data Augmentation Framework for Document-level Relation Extraction
Khai Phan Tran, Wen Hua, Xue Li
SyDa
18 Dec 2024

Vision Transformers for Weakly-Supervised Microorganism Enumeration
Javier Ureña Santiago, Thomas Ströhle, Antonio Rodríguez-Sánchez, Ruth Breu
ViT
03 Dec 2024

Quark: Real-time, High-resolution, and General Neural View Synthesis
John Flynn, Michael Broxton, Lukas Murmann, Lucy Chai, Matthew DuVall, ..., Supreeth Achar, Kira Prabhu, Tiancheng Sun, Lynn Tsai, Ryan S. Overbeck
25 Nov 2024

GaussianAnything: Interactive Point Cloud Flow Matching For 3D Object Generation
Yushi Lan, Shangchen Zhou, Zhaoyang Lyu, Fangzhou Hong, Shuai Yang, Bo Dai, Xingang Pan, Chen Change Loy
3DGS
12 Nov 2024

Enhancing Link Prediction with Fuzzy Graph Attention Networks and Dynamic Negative Sampling
Jinming Xing, Ruilin Xing, Chang Xue, Dongwen Luo
12 Nov 2024

Moving Off-the-Grid: Scene-Grounded Video Representations
Sjoerd van Steenkiste, Daniel Zoran, Yi Yang, Yulia Rubanova, Rishabh Kabra, ..., Thomas Keck, João Carreira, Alexey Dosovitskiy, Mehdi S. M. Sajjadi, Thomas Kipf
08 Nov 2024

PACE: Pacing Operator Learning to Accurate Optical Field Simulation for Complicated Photonic Devices
Hanqing Zhu, Wenyan Cong, Guojin Chen, Shupeng Ning, Ray T. Chen, Jiaqi Gu, David Z. Pan
05 Nov 2024

LASER: Attention with Exponential Transformation
Sai Surya Duvvuri, Inderjit Dhillon
05 Nov 2024

Unified Speech Recognition: A Single Model for Auditory, Visual, and Audiovisual Inputs
A. Haliassos, Rodrigo Mira, Honglie Chen, Zoe Landgraf, Stavros Petridis, M. Pantic
SSL
04 Nov 2024

Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models
Jonas Zausinger, Lars Pennig, Kacper Chlodny, Vincent Limbach, Anna Ketteler, Thorben Prein, Vishwa Mohan Singh, Michael Morris Danziger, Jannis Born
04 Nov 2024

SPECTRUM: Semantic Processing and Emotion-informed video-Captioning Through Retrieval and Understanding Modalities
Ehsan Faghihi, Mohammedreza Zarenejad, Ali-Asghar Beheshti Shirazi
04 Nov 2024

A Lorentz-Equivariant Transformer for All of the LHC
Johann Brehmer, Victor Bresó, P. D. Haan, Tilman Plehn, Huilin Qu, Jonas Spinner, Jesse Thaler
BDL
01 Nov 2024

SFM-Protein: Integrative Co-evolutionary Pre-training for Advanced Protein Sequence Representation
Liang He, Peiran Jin, Yaosen Min, Shufang Xie, Lijun Wu, Tao Qin, Xiaozhuan Liang, Kaiyuan Gao, Yuliang Jiang, Tie-Yan Liu
AI4TS
31 Oct 2024

Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
Atli Kosson, Bettina Messmer, Martin Jaggi
AI4CE
31 Oct 2024

Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning
Haitz Sáez de Ocáriz Borde, Artem Lukoianov, Anastasis Kratsios, Michael M. Bronstein, Xiaowen Dong
GNN
29 Oct 2024

Variational inference for pile-up removal at hadron colliders with diffusion models
M. Algren, C. Pollard, J. A. Raine, T. Golling
29 Oct 2024

Beyond Color and Lines: Zero-Shot Style-Specific Image Variations with Coordinated Semantics
Jinghao Hu, Yuhe Zhang, Guohua Geng, Liuyuxin Yang, JiaRui Yan, Jingtao Cheng, YaDong Zhang, Kang Li
DiffM
24 Oct 2024

Is Smoothness the Key to Robustness? A Comparison of Attention and Convolution Models Using a Novel Metric
Baiyuan Chen
MLT
23 Oct 2024

From Attention to Activation: Unravelling the Enigmas of Large Language Models
Prannay Kaul, Chengcheng Ma, Ismail Elezi, Jiankang Deng
22 Oct 2024

Hybrid Generative AI for De Novo Design of Co-Crystals with Enhanced Tabletability
Nina Gubina, Andrei Dmitrenko, Gleb Solovev, Lyubov Yamshchikova, Oleg Petrov, Ivan Lebedev, N. Serov, Grigorii Kirgizov, Nikolay Nikitin, Vladimir Vinogradov
22 Oct 2024

MiniPLM: Knowledge Distillation for Pre-Training Language Models
Yuxian Gu, Hao Zhou, Fandong Meng, Jie Zhou, Minlie Huang
22 Oct 2024

SeisLM: a Foundation Model for Seismic Waveforms
Tianlin Liu, Jannes Münchmeyer, Laura Laurenti, C. Marone, Maarten V. de Hoop, Ivan Dokmanić
VLM
21 Oct 2024

Generalized Probabilistic Attention Mechanism in Transformers
DongNyeong Heo, Heeyoul Choi
21 Oct 2024

LAC: Graph Contrastive Learning with Learnable Augmentation in Continuous Space
Zhenyu Lin, Hongzheng Li, Yingxia Shao, Guanhua Ye, Yawen Li, Quanqing Xu
20 Oct 2024

AERO: Softmax-Only LLMs for Efficient Private Inference
N. Jha, Brandon Reagen
16 Oct 2024

SLaNC: Static LayerNorm Calibration
Mahsa Salmani, Nikita Trukhanov, I. Soloveychik
MQ
14 Oct 2024

What Does It Mean to Be a Transformer? Insights from a Theoretical Hessian Analysis
Weronika Ormaniec, Felix Dangel, Sidak Pal Singh
14 Oct 2024

Lambda-Skip Connections: the architectural component that prevents Rank Collapse
Federico Arangath Joseph, Jerome Sieber, Melanie Zeilinger, Carmen Amo Alonso
14 Oct 2024

SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
Hojoon Lee, Dongyoon Hwang, Donghu Kim, Hyunseung Kim, Jun Jet Tai, K. Subramanian, Peter R. Wurman, Jaegul Choo, Peter Stone, Takuma Seno
OffRL
13 Oct 2024

ReLU's Revival: On the Entropic Overload in Normalization-Free Large Language Models
N. Jha, Brandon Reagen
OffRL, AI4CE
12 Oct 2024

Generative Model for Less-Resourced Language with 1 billion parameters
Domen Vreš, Martin Božič, Aljaž Potočnik, Tomaž Martinčič, Marko Robnik-Šikonja
09 Oct 2024

RelitLRM: Generative Relightable Radiance for Large Reconstruction Models
Tianyuan Zhang, Zhengfei Kuang, Haian Jin, Zexiang Xu, Sai Bi, ..., Yiwei Hu, Miloš Hašan, William T. Freeman, Kai Zhang, Fujun Luan
3DGS
08 Oct 2024

DimOL: Dimensional Awareness as A New 'Dimension' in Operator Learning
Yichen Song, Yunbo Wang, Xiaokang Yang
AI4CE
08 Oct 2024

Diffusion Model Predictive Control
Guangyao Zhou, Sivaramakrishnan Swaminathan, Rajkumar Vasudeva Raju, J. S. Guntupalli, Wolfgang Lehrach, Joseph Ortiz, Antoine Dedieu, Miguel Lázaro-Gredilla, Kevin P. Murphy
07 Oct 2024

Activation Scaling for Steering and Interpreting Language Models
Niklas Stoehr, Kevin Du, Vésteinn Snæbjarnarson, Robert West, Ryan Cotterell, Aaron Schein
LLMSV, LRM
07 Oct 2024

MOFFlow: Flow Matching for Structure Prediction of Metal-Organic Frameworks
N. Kim, Seongsu Kim, Minsu Kim, Jinkyoo Park, Sungsoo Ahn
AI4CE
07 Oct 2024

Exploring the Benefit of Activation Sparsity in Pre-training
Zhengyan Zhang, Chaojun Xiao, Qiujieli Qin, Yankai Lin, Zhiyuan Zeng, Xu Han, Zhiyuan Liu, Ruobing Xie, Maosong Sun, Jie Zhou
MoE
04 Oct 2024

Error Correction Code Transformer: From Non-Unified to Unified
Yongli Yan, Jieao Zhu, Tianyue Zheng, Jiaqi He, Linglong Dai
04 Oct 2024

Selective Attention Improves Transformer
Yaniv Leviathan, Matan Kalman, Yossi Matias
03 Oct 2024

MOREL: Enhancing Adversarial Robustness through Multi-Objective Representation Learning
Sedjro Salomon Hotegni, Sebastian Peitz
AAML
02 Oct 2024

Foldable SuperNets: Scalable Merging of Transformers with Different Initializations and Tasks
Edan Kinderman, Itay Hubara, Haggai Maron, Daniel Soudry
MoMe
02 Oct 2024

EuroLLM: Multilingual Language Models for Europe
Pedro Henrique Martins, Patrick Fernandes, Joao Alves, Nuno M. Guerreiro, Ricardo Rei, ..., Pierre Colombo, Barry Haddow, José G. C. de Souza, Alexandra Birch, André F. T. Martins
24 Sep 2024

Micrometer: Micromechanics Transformer for Predicting Mechanical Responses of Heterogeneous Materials
Sizhuang He, Tong-Rui Liu, Shyam Sankaran, P. Perdikaris
AI4CE
23 Sep 2024