Root Mean Square Layer Normalization

Biao Zhang, Rico Sennrich
16 October 2019 · arXiv:1910.07467

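For orientation, the cited method itself is compact: RMSNorm rescales each feature vector by its root mean square and a learned per-feature gain, dropping the mean-centering and bias terms of standard LayerNorm. Below is a minimal NumPy sketch of that formula; the function name, epsilon value, and shapes are illustrative assumptions, not the paper's reference implementation.

```python
import numpy as np

def rms_norm(x, gain, eps=1e-8):
    # RMSNorm (Zhang & Sennrich, 2019): y = x / RMS(x) * g,
    # with RMS(x) = sqrt(mean(x^2)) taken over the feature axis.
    # Unlike LayerNorm, there is no mean subtraction and no bias term.
    rms = np.sqrt(np.mean(np.square(x), axis=-1, keepdims=True) + eps)
    return (x / rms) * gain

# Usage: normalize 512-dimensional features for a batch of 4 vectors.
x = np.random.randn(4, 512)
g = np.ones(512)  # learnable gain, typically initialized to ones
y = rms_norm(x, g)
```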

Papers citing "Root Mean Square Layer Normalization"
Showing 50 of 97 citing papers.

PCDCNet: A Surrogate Model for Air Quality Forecasting with Physical-Chemical Dynamics and Constraints
Shuo Wang, Yun Cheng, Qingye Meng, O. Saukh, Jiang Zhang, Jingfang Fan, Yuanting Zhang, Xingyuan Yuan, Lothar Thiele
26 May 2025 · AI4CE

FP4 All the Way: Fully Quantized Training of LLMs
Brian Chmiel, Maxim Fishman, Ron Banner, Daniel Soudry
25 May 2025 · MQ

Exact Expressive Power of Transformers with Padding
William Merrill, Ashish Sabharwal
25 May 2025

LLaDA 1.5: Variance-Reduced Preference Optimization for Large Language Diffusion Models
Fengqi Zhu, Rongzhen Wang, Shen Nie, Xiaolu Zhang, Chunwei Wu, ..., Jun Zhou, Jianfei Chen, Yankai Lin, Ji-Rong Wen, Chongxuan Li
25 May 2025

PLUMAGE: Probabilistic Low rank Unbiased Min Variance Gradient Estimator for Efficient Large Model Training
Matan Haroush, Daniel Soudry
23 May 2025

Task Specific Pruning with LLM-Sieve: How Many Parameters Does Your Task Really Need?
Waleed Reda, Abhinav Jangda, Krishna Chintalapudi
23 May 2025

ECHO-LLaMA: Efficient Caching for High-Performance LLaMA Training
Maryam Dialameh, Rezaul Karim, Hossein Rajabzadeh, Omar Mohamed Awad, Hyock Ju Kwon, Boxing Chen, Walid Ahmed, Yang Liu
22 May 2025

Do Language Models Use Their Depth Efficiently?
Róbert Csordás, Christopher D. Manning, Christopher Potts
20 May 2025

TCSinger 2: Customizable Multilingual Zero-shot Singing Voice Synthesis
Yu Zhang, Wenxiang Guo, Changhao Pan, Dongyu Yao, Zhiyuan Zhu, Ziyue Jiang, Yuhan Wang, Tao Jin, Zhou Zhao
20 May 2025 · VLM

Panda: A pretrained forecast model for universal representation of chaotic dynamics
Jeffrey Lai, Anthony Bao, William Gilpin
19 May 2025 · AI4TS, AI4CE

Chain-of-Model Learning for Language Model
Kaitao Song, Xiaohua Wang, Xu Tan, Huiqiang Jiang, Chengruidong Zhang, ..., Xiaoqing Zheng, Tao Qin, Yuqing Yang, Dongsheng Li, Lili Qiu
17 May 2025 · LRM, AI4CE

Versatile Framework for Song Generation with Prompt-based Control
Yanzhe Zhang, Wenxiang Guo, Changhao Pan, Zehan Zhu, Ruiqi Li, ..., Rongjie Huang, Ruiyuan Zhang, Zhiqing Hong, Ziyue Jiang, Zhou Zhao
27 Apr 2025

RETROcode: Leveraging a Code Database for Improved Natural Language to Code Generation
Nathanael Beau, Benoît Crabbé
08 Apr 2025

Adaptive Layer-skipping in Pre-trained LLMs
Xuan Luo, Weizhi Wang, Xifeng Yan
31 Mar 2025

TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper, Roland Fernandez, P. Smolensky, Jianfeng Gao
29 Mar 2025

Tiled Flash Linear Attention: More Efficient Linear RNN and xLSTM Kernels
M. Beck, Korbinian Poppel, Phillip Lippe, Sepp Hochreiter
18 Mar 2025

SANA-Sprint: One-Step Diffusion with Continuous-Time Consistency Distillation
Junsong Chen, Shuchen Xue, Yuyang Zhao, Jincheng Yu, Sayak Paul, Junyu Chen, Han Cai, Enze Xie
12 Mar 2025 · VLM

EuroBERT: Scaling Multilingual Encoders for European Languages
Nicolas Boizard, Hippolyte Gisserot-Boukhlef, Duarte M. Alves, André F. T. Martins, Ayoub Hammal, ..., Maxime Peyrard, Nuno M. Guerreiro, Patrick Fernandes, Ricardo Rei, Pierre Colombo
07 Mar 2025

HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo, Yutao Zeng, Ya Wang, Sijun Zhang, Jian Yang, Xiaoqing Li, Xun Zhou, Jinwen Ma
06 Mar 2025

Not-Just-Scaling Laws: Towards a Better Understanding of the Downstream Impact of Language Model Design Decisions
Emmy Liu, Amanda Bertsch, Lintang Sutawika, Lindia Tjuatja, Patrick Fernandes, ..., Siyang Song, Carolin (Haas) Lawrence, Aditi Raghunathan, Kiril Gashteovski, Graham Neubig
05 Mar 2025

Is Pre-training Applicable to the Decoder for Dense Prediction?
Chao Ning, Wanshui Gan, Weihao Xuan, Naoto Yokoya
05 Mar 2025

Stable-SPAM: How to Train in 4-Bit More Stably than 16-Bit Adam
Tianjin Huang, Haotian Hu, Zhenyu Zhang, Gaojie Jin, Xianrui Li, ..., Tianlong Chen, Lu Liu, Qingsong Wen, Zhangyang Wang, Shiwei Liu
24 Feb 2025 · MQ

Exploring Mutual Cross-Modal Attention for Context-Aware Human Affordance Generation
Prasun Roy, Saumik Bhattacharya, Subhankar Ghosh, Umapada Pal, Michael Blumenstein
20 Feb 2025

Uncertainty Representations in State-Space Layers for Deep Reinforcement Learning under Partial Observability
Carlos E. Luis, A. Bottero, Julia Vinogradska, Felix Berkenkamp, Jan Peters
20 Feb 2025

Large Language Diffusion Models
Shen Nie, Fengqi Zhu, Zebin You, Xiaolu Zhang, Jingyang Ou, Jun Hu, Jun Zhou, Yankai Lin, Ji-Rong Wen, Chongxuan Li
14 Feb 2025

Steel-LLM: From Scratch to Open Source -- A Personal Journey in Building a Chinese-Centric LLM
Qingshui Gu, Shu Li, Tianyu Zheng, Zhaoxiang Zhang
10 Feb 2025

SPAM: Spike-Aware Adam with Momentum Reset for Stable LLM Training
Tianjin Huang, Ziquan Zhu, Gaojie Jin, Lu Liu, Zhangyang Wang, Shiwei Liu
12 Jan 2025

LinGen: Towards High-Resolution Minute-Length Text-to-Video Generation with Linear Computational Complexity
Hongjie Wang, Chih-Yao Ma, Yen-Cheng Liu, Ji Hou, Tao Xu, ..., Peizhao Zhang, Tingbo Hou, Peter Vajda, N. Jha, Xiaoliang Dai
13 Dec 2024 · LMTD, VGen, VLM, DiffM

Switti: Designing Scale-Wise Transformers for Text-to-Image Synthesis
Anton Voronov, Denis Kuznedelev, Mikhail Khoroshikh, Valentin Khrulkov, Dmitry Baranchuk
02 Dec 2024

More Expressive Attention with Negative Weights
Ang Lv, Ruobing Xie, Shuaipeng Li, Jiayi Liao, Xingwu Sun, Zhanhui Kang, Di Wang, Rui Yan
11 Nov 2024

On Improved Conditioning Mechanisms and Pre-training Strategies for Diffusion Models
Tariq Berrada Ifriqi, Pietro Astolfi, Melissa Hall, Reyhane Askari Hemmat, Yohann Benchetrit, ..., Matthew Muckley, Karteek Alahari, Adriana Romero Soriano, Jakob Verbeek, M. Drozdzal
05 Nov 2024 · AI4CE, VLM

TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Haiyang Wang, Yue Fan, Muhammad Ferjad Naeem, Yongqin Xian, J. E. Lenssen, Liwei Wang, F. Tombari, Bernt Schiele
30 Oct 2024

MrT5: Dynamic Token Merging for Efficient Byte-level Language Models
Julie Kallini, Shikhar Murty, Christopher D. Manning, Christopher Potts, Róbert Csordás
28 Oct 2024

Relaxed Recursive Transformers: Effective Parameter Sharing with Layer-wise LoRA
Sangmin Bae, Adam Fisch, Hrayr Harutyunyan, Ziwei Ji, Seungyeon Kim, Tal Schuster
28 Oct 2024 · KELM

COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi, Han Cai, Ligeng Zhu, Yaojie Lu, Kurt Keutzer, Jianfei Chen, Song Han
25 Oct 2024 · MQ

Scaling up Masked Diffusion Models on Text
Shen Nie, Fengqi Zhu, Chao Du, Tianyu Pang, Qian Liu, Guangtao Zeng, Min Lin, Chongxuan Li
24 Oct 2024 · AI4CE

SF-Speech: Straightened Flow for Zero-Shot Voice Clone
Xuyuan Li, Zengqiang Shang, Hua Hua, Peiyang Shi, Chen Yang, Li Wang, Pengyuan Zhang
16 Oct 2024

Transfer Learning with Foundational Models for Time Series Forecasting using Low-Rank Adaptations
M. Germán-Morales, A. J. Rivera-Rivas, M. J. del Jesus Díaz, C. J. Carmona
15 Oct 2024 · AI4TS, AI4CE

Liger Kernel: Efficient Triton Kernels for LLM Training
Pin-Lun Hsu, Yun Dai, Vignesh Kothapalli, Qingquan Song, Shao Tang, Siyu Zhu, Steven Shimizu, Shivam Sahni, Haowen Ning, Yanning Chen
14 Oct 2024

ControlMM: Controllable Masked Motion Generation
Ekkasit Pinyoanuntapong, Muhammad Usama Saleem, Korrawe Karunratanakul, Pu Wang, Hongfei Xue, Chong Chen, Chuan Guo, Junli Cao, J. Ren, Sergey Tulyakov
14 Oct 2024 · VGen

SimBa: Simplicity Bias for Scaling Up Parameters in Deep Reinforcement Learning
Hojoon Lee, Dongyoon Hwang, Donghu Kim, Hyunseung Kim, Jun Jet Tai, K. Subramanian, Peter R. Wurman, Jaegul Choo, Peter Stone, Takuma Seno
13 Oct 2024 · OffRL

DARE the Extreme: Revisiting Delta-Parameter Pruning For Fine-Tuned Models
Wenlong Deng, Yize Zhao, V. Vakilian, Minghui Chen, Xiaoxiao Li, Christos Thrampoulidis
12 Oct 2024

Mono-InternVL: Pushing the Boundaries of Monolithic Multimodal Large Language Models with Endogenous Visual Pre-training
Gen Luo, Xue Yang, Wenhan Dou, Zhaokai Wang, Jifeng Dai, Yu Qiao, Xizhou Zhu
10 Oct 2024 · VLM, MLLM

RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Songming Liu, Lingxuan Wu, Bangguo Li, Hengkai Tan, Huayu Chen, Zhengyi Wang, Ke Xu, Hang Su, Jun Zhu
10 Oct 2024

Differential Transformer
Tianzhu Ye, Li Dong, Yuqing Xia, Yutao Sun, Yi Zhu, Gao Huang, Furu Wei
07 Oct 2024

Selective Attention Improves Transformer
Yaniv Leviathan, Matan Kalman, Yossi Matias
03 Oct 2024

On Expressive Power of Looped Transformers: Theoretical Analysis and Enhancement via Timestep Encoding
Kevin Xu, Issei Sato
02 Oct 2024

Characterizing and Efficiently Accelerating Multimodal Generation Model Inference
Yejin Lee, Anna Y. Sun, Basil Hosmer, Bilge Acun, Can Balioglu, ..., Ram Pasunuru, Scott Yih, Sravya Popuri, Xing Liu, Carole-Jean Wu
30 Sep 2024

LiRA: Light-Robust Adversary for Model-based Reinforcement Learning in Real World
Taisuke Kobayashi
29 Sep 2024

Time-MoE: Billion-Scale Time Series Foundation Models with Mixture of Experts
Xiaoming Shi, Shiyu Wang, Yuqi Nie, Dianqi Li, Zhou Ye, Qingsong Wen, Ming Jin
24 Sep 2024 · AI4TS