1607.06450
Layer Normalization
21 July 2016
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
Papers citing
"Layer Normalization"
50 / 5,502 papers shown
Title
UnityGraph: Unified Learning of Spatio-temporal features for Multi-person Motion Prediction
Kehua Qu
Rui Ding
Jin Tang
3DH
36
0
0
06 Nov 2024
Relation Learning and Aggregate-attention for Multi-person Motion Prediction
Kehua Qu
Rui Ding
Jin Tang
3DH
42
0
0
06 Nov 2024
Estimating Ego-Body Pose from Doubly Sparse Egocentric Video Data
Seunggeun Chi
Pin-Hao Huang
Enna Sachdeva
Hengbo Ma
Karthik Ramani
Kwonjoon Lee
DiffM
50
2
0
05 Nov 2024
LASER: Attention with Exponential Transformation
Sai Surya Duvvuri
Inderjit Dhillon
43
1
0
05 Nov 2024
Discovering Data Structures: Nearest Neighbor Search and Beyond
Omar Salemohamed
Laurent Charlin
Shivam Garg
Vatsal Sharan
Gregory Valiant
FedML
21
0
0
05 Nov 2024
Blind Estimation of Sub-band Acoustic Parameters from Ambisonics Recordings using Spectro-Spatial Covariance Features
Hanyu Meng
Jeroen Breebaart
Jeremy Stoddard
V. Sethu
Eliathamby Ambikairajah
34
0
0
05 Nov 2024
Speech Separation with Pretrained Frontend to Minimize Domain Mismatch
Wupeng Wang
Zexu Pan
Xianrui Li
Shuai Wang
Yiming Li
34
4
0
05 Nov 2024
Predictor-Corrector Enhanced Transformers with Exponential Moving Average Coefficient Learning
Yangqiu Song
Tong Zheng
Ran Wang
Jiahao Liu
Qingyan Guo
...
Xu Tan
Tong Xiao
Jingbo Zhu
Jie Wang
Xunliang Cai
60
1
0
05 Nov 2024
Rethinking Decoders for Transformer-based Semantic Segmentation: A Compression Perspective
Qishuai Wen
Chun-Guang Li
ViT
37
0
0
05 Nov 2024
Geometry of naturalistic object representations in recurrent neural network models of working memory
Xiaoxuan Lei
Takuya Ito
P. Bashivan
33
0
0
04 Nov 2024
DeeR-VLA: Dynamic Inference of Multimodal Large Language Models for Efficient Robot Execution
Yang Yue
Yulin Wang
Bingyi Kang
Yizeng Han
Shenzhi Wang
Shiji Song
Jiashi Feng
Gao Huang
VLM
45
16
0
04 Nov 2024
Training Compute-Optimal Protein Language Models
Xingyi Cheng
Bo Chen
Pan Li
Jing Gong
Jie Tang
Le Song
84
13
0
04 Nov 2024
Scalable Efficient Training of Large Language Models with Low-dimensional Projected Attention
Xingtai Lv
Ning Ding
Kaiyan Zhang
Ermo Hua
Ganqu Cui
Bowen Zhou
42
1
0
04 Nov 2024
MSTA3D: Multi-scale Twin-attention for 3D Instance Segmentation
Duc Dang Trung Tran
Byeongkeun Kang
Yeejin Lee
3DV
31
1
0
04 Nov 2024
ParseCaps: An Interpretable Parsing Capsule Network for Medical Image Diagnosis
Xinyu Geng
Jiaming Wang
Jun Xu
MedIm
34
0
0
03 Nov 2024
Data movement limits to frontier model training
Ege Erdil
David Schneider-Joseph
41
1
0
02 Nov 2024
Normalization Layer Per-Example Gradients are Sufficient to Predict Gradient Noise Scale in Transformers
Gavia Gray
Aman Tiwari
Shane Bergsma
Joel Hestness
30
1
0
01 Nov 2024
Wasserstein Flow Matching: Generative modeling over families of distributions
Doron Haviv
Aram-Alexandre Pooladian
D. Pe’er
Brandon Amos
OOD
42
0
0
01 Nov 2024
SPOT: SE(3) Pose Trajectory Diffusion for Object-Centric Manipulation
Cheng-Chun Hsu
Bowen Wen
Jie Xu
Yashraj S. Narang
Xiaolong Wang
Yuke Zhu
Joydeep Biswas
Stan Birchfield
DiffM
43
8
0
01 Nov 2024
Reinforcement Learning Gradients as Vitamin for Online Finetuning Decision Transformers
Kai Yan
A. Schwing
Yu-xiong Wang
OffRL
OnRL
36
0
0
31 Oct 2024
TrAct: Making First-layer Pre-Activations Trainable
Felix Petersen
Christian Borgelt
Stefano Ermon
24
0
0
31 Oct 2024
Analyzing & Reducing the Need for Learning Rate Warmup in GPT Training
Atli Kosson
Bettina Messmer
Martin Jaggi
AI4CE
22
3
0
31 Oct 2024
Aggregate-and-Adapt Natural Language Prompts for Downstream Generalization of CLIP
Chen Huang
Skyler Seto
Samira Abnar
David Grangier
Navdeep Jaitly
J. Susskind
VLM
51
0
0
31 Oct 2024
ResiDual Transformer Alignment with Spectral Decomposition
Lorenzo Basile
Valentino Maiorca
Luca Bortolussi
Emanuele Rodolà
Francesco Locatello
48
1
0
31 Oct 2024
DASH: Warm-Starting Neural Network Training in Stationary Settings without Loss of Plasticity
Baekrok Shin
Junsoo Oh
Hanseul Cho
Chulhee Yun
AI4CE
52
1
0
30 Oct 2024
(FL)^2: Overcoming Few Labels in Federated Semi-Supervised Learning
Seungjoo Lee
Thanh-Long V. Le
Jaemin Shin
Sung-Ju Lee
FedML
42
1
0
30 Oct 2024
DiaMond: Dementia Diagnosis with Multi-Modal Vision Transformers Using MRI and PET
Yitong Li
Morteza Ghahremani
Youssef Wally
Christian Wachinger
MedIm
38
0
0
30 Oct 2024
TokenFormer: Rethinking Transformer Scaling with Tokenized Model Parameters
Haiyang Wang
Yue Fan
Muhammad Ferjad Naeem
Yongqin Xian
J. E. Lenssen
Liwei Wang
F. Tombari
Bernt Schiele
49
2
0
30 Oct 2024
Kinetix: Investigating the Training of General Agents through Open-Ended Physics-Based Control Tasks
Michael T. Matthews
Michael Beukman
Chris Xiaoxuan Lu
Jakob Foerster
OffRL
AI4CE
36
3
0
30 Oct 2024
Scalable Message Passing Neural Networks: No Need for Attention in Large Graph Representation Learning
Haitz Sáez de Ocáriz Borde
Artem Lukoianov
Anastasis Kratsios
Michael M. Bronstein
Xiaowen Dong
GNN
43
1
0
29 Oct 2024
ET-Flow: Equivariant Flow-Matching for Molecular Conformer Generation
Majdi Hassan
Nikhil Shenoy
Jungyoon Lee
Hannes Stärk
Stephan Thaler
Dominique Beaini
44
6
0
29 Oct 2024
Where Do Large Learning Rates Lead Us?
Ildus Sadrtdinov
M. Kodryan
Eduard Pokonechny
E. Lobacheva
Dmitry Vetrov
AI4CE
34
0
0
29 Oct 2024
HRPVT: High-Resolution Pyramid Vision Transformer for medium and small-scale human pose estimation
Zhoujie Xu
ViT
3DH
38
2
0
29 Oct 2024
Efficient Machine Translation with a BiLSTM-Attention Approach
Yuxu Wu
Yiren Xing
22
0
0
29 Oct 2024
Dual Conditional Diffusion Models for Sequential Recommendation
Hongtao Huang
Chengkai Huang
Xiaojun Chang
Wen Hu
Lina Yao
Julian McAuley
DiffM
50
2
0
29 Oct 2024
Mitigating Gradient Overlap in Deep Residual Networks with Gradient Normalization for Improved Non-Convex Optimization
Juyoung Yun
35
2
0
28 Oct 2024
Scaling-based Data Augmentation for Generative Models and its Theoretical Extension
Yoshitaka Koike
Takumi Nakagawa
Hiroki Waida
Takafumi Kanamori
DiffM
30
0
0
28 Oct 2024
Plastic Learning with Deep Fourier Features
Alex Lewandowski
Dale Schuurmans
Marlos C. Machado
CLL
47
3
0
27 Oct 2024
Emergence of Globally Attracting Fixed Points in Deep Neural Networks With Nonlinear Activations
Amir Joudaki
Thomas Hofmann
MLT
23
0
0
26 Oct 2024
Sensor2Text: Enabling Natural Language Interactions for Daily Activity Tracking Using Wearable Sensors
Wenqiang Chen
Jiaxuan Cheng
Leyao Wang
Wei Zhao
Wojciech Matusik
33
1
0
26 Oct 2024
Self-Normalized Resets for Plasticity in Continual Learning
Vivek F. Farias
Adam D. Jozefiak
CLL
48
1
0
26 Oct 2024
OGBench: Benchmarking Offline Goal-Conditioned RL
Seohong Park
Kevin Frans
Benjamin Eysenbach
Sergey Levine
OffRL
62
10
0
26 Oct 2024
COAT: Compressing Optimizer states and Activation for Memory-Efficient FP8 Training
Haocheng Xi
Han Cai
Ligeng Zhu
Yaojie Lu
Kurt Keutzer
Jianfei Chen
Song Han
MQ
75
9
0
25 Oct 2024
Free-Rider and Conflict Aware Collaboration Formation for Cross-Silo Federated Learning
Mengmeng Chen
Xiaohu Wu
Xiaoli Tang
Tiantian He
Yew-Soon Ong
Qiqi Liu
Qicheng Lao
Han Yu
FedML
29
3
0
25 Oct 2024
Citywide Electric Vehicle Charging Demand Prediction Approach Considering Urban Region and Dynamic Influences
Haoxuan Kuang
Kunxiang Deng
Linlin You
Jun Li
30
0
0
24 Oct 2024
On Explaining with Attention Matrices
Omar Naim
Nicholas Asher
34
1
0
24 Oct 2024
Scale Propagation Network for Generalizable Depth Completion
Haotian Wang
Meng Yang
Xinhu Zheng
Gang Hua
31
2
0
24 Oct 2024
Rethinking Positive Pairs in Contrastive Learning
Jiantao Wu
Shentong Mo
Zhenhua Feng
Sara Atito
Josef Kittler
Muhammad Awais
SSL
VLM
56
3
0
23 Oct 2024
From Attention to Activation: Unravelling the Enigmas of Large Language Models
Prannay Kaul
Chengcheng Ma
Ismail Elezi
Jiankang Deng
34
2
0
22 Oct 2024
PLDR-LLM: Large Language Model from Power Law Decoder Representations
Burc Gokden
26
1
0
22 Oct 2024