arXiv:1607.06450
Layer Normalization
21 July 2016
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
ArXiv
Papers citing "Layer Normalization"
50 / 5,502 papers shown
Deep Reinforcement Learning based Triggering Function for Early Classifiers of Time Series
Aurélien Renault
A. Bondu
Antoine Cornuéjols
Vincent Lemaire
49
0
0
10 Feb 2025
PiKE: Adaptive Data Mixing for Multi-Task Learning Under Low Gradient Conflicts
Zeman Li
Yuan Deng
Peilin Zhong
Meisam Razaviyayn
Vahab Mirrokni
MoMe
75
1
0
10 Feb 2025
Fully Exploiting Vision Foundation Model's Profound Prior Knowledge for Generalizable RGB-Depth Driving Scene Parsing
Sicen Guo
Tianyou Wen
Chuang-Wei Liu
Qijun Chen
Rui Fan
57
0
0
10 Feb 2025
The Curse of Depth in Large Language Models
Wenfang Sun
Xinyuan Song
Pengxiang Li
Lu Yin
Yefeng Zheng
Shiwei Liu
75
4
0
09 Feb 2025
Hypencoder: Hypernetworks for Information Retrieval
Julian Killingback
Hansi Zeng
Hamed Zamani
112
1
0
07 Feb 2025
RAPID: Robust and Agile Planner Using Inverse Reinforcement Learning for Vision-Based Drone Navigation
Minwoo Kim
Geunsik Bae
Jinwoo Lee
Woojae Shin
Changseung Kim
Myong-Yol Choi
Heejung Shin
H. Oh
81
0
0
04 Feb 2025
Mass-Editing Memory with Attention in Transformers: A cross-lingual exploration of knowledge
Daniel Tamayo
Aitor Gonzalez-Agirre
Javier Hernando
Marta Villegas
KELM
93
3
0
04 Feb 2025
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri
Xinting Huang
Mark Rofin
Michael Hahn
LRM
216
0
0
04 Feb 2025
M2R2: Mixture of Multi-Rate Residuals for Efficient Transformer Inference
Nikhil Bhendawade
Mahyar Najibi
Devang Naik
Irina Belousova
MoE
85
0
0
04 Feb 2025
Particle Trajectory Representation Learning with Masked Point Modeling
Sam Young
Yeon-jae Jwa
Kazuhiro Terao
3DPC
69
1
0
04 Feb 2025
Transformers trained on proteins can learn to attend to Euclidean distance
Isaac Ellmen
Constantin Schneider
Matthew I.J. Raybould
Charlotte M. Deane
79
0
0
03 Feb 2025
Activation by Interval-wise Dropout: A Simple Way to Prevent Neural Networks from Plasticity Loss
Sangyeon Park
Isaac Han
Seungwon Oh
Kyung-Joong Kim
62
2
0
03 Feb 2025
Beyond the Permutation Symmetry of Transformers: The Role of Rotation for Model Fusion
Binchi Zhang
Zaiyi Zheng
Zhengzhang Chen
Wenlin Yao
72
0
0
01 Feb 2025
Learning from Suboptimal Data in Continuous Control via Auto-Regressive Soft Q-Network
Jijia Liu
Feng Gao
Q. Liao
Chao Yu
Yu-Xiang Wang
OffRL
76
0
0
01 Feb 2025
Optimizing Job Allocation using Reinforcement Learning with Graph Neural Networks
Lars C.P.M. Quaedvlieg
63
0
0
31 Jan 2025
State-space models are accurate and efficient neural operators for dynamical systems
Zheyuan Hu
Nazanin Ahmadi Daryakenari
Qianli Shen
Kenji Kawaguchi
George Karniadakis
Mamba
AI4CE
75
13
0
28 Jan 2025
Towards General-Purpose Model-Free Reinforcement Learning
Scott Fujimoto
P. D'Oro
Amy Zhang
Yuandong Tian
Michael Rabbat
OffRL
41
3
0
28 Jan 2025
Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference
Go Kamoda
Benjamin Heinzerling
Tatsuro Inaba
Keito Kudo
Keisuke Sakaguchi
Kentaro Inui
MILM
36
1
0
27 Jan 2025
iFormer: Integrating ConvNet and Transformer for Mobile Application
Chuanyang Zheng
ViT
74
0
0
26 Jan 2025
Semi-supervised Anomaly Detection with Extremely Limited Labels in Dynamic Graphs
Jiazhen Chen
Sichao Fu
Zheng Ma
M. Feng
T. Wirjanto
Qinmu Peng
43
0
0
25 Jan 2025
You Only Prune Once: Designing Calibration-Free Model Compression With Policy Learning
Ayan Sengupta
Siddhant Chaudhary
Tanmoy Chakraborty
44
3
0
25 Jan 2025
A Transformer-based Autoregressive Decoder Architecture for Hierarchical Text Classification
Younes Yousef
Lukas Galke
A. Scherp
49
0
0
23 Jan 2025
GLAM: Global-Local Variation Awareness in Mamba-based World Model
Qian He
Wenqi Liang
Chunhui Hao
Gan Sun
Jiandong Tian
63
0
0
21 Jan 2025
Multi-modal Fusion and Query Refinement Network for Video Moment Retrieval and Highlight Detection
Yifang Xu
Yunzhuo Sun
Benxiang Zhai
Zien Xie
Youyao Jia
S. Du
49
2
0
18 Jan 2025
A Non-autoregressive Model for Joint STT and TTS
Vishal Sunder
Brian Kingsbury
G. Saon
Samuel Thomas
Slava Shechtman
Hagai Aronowitz
Eric Fosler-Lussier
Luis Lastras
66
0
0
15 Jan 2025
EmoNeXt: an Adapted ConvNeXt for Facial Emotion Recognition
Yassine El Boudouri
Amine Bohi
75
15
0
14 Jan 2025
Democratizing Text-to-Image Masked Generative Models with Compact Text-Aware One-Dimensional Tokens
Dongwon Kim
Ju He
Qihang Yu
Chenglin Yang
Xiaohui Shen
Suha Kwak
Liang-Chieh Chen
VLM
54
6
0
13 Jan 2025
Better Prompt Compression Without Multi-Layer Perceptrons
Edouardo Honig
Andrew Lizarraga
Zijun Zhang
Ying Nian Wu
MQ
190
1
0
12 Jan 2025
TTS-Transducer: End-to-End Speech Synthesis with Neural Transducer
Vladimir Bataev
Subhankar Ghosh
Vitaly Lavrukhin
Jason Chun Lok Li
AI4TS
46
0
0
10 Jan 2025
Time Transfer: On Optimal Learning Rate and Batch Size In The Infinite Data Limit
Oleg Filatov
Jan Ebert
Jiangtao Wang
Stefan Kesselheim
44
3
0
10 Jan 2025
CGP-Tuning: Structure-Aware Soft Prompt Tuning for Code Vulnerability Detection
Ruijun Feng
Hammond Pearce
Pietro Liguori
Yulei Sui
35
0
0
08 Jan 2025
Flemme: A Flexible and Modular Learning Platform for Medical Images
Guoqing Zhang
Jingyun Yang
Yang Li
MedIm
53
1
0
08 Jan 2025
Hierarchical Light Transformer Ensembles for Multimodal Trajectory Forecasting
Adrien Lafage
Mathieu Barbier
Gianni Franchi
David Filliat
45
3
0
08 Jan 2025
Localize-and-Stitch: Efficient Model Merging via Sparse Task Arithmetic
Yifei He
Yuzheng Hu
Yong Lin
Tong Zhang
Han Zhao
FedML
MoMe
65
19
0
08 Jan 2025
LWFNet: Coherent Doppler Wind Lidar-Based Network for Wind Field Retrieval
R. Tao
Chong Wang
Hao Chen
Mingjiao Jia
Xiang Shang
...
Yanyu Lu
Yanfeng Huo
Junlin Wu
Xianghui Xue
Xiankang Dou
38
0
0
05 Jan 2025
Efficient Architectures for High Resolution Vision-Language Models
Miguel Carvalho
Bruno Martins
MLLM
VLM
45
0
0
05 Jan 2025
SpecDETR: A Transformer-based Hyperspectral Point Object Detection Network
Zhaoxu Li
Wei An
Gaowei Guo
Longguang Wang
Yingqian Wang
Zaiping Lin
ViT
88
0
0
03 Jan 2025
VisionLLM v2: An End-to-End Generalist Multimodal Large Language Model for Hundreds of Vision-Language Tasks
Jiannan Wu
Muyan Zhong
Sen Xing
Zeqiang Lai
Zhaoyang Liu
...
Lewei Lu
Tong Lu
Ping Luo
Yu Qiao
Jifeng Dai
MLLM
VLM
LRM
102
48
0
03 Jan 2025
Keypoint Aware Masked Image Modelling
Madhava Krishna
Convin.AI
73
0
0
03 Jan 2025
Hadamard Attention Recurrent Transformer: A Strong Baseline for Stereo Matching Transformer
Ziyang Chen
Yongjun Zhang
Wenting Li
Bingshu Wang
Yabo Wu
Yong Zhao
C. L. P. Chen
49
0
0
02 Jan 2025
Brain-to-Text Benchmark '24: Lessons Learned
Francis R. Willett
Jingyuan Li
Trung Le
Chaofei Fan
Mingfei Chen
...
Maxwell Kounga
E. Kelly Buchanan
D. Zoltowski
Scott W. Linderman
Jaimie M. Henderson
28
0
0
23 Dec 2024
Adam on Local Time: Addressing Nonstationarity in RL with Relative Adam Timesteps
Benjamin Ellis
Matthew Jackson
Andrei Lupu
Alexander David Goldie
Mattie Fellows
Shimon Whiteson
Jakob Foerster
87
0
0
22 Dec 2024
MultiverSeg: Scalable Interactive Segmentation of Biomedical Imaging Datasets with In-Context Guidance
Hallee E. Wong
Jose Javier Gonzalez Ortiz
John Guttag
Adrian V. Dalca
93
0
0
19 Dec 2024
Mix-LN: Unleashing the Power of Deeper Layers by Combining Pre-LN and Post-LN
Pengxiang Li
Lu Yin
Shiwei Liu
75
4
0
18 Dec 2024
Smarter, Better, Faster, Longer: A Modern Bidirectional Encoder for Fast, Memory Efficient, and Long Context Finetuning and Inference
Benjamin Warner
Antoine Chaffin
Benjamin Clavié
Orion Weller
Oskar Hallström
...
Tom Aarsen
Nathan Cooper
Griffin Adams
Jeremy Howard
Iacopo Poli
93
79
0
18 Dec 2024
QueryCDR: Query-Based Controllable Distortion Rectification Network for Fisheye Images
Pengbo Guo
Chengxu Liu
Xingsong Hou
Xueming Qian
68
3
0
18 Dec 2024
Concept-ROT: Poisoning Concepts in Large Language Models with Model Editing
Keltin Grimes
Marco Christiani
David Shriver
Marissa Connor
KELM
85
1
0
17 Dec 2024
Design of Restricted Normalizing Flow towards Arbitrary Stochastic Policy with Computational Efficiency
Taisuke Kobayashi
Takumi Aotani
139
5
0
17 Dec 2024
AsymRnR: Video Diffusion Transformers Acceleration with Asymmetric Reduction and Restoration
Wenhao Sun
Rong-Cheng Tu
Jingyi Liao
Zhao Jin
Dacheng Tao
VGen
111
1
0
16 Dec 2024
Stabilizing Reinforcement Learning in Differentiable Multiphysics Simulation
Eliot Xing
Vernon Luk
Jean Oh
84
0
0
16 Dec 2024