Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2010.04245
Cited By
Query-Key Normalization for Transformers
8 October 2020
Alex Henry
Prudhvi Raj Dachapally
S. Pawar
Yuxuan Chen
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Query-Key Normalization for Transformers"
37 / 37 papers shown
Title
Absolute Coordinates Make Motion Generation Easy
Zichong Meng
Zeyu Han
Xiaogang Peng
Yiming Xie
Huaizu Jiang
161
0
0
26 May 2025
Fast Text-to-Audio Generation with Adversarial Post-Training
Cheng-i Wang
Zach Evans
Zack Zukowski
Josiah Taylor
CJ Carr
...
Adnan Al-Sinan
Gian Marco Iodice
Julian McAuley
Taylor Berg-Kirkpatrick
Jordi Pons
86
0
0
13 May 2025
MotionStreamer: Streaming Motion Generation via Diffusion-based Autoregressive Model in Causal Latent Space
Lixing Xiao
Shunlin Lu
Huaijin Pi
Ke Fan
Liang Pan
Yueer Zhou
Ziyong Feng
Xiaowei Zhou
Sida Peng
Jingbo Wang
DiffM
VGen
97
7
0
19 Mar 2025
EDM: Efficient Deep Feature Matching
Xi Li
Tong Rao
Cihui Pan
77
0
0
07 Mar 2025
HybridNorm: Towards Stable and Efficient Transformer Training via Hybrid Normalization
Zhijian Zhuo
Yutao Zeng
Ya Wang
Sijun Zhang
Jian Yang
Xiaoqing Li
Xun Zhou
Jinwen Ma
79
0
0
06 Mar 2025
Hyper-SET: Designing Transformers via Hyperspherical Energy Minimization
Yunzhe Hu
Difan Zou
Dong Xu
107
1
0
17 Feb 2025
DrivingWorld: Constructing World Model for Autonomous Driving via Video GPT
Xiaotao Hu
Wei Yin
Mingkai Jia
Junyuan Deng
Xiaoyang Guo
Qian Zhang
Xiaoxiao Long
Ping Tan
VGen
124
14
0
31 Dec 2024
RDT-1B: a Diffusion Foundation Model for Bimanual Manipulation
Songming Liu
Lingxuan Wu
Bangguo Li
Hengkai Tan
Huayu Chen
Zhengyi Wang
Ke Xu
Hang Su
Jun Zhu
119
113
0
10 Oct 2024
Lumina-mGPT: Illuminate Flexible Photorealistic Text-to-Image Generation with Multimodal Generative Pretraining
Dongyang Liu
Shitian Zhao
Le Zhuo
Weifeng Lin
Ping Luo
Xinyue Li
Qi Qin
Yu Qiao
Hongsheng Li
Peng Gao
MLLM
143
54
0
05 Aug 2024
Normalized Attention Without Probability Cage
Oliver Richter
Roger Wattenhofer
63
21
0
19 May 2020
Stolen Probability: A Structural Weakness of Neural Language Models
David Demeter
Gregory J. Kimmel
Doug Downey
51
33
0
05 May 2020
Language Model Prior for Low-Resource Neural Machine Translation
Christos Baziotis
Barry Haddow
Alexandra Birch
49
53
0
30 Apr 2020
On Optimal Transformer Depth for Low-Resource Language Translation
Elan Van Biljon
Arnu Pretorius
Julia Kreutzer
MoE
52
27
0
09 Apr 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
139
993
0
12 Feb 2020
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao
Junyang Lin
Zhiyuan Zhang
Xuancheng Ren
Qi Su
Xu Sun
67
111
0
25 Dec 2019
Understanding and Improving Layer Normalization
Jingjing Xu
Xu Sun
Zhiyuan Zhang
Guangxiang Zhao
Junyang Lin
FAtt
95
354
0
16 Nov 2019
BPE-Dropout: Simple and Effective Subword Regularization
Ivan Provilkov
Dmitrii Emelianenko
Elena Voita
79
286
0
29 Oct 2019
Root Mean Square Layer Normalization
Biao Zhang
Rico Sennrich
91
740
0
16 Oct 2019
Transformers without Tears: Improving the Normalization of Self-Attention
Toan Q. Nguyen
Julian Salazar
86
231
0
14 Oct 2019
What Does BERT Look At? An Analysis of BERT's Attention
Kevin Clark
Urvashi Khandelwal
Omer Levy
Christopher D. Manning
MILM
218
1,598
0
11 Jun 2019
Learning Deep Transformer Models for Machine Translation
Qiang Wang
Bei Li
Tong Xiao
Jingbo Zhu
Changliang Li
Derek F. Wong
Lidia S. Chao
76
672
0
05 Jun 2019
fairseq: A Fast, Extensible Toolkit for Sequence Modeling
Myle Ott
Sergey Edunov
Alexei Baevski
Angela Fan
Sam Gross
Nathan Ng
David Grangier
Michael Auli
VLM
FaML
111
3,151
0
01 Apr 2019
Massively Multilingual Neural Machine Translation
Roee Aharoni
Melvin Johnson
Orhan Firat
LRM
AI4CE
81
488
0
28 Feb 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
250
3,730
0
09 Jan 2019
Accelerating Neural Transformer via an Average Attention Network
Biao Zhang
Deyi Xiong
Jinsong Su
71
120
0
02 May 2018
The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation
Mengzhao Chen
Orhan Firat
Ankur Bapna
Melvin Johnson
Wolfgang Macherey
...
Niki Parmar
M. Schuster
Zhifeng Chen
Yonghui Wu
Macduff Hughes
AIMat
63
457
0
26 Apr 2018
A Call for Clarity in Reporting BLEU Scores
Matt Post
162
2,994
0
23 Apr 2018
When and Why are Pre-trained Word Embeddings Useful for Neural Machine Translation?
Ye Qi
Devendra Singh Sachan
Matthieu Felix
Sarguna Padmanabhan
Graham Neubig
97
343
0
17 Apr 2018
Tensor2Tensor for Neural Machine Translation
Ashish Vaswani
Samy Bengio
E. Brevdo
François Chollet
Aidan Gomez
...
Nal Kalchbrenner
Niki Parmar
Ryan Sepassi
Noam M. Shazeer
Jakob Uszkoreit
94
530
0
16 Mar 2018
Weighted Transformer Network for Machine Translation
Karim Ahmed
N. Keskar
R. Socher
65
133
0
06 Nov 2017
Improving Lexical Choice in Neural Machine Translation
Toan Q. Nguyen
David Chiang
55
86
0
03 Oct 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
713
131,652
0
12 Jun 2017
Cosine Normalization: Using Cosine Similarity Instead of Dot Product in Neural Networks
Chunjie Luo
Jianfeng Zhan
Lei Wang
Qiang Yang
65
201
0
20 Feb 2017
OpenNMT: Open-Source Toolkit for Neural Machine Translation
Guillaume Klein
Yoon Kim
Yuntian Deng
Jean Senellart
Alexander M. Rush
330
1,900
0
10 Jan 2017
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
413
10,494
0
21 Jul 2016
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
224
7,745
0
31 Aug 2015
Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift
Sergey Ioffe
Christian Szegedy
OOD
463
43,305
0
11 Feb 2015
1