2006.04710
The Lipschitz Constant of Self-Attention
8 June 2020
Hyunjik Kim
George Papamakarios
A. Mnih
Papers citing
"The Lipschitz Constant of Self-Attention"
30 / 30 papers shown
Title
Approximation theory for 1-Lipschitz ResNets
Davide Murari
Takashi Furuya
Carola-Bibiane Schönlieb
41
0
0
17 May 2025
Provably Overwhelming Transformer Models with Designed Inputs
Lev Stambler
Seyed Sajjad Nezhadi
Matthew Coudron
100
1
0
09 Feb 2025
Law of Vision Representation in MLLMs
Shijia Yang
Bohan Zhai
Quanzeng You
Jianbo Yuan
Hongxia Yang
Chenfeng Xu
68
10
0
29 Aug 2024
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
500
41,106
0
28 May 2020
On Layer Normalization in the Transformer Architecture
Ruibin Xiong
Yunchang Yang
Di He
Kai Zheng
Shuxin Zheng
Chen Xing
Huishuai Zhang
Yanyan Lan
Liwei Wang
Tie-Yan Liu
AI4CE
95
973
0
12 Feb 2020
Stabilizing Transformers for Reinforcement Learning
Emilio Parisotto
H. F. Song
Jack W. Rae
Razvan Pascanu
Çağlar Gülçehre
...
Aidan Clark
Seb Noury
M. Botvinick
N. Heess
R. Hadsell
OffRL
69
360
0
13 Oct 2019
Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel
Yao-Hung Hubert Tsai
Shaojie Bai
M. Yamada
Louis-Philippe Morency
Ruslan Salakhutdinov
91
251
0
30 Aug 2019
Stand-Alone Self-Attention in Vision Models
Prajit Ramachandran
Niki Parmar
Ashish Vaswani
Irwan Bello
Anselm Levskaya
Jonathon Shlens
VLM
SLR
ViT
65
1,208
0
13 Jun 2019
Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks
Mahyar Fazlyab
Alexander Robey
Hamed Hassani
M. Morari
George J. Pappas
69
451
0
12 Jun 2019
Residual Flows for Invertible Generative Modeling
Ricky T. Q. Chen
Jens Behrmann
David Duvenaud
J. Jacobsen
BDL
TPM
DRL
56
375
0
06 Jun 2019
Learning Deep Transformer Models for Machine Translation
Qiang Wang
Bei Li
Tong Xiao
Jingbo Zhu
Changliang Li
Derek F. Wong
Lidia S. Chao
59
666
0
05 Jun 2019
Sorting out Lipschitz function approximation
Cem Anil
James Lucas
Roger C. Grosse
57
319
0
13 Nov 2018
Invertible Residual Networks
Jens Behrmann
Will Grathwohl
Ricky T. Q. Chen
David Duvenaud
J. Jacobsen
UQCV
TPM
73
621
0
02 Nov 2018
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
961
93,936
0
11 Oct 2018
FFJORD: Free-form Continuous Dynamics for Scalable Reversible Generative Models
Will Grathwohl
Ricky T. Q. Chen
J. Bettencourt
Ilya Sutskever
David Duvenaud
DRL
66
861
0
02 Oct 2018
Large Scale GAN Training for High Fidelity Natural Image Synthesis
Andrew Brock
Jeff Donahue
Karen Simonyan
221
5,363
0
28 Sep 2018
Training Deeper Neural Machine Translation Models with Transparent Attention
Ankur Bapna
Mia Xu Chen
Orhan Firat
Yuan Cao
Yonghui Wu
48
139
0
22 Aug 2018
Glow: Generative Flow with Invertible 1x1 Convolutions
Diederik P. Kingma
Prafulla Dhariwal
BDL
DRL
212
3,110
0
09 Jul 2018
Neural Ordinary Differential Equations
Ricky T. Q. Chen
Yulia Rubanova
J. Bettencourt
David Duvenaud
AI4CE
232
5,024
0
19 Jun 2018
Lipschitz regularity of deep neural networks: analysis and efficient estimation
Kevin Scaman
Aladin Virmaux
62
523
0
28 May 2018
Self-Attention Generative Adversarial Networks
Han Zhang
Ian Goodfellow
Dimitris N. Metaxas
Augustus Odena
GAN
113
3,710
0
21 May 2018
Computational Optimal Transport
Gabriel Peyré
Marco Cuturi
OT
140
2,133
0
01 Mar 2018
Spectral Normalization for Generative Adversarial Networks
Takeru Miyato
Toshiki Kataoka
Masanori Koyama
Yuichi Yoshida
ODL
137
4,421
0
16 Feb 2018
Lipschitz-Margin Training: Scalable Certification of Perturbation Invariance for Deep Neural Networks
Yusuke Tsuzuku
Issei Sato
Masashi Sugiyama
AAML
78
301
0
12 Feb 2018
Non-local Neural Networks
Xiaolong Wang
Ross B. Girshick
Abhinav Gupta
Kaiming He
OffRL
215
8,867
0
21 Nov 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
453
129,831
0
12 Jun 2017
Parseval Networks: Improving Robustness to Adversarial Examples
Moustapha Cissé
Piotr Bojanowski
Edouard Grave
Yann N. Dauphin
Nicolas Usunier
AAML
120
800
0
28 Apr 2017
Robust Large Margin Deep Neural Networks
Jure Sokolić
Raja Giryes
Guillermo Sapiro
M. Rodrigues
59
309
0
26 May 2016
TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems
Martín Abadi
Ashish Agarwal
P. Barham
E. Brevdo
Zhifeng Chen
...
Pete Warden
Martin Wattenberg
Martin Wicke
Yuan Yu
Xiaoqiang Zheng
189
11,135
0
14 Mar 2016
Adam: A Method for Stochastic Optimization
Diederik P. Kingma
Jimmy Ba
ODL
842
149,474
0
22 Dec 2014