The Brownian motion in the transformer model
Yingshi Chen
arXiv:2107.05264, 12 July 2021

Papers citing "The Brownian motion in the transformer model" (14 papers)

1. How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
   Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, Lucas Beyer
   [ViT] 18 Jun 2021

2. MLP-Mixer: An all-MLP Architecture for Vision
   Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy
   04 May 2021

3. An iterative K-FAC algorithm for Deep Learning
   Yingshi Chen
   [ODL] 01 Jan 2021

4. The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?
   Jasmijn Bastings, Katja Filippova
   [XAI, LRM] 12 Oct 2020

5. New Interpretations of Normalization Methods in Deep Learning
   Jiacheng Sun, Xiangyong Cao, Hanwen Liang, Weiran Huang, Zewei Chen, Zhenguo Li
   16 Jun 2020

6. Visual Transformers: Token-based Image Representation and Processing for Computer Vision
   Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph E. Gonzalez, Kurt Keutzer, Peter Vajda
   [ViT] 05 Jun 2020

7. A Primer in BERTology: What we know about how BERT works
   Anna Rogers, Olga Kovaleva, Anna Rumshisky
   [OffRL] 27 Feb 2020

8. On Layer Normalization in the Transformer Architecture
   Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu
   [AI4CE] 12 Feb 2020

9. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
   Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
   02 Oct 2019

10. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
    Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
    [SSL, AIMat] 26 Sep 2019

11. Attention is not not Explanation
    Sarah Wiegreffe, Yuval Pinter
    [XAI, AAML, FAtt] 13 Aug 2019

12. Are Sixteen Heads Really Better than One?
    Paul Michel, Omer Levy, Graham Neubig
    [MoE] 25 May 2019

13. Attention is not Explanation
    Sarthak Jain, Byron C. Wallace
    [FAtt] 26 Feb 2019

14. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
    Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov
    [VLM] 09 Jan 2019