The Brownian motion in the transformer model
Yingshi Chen
arXiv:2107.05264, 12 July 2021

Papers citing "The Brownian motion in the transformer model" (14 papers)

1. How to train your ViT? Data, Augmentation, and Regularization in Vision Transformers
   Andreas Steiner, Alexander Kolesnikov, Xiaohua Zhai, Ross Wightman, Jakob Uszkoreit, Lucas Beyer
   [ViT] 18 Jun 2021

2. MLP-Mixer: An all-MLP Architecture for Vision
   Ilya O. Tolstikhin, N. Houlsby, Alexander Kolesnikov, Lucas Beyer, Xiaohua Zhai, ..., Andreas Steiner, Daniel Keysers, Jakob Uszkoreit, Mario Lucic, Alexey Dosovitskiy
   04 May 2021

3. An iterative K-FAC algorithm for Deep Learning
   Yingshi Chen
   [ODL] 01 Jan 2021

4. The elephant in the interpretability room: Why use attention as explanation when we have saliency methods?
   Jasmijn Bastings, Katja Filippova
   [XAI, LRM] 12 Oct 2020

5. New Interpretations of Normalization Methods in Deep Learning
   Jiacheng Sun, Xiangyong Cao, Hanwen Liang, Weiran Huang, Zewei Chen, Zhenguo Li
   16 Jun 2020

6. Visual Transformers: Token-based Image Representation and Processing for Computer Vision
   Bichen Wu, Chenfeng Xu, Xiaoliang Dai, Alvin Wan, Peizhao Zhang, Zhicheng Yan, Masayoshi Tomizuka, Joseph E. Gonzalez, Kurt Keutzer, Peter Vajda
   [ViT] 05 Jun 2020

7. A Primer in BERTology: What we know about how BERT works
   Anna Rogers, Olga Kovaleva, Anna Rumshisky
   [OffRL] 27 Feb 2020

8. On Layer Normalization in the Transformer Architecture
   Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu
   [AI4CE] 12 Feb 2020

9. DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
   Victor Sanh, Lysandre Debut, Julien Chaumond, Thomas Wolf
   02 Oct 2019

10. ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
    Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
    [SSL, AIMat] 26 Sep 2019

11. Attention is not not Explanation
    Sarah Wiegreffe, Yuval Pinter
    [XAI, AAML, FAtt] 13 Aug 2019

12. Are Sixteen Heads Really Better than One?
    Paul Michel, Omer Levy, Graham Neubig
    [MoE] 25 May 2019

13. Attention is not Explanation
    Sarthak Jain, Byron C. Wallace
    [FAtt] 26 Feb 2019

14. Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
    Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov
    [VLM] 09 Jan 2019