ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Character-Level Language Modeling with Deeper Self-Attention
arXiv:1808.04444 · 9 August 2018
Rami Al-Rfou, Dokook Choe, Noah Constant, Mandy Guo, Llion Jones
Papers citing "Character-Level Language Modeling with Deeper Self-Attention"

27 / 77 papers shown
End-to-End Object Detection with Transformers
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko (26 May 2020) · ViT, 3DV, PINN · 12,711 citations

Multiscale Collaborative Deep Models for Neural Machine Translation
Xiangpeng Wei, Heng Yu, Yue Hu, Yue Zhang, Rongxiang Weng, Weihua Luo (29 Apr 2020) · 28 citations

A Spatio-temporal Transformer for 3D Human Motion Prediction
Emre Aksan, Manuel Kaufmann, Peng Cao, Otmar Hilliges (18 Apr 2020) · ViT · 224 citations

Highway Transformer: Self-Gating Enhanced Self-Attentive Networks
Yekun Chai, Jin Shuo, Xinwen Hou (17 Apr 2020) · 17 citations

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan (10 Apr 2020) · RALM, VLM · 3,934 citations

Code Prediction by Feeding Trees to Transformers
Seohyun Kim, Jinman Zhao, Yuchi Tian, S. Chandra (30 Mar 2020) · 216 citations

SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection
Xiaoya Li, Yuxian Meng, Mingxin Zhou, Qinghong Han, Fei Wu, Jiwei Li (22 Mar 2020) · 20 citations

ReZero is All You Need: Fast Convergence at Large Depth
Thomas C. Bachlechner, Bodhisattwa Prasad Majumder, H. H. Mao, G. Cottrell, Julian McAuley (10 Mar 2020) · AI4CE · 276 citations

On Layer Normalization in the Transformer Architecture
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu (12 Feb 2020) · AI4CE · 949 citations

Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun (25 Dec 2019) · 108 citations

Single Headed Attention RNN: Stop Thinking With Your Head
Stephen Merity (26 Nov 2019) · 68 citations

Understanding and Improving Layer Normalization
Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin (16 Nov 2019) · FAtt · 342 citations

Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap (13 Nov 2019) · RALM, VLM, KELM · 621 citations

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu (23 Oct 2019) · AIMat · 19,529 citations

Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks
Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig (23 Oct 2019) · ViT · 44 citations

Transformer-based Acoustic Modeling for Hybrid Speech Recognition
Yongqiang Wang, Abdel-rahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, ..., Xiaohui Zhang, Frank Zhang, Christian Fuegen, Geoffrey Zweig, M. Seltzer (22 Oct 2019) · 248 citations

On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention
Junyeop Lee, Sungrae Park, Jeonghun Baek, Seong Joon Oh, Seonghyeon Kim, Hwalsuk Lee (10 Oct 2019) · 121 citations

Investigating Self-Attention Network for Chinese Word Segmentation
Leilei Gan, Yue Zhang (26 Jul 2019) · 11 citations

R-Transformer: Recurrent Neural Network Enhanced Transformer
Z. Wang, Yao Ma, Zitao Liu, Jiliang Tang (12 Jul 2019) · ViT · 105 citations

Language Modeling with Deep Transformers
Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney (10 May 2019) · KELM · 171 citations

Generating Long Sequences with Sparse Transformers
R. Child, Scott Gray, Alec Radford, Ilya Sutskever (23 Apr 2019) · 1,851 citations

Latent Normalizing Flows for Discrete Sequences
Zachary M. Ziegler, Alexander M. Rush (29 Jan 2019) · BDL, DRL · 122 citations

Cross-lingual Language Model Pretraining
Guillaume Lample, Alexis Conneau (22 Jan 2019) · 2,709 citations

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov (9 Jan 2019) · VLM · 3,679 citations

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova (11 Oct 2018) · VLM, SSL, SSeg · 93,140 citations

Adaptive Input Representations for Neural Language Modeling
Alexei Baevski, Michael Auli (28 Sep 2018) · 388 citations

Neural Architecture Search with Reinforcement Learning
Barret Zoph, Quoc V. Le (5 Nov 2016) · 5,330 citations