Character-Level Language Modeling with Deeper Self-Attention (arXiv:1808.04444)
Rami Al-Rfou, Dokook Choe, Noah Constant, Mandy Guo, Llion Jones
9 August 2018
Papers citing "Character-Level Language Modeling with Deeper Self-Attention" (27 of 77 papers shown):
End-to-End Object Detection with Transformers. Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko. 26 May 2020.
Multiscale Collaborative Deep Models for Neural Machine Translation. Xiangpeng Wei, Heng Yu, Yue Hu, Yue Zhang, Rongxiang Weng, Weihua Luo. 29 Apr 2020.
A Spatio-temporal Transformer for 3D Human Motion Prediction. Emre Aksan, Manuel Kaufmann, Peng Cao, Otmar Hilliges. 18 Apr 2020.
Highway Transformer: Self-Gating Enhanced Self-Attentive Networks. Yekun Chai, Jin Shuo, Xinwen Hou. 17 Apr 2020.
Longformer: The Long-Document Transformer. Iz Beltagy, Matthew E. Peters, Arman Cohan. 10 Apr 2020.
Code Prediction by Feeding Trees to Transformers. Seohyun Kim, Jinman Zhao, Yuchi Tian, S. Chandra. 30 Mar 2020.
SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection. Xiaoya Li, Yuxian Meng, Mingxin Zhou, Qinghong Han, Fei Wu, Jiwei Li. 22 Mar 2020.
ReZero is All You Need: Fast Convergence at Large Depth. Thomas C. Bachlechner, Bodhisattwa Prasad Majumder, H. H. Mao, G. Cottrell, Julian McAuley. 10 Mar 2020.
On Layer Normalization in the Transformer Architecture. Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu. 12 Feb 2020.
Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection. Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun. 25 Dec 2019.
Single Headed Attention RNN: Stop Thinking With Your Head. Stephen Merity. 26 Nov 2019.
Understanding and Improving Layer Normalization. Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin. 16 Nov 2019.
Compressive Transformers for Long-Range Sequence Modelling. Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap. 13 Nov 2019.
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. 23 Oct 2019.
Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks. Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig. 23 Oct 2019.
Transformer-based Acoustic Modeling for Hybrid Speech Recognition. Yongqiang Wang, Abdel-rahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, ..., Xiaohui Zhang, Frank Zhang, Christian Fuegen, Geoffrey Zweig, M. Seltzer. 22 Oct 2019.
On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention. Junyeop Lee, Sungrae Park, Jeonghun Baek, Seong Joon Oh, Seonghyeon Kim, Hwalsuk Lee. 10 Oct 2019.
Investigating Self-Attention Network for Chinese Word Segmentation. Leilei Gan, Yue Zhang. 26 Jul 2019.
R-Transformer: Recurrent Neural Network Enhanced Transformer. Z. Wang, Yao Ma, Zitao Liu, Jiliang Tang. 12 Jul 2019.
Language Modeling with Deep Transformers. Kazuki Irie, Albert Zeyer, Ralf Schluter, Hermann Ney. 10 May 2019.
Generating Long Sequences with Sparse Transformers. R. Child, Scott Gray, Alec Radford, Ilya Sutskever. 23 Apr 2019.
Latent Normalizing Flows for Discrete Sequences. Zachary M. Ziegler, Alexander M. Rush. 29 Jan 2019.
Cross-lingual Language Model Pretraining. Guillaume Lample, Alexis Conneau. 22 Jan 2019.
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov. 09 Jan 2019.
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova. 11 Oct 2018.
Adaptive Input Representations for Neural Language Modeling. Alexei Baevski, Michael Auli. 28 Sep 2018.
Neural Architecture Search with Reinforcement Learning. Barret Zoph, Quoc V. Le. 05 Nov 2016.