ResearchTrend.AI
© 2025 ResearchTrend.AI, All rights reserved.

Character-Level Language Modeling with Deeper Self-Attention
arXiv:1808.04444 · 9 August 2018
Rami Al-Rfou, Dokook Choe, Noah Constant, Mandy Guo, Llion Jones
Papers citing "Character-Level Language Modeling with Deeper Self-Attention"

27 / 77 papers shown
End-to-End Object Detection with Transformers
Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko (26 May 2020) · ViT, 3DV, PINN · 12,711 citations

Multiscale Collaborative Deep Models for Neural Machine Translation
Xiangpeng Wei, Heng Yu, Yue Hu, Yue Zhang, Rongxiang Weng, Weihua Luo (29 Apr 2020) · 28 citations

A Spatio-temporal Transformer for 3D Human Motion Prediction
Emre Aksan, Manuel Kaufmann, Peng Cao, Otmar Hilliges (18 Apr 2020) · ViT · 224 citations

Highway Transformer: Self-Gating Enhanced Self-Attentive Networks
Yekun Chai, Jin Shuo, Xinwen Hou (17 Apr 2020) · 17 citations

Longformer: The Long-Document Transformer
Iz Beltagy, Matthew E. Peters, Arman Cohan (10 Apr 2020) · RALM, VLM · 3,934 citations

Code Prediction by Feeding Trees to Transformers
Seohyun Kim, Jinman Zhao, Yuchi Tian, S. Chandra (30 Mar 2020) · 216 citations

SAC: Accelerating and Structuring Self-Attention via Sparse Adaptive Connection
Xiaoya Li, Yuxian Meng, Mingxin Zhou, Qinghong Han, Fei Wu, Jiwei Li (22 Mar 2020) · 20 citations

ReZero is All You Need: Fast Convergence at Large Depth
Thomas C. Bachlechner, Bodhisattwa Prasad Majumder, H. H. Mao, G. Cottrell, Julian McAuley (10 Mar 2020) · AI4CE · 276 citations

On Layer Normalization in the Transformer Architecture
Ruibin Xiong, Yunchang Yang, Di He, Kai Zheng, Shuxin Zheng, Chen Xing, Huishuai Zhang, Yanyan Lan, Liwei Wang, Tie-Yan Liu (12 Feb 2020) · AI4CE · 949 citations

Explicit Sparse Transformer: Concentrated Attention Through Explicit Selection
Guangxiang Zhao, Junyang Lin, Zhiyuan Zhang, Xuancheng Ren, Qi Su, Xu Sun (25 Dec 2019) · 108 citations

Single Headed Attention RNN: Stop Thinking With Your Head
Stephen Merity (26 Nov 2019) · 68 citations

Understanding and Improving Layer Normalization
Jingjing Xu, Xu Sun, Zhiyuan Zhang, Guangxiang Zhao, Junyang Lin (16 Nov 2019) · FAtt · 342 citations

Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap (13 Nov 2019) · RALM, VLM, KELM · 621 citations

Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu (23 Oct 2019) · AIMat · 19,529 citations

Deja-vu: Double Feature Presentation and Iterated Loss in Deep Transformer Networks
Andros Tjandra, Chunxi Liu, Frank Zhang, Xiaohui Zhang, Yongqiang Wang, Gabriel Synnaeve, Satoshi Nakamura, Geoffrey Zweig (23 Oct 2019) · ViT · 44 citations

Transformer-based Acoustic Modeling for Hybrid Speech Recognition
Yongqiang Wang, Abdel-rahman Mohamed, Duc Le, Chunxi Liu, Alex Xiao, ..., Xiaohui Zhang, Frank Zhang, Christian Fuegen, Geoffrey Zweig, M. Seltzer (22 Oct 2019) · 248 citations

On Recognizing Texts of Arbitrary Shapes with 2D Self-Attention
Junyeop Lee, Sungrae Park, Jeonghun Baek, Seong Joon Oh, Seonghyeon Kim, Hwalsuk Lee (10 Oct 2019) · 121 citations

Investigating Self-Attention Network for Chinese Word Segmentation
Leilei Gan, Yue Zhang (26 Jul 2019) · 11 citations

R-Transformer: Recurrent Neural Network Enhanced Transformer
Z. Wang, Yao Ma, Zitao Liu, Jiliang Tang (12 Jul 2019) · ViT · 105 citations

Language Modeling with Deep Transformers
Kazuki Irie, Albert Zeyer, Ralf Schlüter, Hermann Ney (10 May 2019) · KELM · 171 citations

Generating Long Sequences with Sparse Transformers
R. Child, Scott Gray, Alec Radford, Ilya Sutskever (23 Apr 2019) · 1,851 citations

Latent Normalizing Flows for Discrete Sequences
Zachary M. Ziegler, Alexander M. Rush (29 Jan 2019) · BDL, DRL · 122 citations

Cross-lingual Language Model Pretraining
Guillaume Lample, Alexis Conneau (22 Jan 2019) · 2,709 citations

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov (9 Jan 2019) · VLM · 3,679 citations

BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova (11 Oct 2018) · VLM, SSL, SSeg · 93,140 citations

Adaptive Input Representations for Neural Language Modeling
Alexei Baevski, Michael Auli (28 Sep 2018) · 388 citations

Neural Architecture Search with Reinforcement Learning
Barret Zoph, Quoc V. Le (5 Nov 2016) · 5,330 citations