Generating Long Sequences with Sparse Transformers
R. Child, Scott Gray, Alec Radford, Ilya Sutskever
23 April 2019
Papers citing "Generating Long Sequences with Sparse Transformers" (40 of 1,140 shown)
Compressive Transformers for Long-Range Sequence Modelling (13 Nov 2019)
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap
Tags: RALM, VLM, KELM

word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement (12 Nov 2019)
Aliakbar Panahi, Seyran Saeedi, Tom Arodz

BP-Transformer: Modelling Long-Range Context via Binary Partitioning (11 Nov 2019)
Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng-Wei Zhang

Blockwise Self-Attention for Long Document Understanding (07 Nov 2019)
J. Qiu, Hao Ma, Omer Levy, Scott Yih, Sinong Wang, Jie Tang

Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding (01 Nov 2019)
Pan Zhou, Ruchao Fan, Wei Chen, Jia Jia

Injecting Hierarchy with U-Net Transformers (16 Oct 2019)
David Donahue, Vladislav Lialin, Anna Rumshisky
Tags: AI4CE

Transformer ASR with Contextual Block Processing (16 Oct 2019)
E. Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization (07 Oct 2019)
Paras Jain, Ajay Jain, Aniruddha Nrusimha, A. Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (26 Sep 2019)
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
Tags: SSL, AIMat

Exascale Deep Learning for Scientific Inverse Problems (24 Sep 2019)
N. Laanait, Josh Romero, Junqi Yin, M. T. Young, Sean Treichler, V. Starchenko, A. Borisevich, Alexander Sergeev, Michael A. Matheson
Tags: FedML, BDL

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (17 Sep 2019)
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Tags: MoE

CTRL: A Conditional Transformer Language Model for Controllable Generation (11 Sep 2019)
N. Keskar, Bryan McCann, L. Varshney, Caiming Xiong, R. Socher
Tags: AI4CE

Forecaster: A Graph Transformer for Forecasting Spatial and Time-Dependent Data (09 Sep 2019)
Yong Li, J. M. F. Moura
Tags: AI4TS

Deep Equilibrium Models (03 Sep 2019)
Shaojie Bai, J. Zico Kolter, V. Koltun

Logic and the 2-Simplicial Transformer (02 Sep 2019)
James Clift, D. Doryn, Daniel Murfet, James Wallbridge
Tags: NAI

Adaptively Sparse Transformers (30 Aug 2019)
Gonçalo M. Correia, Vlad Niculae, André F. T. Martins

Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel (30 Aug 2019)
Yao-Hung Hubert Tsai, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov

Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention (29 Aug 2019)
Biao Zhang, Ivan Titov, Rico Sennrich

BERT for Coreference Resolution: Baselines and Analysis (24 Aug 2019)
Mandar Joshi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer

Interlaced Sparse Self-Attention for Semantic Segmentation (29 Jul 2019)
Lang Huang, Yuhui Yuan, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang

Self-Attentional Credit Assignment for Transfer in Reinforcement Learning (18 Jul 2019)
Johan Ferret, Raphaël Marinier, M. Geist, Olivier Pietquin
Tags: OffRL

Agglomerative Attention (15 Jul 2019)
Matthew Spellings

Adversarial Video Generation on Complex Datasets (15 Jul 2019)
Aidan Clark, Jeff Donahue, Karen Simonyan
Tags: VGen, GAN

Sparse Networks from Scratch: Faster Training without Losing Performance (10 Jul 2019)
Tim Dettmers, Luke Zettlemoyer

Augmenting Self-attention with Persistent Memory (02 Jul 2019)
Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Hervé Jégou, Armand Joulin
Tags: RALM, KELM

The University of Sydney's Machine Translation System for WMT19 (30 Jun 2019)
Liang Ding, Dacheng Tao

Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting (29 Jun 2019)
Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, Xifeng Yan
Tags: AI4TS

A Tensorized Transformer for Language Modeling (24 Jun 2019)
Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, D. Song, M. Zhou

Learning Set-equivariant Functions with SWARM Mappings (22 Jun 2019)
Roland Vollgraf

Theoretical Limitations of Self-Attention in Neural Sequence Models (16 Jun 2019)
Michael Hahn

One Epoch Is All You Need (16 Jun 2019)
Aran Komatsuzaki

Analyzing the Structure of Attention in a Transformer Language Model (07 Jun 2019)
Jesse Vig, Yonatan Belinkov

Scaling Autoregressive Video Models (06 Jun 2019)
Dirk Weissenborn, Oscar Täckström, Jakob Uszkoreit
Tags: DiffM, VGen

MelNet: A Generative Model for Audio in the Frequency Domain (04 Jun 2019)
Sean Vasquez, M. Lewis
Tags: DiffM

Exploiting Uncertainty of Loss Landscape for Stochastic Optimization (30 May 2019)
Vineeth S. Bhaskara, S. Desai

SCRAM: Spatially Coherent Randomized Attention Maps (24 May 2019)
D. A. Calian, P. Roelants, Jacques Calì, B. Carr, K. Dubba, John E. Reid, Dell Zhang

Compression with Flows via Local Bits-Back Coding (21 May 2019)
Jonathan Ho, Evan Lohn, Pieter Abbeel

An Attentive Survey of Attention Models (05 Apr 2019)
S. Chaudhari, Varun Mithal, Gungor Polatkan, R. Ramanath

OCNet: Object Context Network for Scene Parsing (04 Sep 2018)
Yuhui Yuan, Lang Huang, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang

Pixel Recurrent Neural Networks (25 Jan 2016)
Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu
Tags: SSeg, GAN