Generating Long Sequences with Sparse Transformers
R. Child, Scott Gray, Alec Radford, Ilya Sutskever
23 April 2019
Papers citing "Generating Long Sequences with Sparse Transformers" (40 of 1,140 shown)
Compressive Transformers for Long-Range Sequence Modelling (13 Nov 2019)
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap
Tags: RALM, VLM, KELM

word2ket: Space-efficient Word Embeddings inspired by Quantum Entanglement (12 Nov 2019)
Aliakbar Panahi, Seyran Saeedi, Tom Arodz

BP-Transformer: Modelling Long-Range Context via Binary Partitioning (11 Nov 2019)
Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng-Wei Zhang

Blockwise Self-Attention for Long Document Understanding (07 Nov 2019)
J. Qiu, Hao Ma, Omer Levy, Scott Yih, Sinong Wang, Jie Tang

Improving Generalization of Transformer for Speech Recognition with Parallel Schedule Sampling and Relative Positional Embedding (01 Nov 2019)
Pan Zhou, Ruchao Fan, Wei Chen, Jia Jia

Injecting Hierarchy with U-Net Transformers (16 Oct 2019)
David Donahue, Vladislav Lialin, Anna Rumshisky
Tags: AI4CE

Transformer ASR with Contextual Block Processing (16 Oct 2019)
E. Tsunoo, Yosuke Kashiwagi, Toshiyuki Kumakura, Shinji Watanabe

Checkmate: Breaking the Memory Wall with Optimal Tensor Rematerialization (07 Oct 2019)
Paras Jain, Ajay Jain, Aniruddha Nrusimha, A. Gholami, Pieter Abbeel, Kurt Keutzer, Ion Stoica, Joseph E. Gonzalez

ALBERT: A Lite BERT for Self-supervised Learning of Language Representations (26 Sep 2019)
Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut
Tags: SSL, AIMat

Exascale Deep Learning for Scientific Inverse Problems (24 Sep 2019)
N. Laanait, Josh Romero, Junqi Yin, M. T. Young, Sean Treichler, V. Starchenko, A. Borisevich, Alexander Sergeev, Michael A. Matheson
Tags: FedML, BDL

Megatron-LM: Training Multi-Billion Parameter Language Models Using Model Parallelism (17 Sep 2019)
M. Shoeybi, M. Patwary, Raul Puri, P. LeGresley, Jared Casper, Bryan Catanzaro
Tags: MoE

CTRL: A Conditional Transformer Language Model for Controllable Generation (11 Sep 2019)
N. Keskar, Bryan McCann, L. Varshney, Caiming Xiong, R. Socher
Tags: AI4CE

Forecaster: A Graph Transformer for Forecasting Spatial and Time-Dependent Data (09 Sep 2019)
Yong Li, J. M. F. Moura
Tags: AI4TS

Deep Equilibrium Models (03 Sep 2019)
Shaojie Bai, J. Zico Kolter, V. Koltun

Logic and the 2-Simplicial Transformer (02 Sep 2019)
James Clift, D. Doryn, Daniel Murfet, James Wallbridge
Tags: NAI

Adaptively Sparse Transformers (30 Aug 2019)
Gonçalo M. Correia, Vlad Niculae, André F. T. Martins

Transformer Dissection: A Unified Understanding of Transformer's Attention via the Lens of Kernel (30 Aug 2019)
Yao-Hung Hubert Tsai, Shaojie Bai, M. Yamada, Louis-Philippe Morency, Ruslan Salakhutdinov

Improving Deep Transformer with Depth-Scaled Initialization and Merged Attention (29 Aug 2019)
Biao Zhang, Ivan Titov, Rico Sennrich

BERT for Coreference Resolution: Baselines and Analysis (24 Aug 2019)
Mandar Joshi, Omer Levy, Daniel S. Weld, Luke Zettlemoyer

Interlaced Sparse Self-Attention for Semantic Segmentation (29 Jul 2019)
Lang Huang, Yuhui Yuan, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang

Self-Attentional Credit Assignment for Transfer in Reinforcement Learning (18 Jul 2019)
Johan Ferret, Raphaël Marinier, M. Geist, Olivier Pietquin
Tags: OffRL

Agglomerative Attention (15 Jul 2019)
Matthew Spellings

Adversarial Video Generation on Complex Datasets (15 Jul 2019)
Aidan Clark, Jeff Donahue, Karen Simonyan
Tags: VGen, GAN

Sparse Networks from Scratch: Faster Training without Losing Performance (10 Jul 2019)
Tim Dettmers, Luke Zettlemoyer

Augmenting Self-attention with Persistent Memory (02 Jul 2019)
Sainbayar Sukhbaatar, Edouard Grave, Guillaume Lample, Hervé Jégou, Armand Joulin
Tags: RALM, KELM

The University of Sydney's Machine Translation System for WMT19 (30 Jun 2019)
Liang Ding, Dacheng Tao

Enhancing the Locality and Breaking the Memory Bottleneck of Transformer on Time Series Forecasting (29 Jun 2019)
Shiyang Li, Xiaoyong Jin, Yao Xuan, Xiyou Zhou, Wenhu Chen, Yu-Xiang Wang, Xifeng Yan
Tags: AI4TS

A Tensorized Transformer for Language Modeling (24 Jun 2019)
Xindian Ma, Peng Zhang, Shuai Zhang, Nan Duan, Yuexian Hou, D. Song, M. Zhou

Learning Set-equivariant Functions with SWARM Mappings (22 Jun 2019)
Roland Vollgraf

Theoretical Limitations of Self-Attention in Neural Sequence Models (16 Jun 2019)
Michael Hahn

One Epoch Is All You Need (16 Jun 2019)
Aran Komatsuzaki

Analyzing the Structure of Attention in a Transformer Language Model (07 Jun 2019)
Jesse Vig, Yonatan Belinkov

Scaling Autoregressive Video Models (06 Jun 2019)
Dirk Weissenborn, Oscar Täckström, Jakob Uszkoreit
Tags: DiffM, VGen

MelNet: A Generative Model for Audio in the Frequency Domain (04 Jun 2019)
Sean Vasquez, M. Lewis
Tags: DiffM

Exploiting Uncertainty of Loss Landscape for Stochastic Optimization (30 May 2019)
Vineeth S. Bhaskara, S. Desai

SCRAM: Spatially Coherent Randomized Attention Maps (24 May 2019)
D. A. Calian, P. Roelants, Jacques Calì, B. Carr, K. Dubba, John E. Reid, Dell Zhang

Compression with Flows via Local Bits-Back Coding (21 May 2019)
Jonathan Ho, Evan Lohn, Pieter Abbeel

An Attentive Survey of Attention Models (05 Apr 2019)
S. Chaudhari, Varun Mithal, Gungor Polatkan, R. Ramanath

OCNet: Object Context Network for Scene Parsing (04 Sep 2018)
Yuhui Yuan, Lang Huang, Jianyuan Guo, Chao Zhang, Xilin Chen, Jingdong Wang

Pixel Recurrent Neural Networks (25 Jan 2016)
Aaron van den Oord, Nal Kalchbrenner, Koray Kavukcuoglu
Tags: SSeg, GAN