arXiv:1904.10509
Generating Long Sequences with Sparse Transformers
23 April 2019
R. Child, Scott Gray, Alec Radford, Ilya Sutskever
Papers citing "Generating Long Sequences with Sparse Transformers" (50 of 1,140 shown)
Summarize, Outline, and Elaborate: Long-Text Generation via Hierarchical Supervision from Extractive Summaries
Xiaofei Sun, Zijun Sun, Yuxian Meng, Jiwei Li, Chun Fan · 14 Oct 2020 · 18 citations

Memformer: A Memory-Augmented Transformer for Sequence Modeling
Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, A. Geramifard, Zhou Yu · 14 Oct 2020 · 49 citations

Zero-shot Entity Linking with Efficient Long Range Sequence Modeling [VLM]
Zonghai Yao, Liangliang Cao, Huapu Pan · 12 Oct 2020 · 21 citations

SMYRF: Efficient Attention using Asymmetric Clustering
Giannis Daras, Nikita Kitaev, Augustus Odena, A. Dimakis · 11 Oct 2020 · 44 citations

Deformable DETR: Deformable Transformers for End-to-End Object Detection [ViT]
Xizhou Zhu, Weijie Su, Lewei Lu, Bin Li, Xiaogang Wang, Jifeng Dai · 08 Oct 2020 · 4,930 citations

Vector-Vector-Matrix Architecture: A Novel Hardware-Aware Framework for Low-Latency Inference in NLP Applications
Matthew Khoury, Rumen Dangovski, L. Ou, Preslav Nakov, Yichen Shen, L. Jing · 06 Oct 2020 · 0 citations

Stepwise Extractive Summarization and Planning with Structured Transformers
Shashi Narayan, Joshua Maynez, Jakub Adamek, Daniele Pighin, Blaž Bratanič, Ryan T. McDonald · 06 Oct 2020 · 30 citations
Scene Graph Modification Based on Natural Language Commands [GNN]
Xuanli He, Quan Hung Tran, Gholamreza Haffari, Walter Chang, Trung Bui, Zhe Lin, Franck Dernoncourt, Nhan Dam · 06 Oct 2020 · 9 citations

Guiding Attention for Self-Supervised Learning with Transformers
Ameet Deshpande, Karthik R. Narasimhan · 06 Oct 2020 · 21 citations

Which *BERT? A Survey Organizing Contextualized Encoders
Patrick Xia, Shijie Wu, Benjamin Van Durme · 02 Oct 2020 · 50 citations

Rethinking Attention with Performers
K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller · 30 Sep 2020 · 1,522 citations

Learning Hard Retrieval Decoder Attention for Transformers
Hongfei Xu, Qiuhui Liu, Josef van Genabith, Deyi Xiong · 30 Sep 2020 · 1 citation

Learning Knowledge Bases with Parameters for Task-Oriented Dialogue Systems
Andrea Madotto, Samuel Cahyawijaya, Genta Indra Winata, Yan Xu, Zihan Liu, Zhaojiang Lin, Pascale Fung · 28 Sep 2020 · 59 citations

Current Limitations of Language Models: What You Need is Retrieval [LRM]
Aran Komatsuzaki · 15 Sep 2020 · 3 citations
Efficient Transformers: A Survey [VLM]
Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler · 14 Sep 2020 · 1,103 citations

Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding
Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, S. Sun, Yu Cheng, Jingjing Liu · 13 Sep 2020 · 28 citations

Sparsifying Transformer Models with Trainable Representation Pooling
Michal Pietruszka, Łukasz Borchmann, Lukasz Garncarek · 10 Sep 2020 · 10 citations

Multi-Attention-Network for Semantic Segmentation of Fine Resolution Remote Sensing Images [SSeg]
Rui Li, Shunyi Zheng, Chenxi Duan, Ce Zhang, Jianlin Su, P. M. Atkinson · 03 Sep 2020 · 372 citations

Accelerating Sparse DNN Models without Hardware-Support via Tile-Wise Sparsity
Cong Guo, B. Hsueh, Jingwen Leng, Yuxian Qiu, Yue Guan, Zehuan Wang, Xiaoying Jia, Xipeng Li, M. Guo, Yuhao Zhu · 29 Aug 2020 · 83 citations

Deep Spatial Transformation for Pose-Guided Person Image Generation and Animation [3DH]
Yurui Ren, Ge Li, Shan Liu, Thomas H. Li · 27 Aug 2020 · 65 citations
AMBERT: A Pre-trained Language Model with Multi-Grained Tokenization
Xinsong Zhang, Pengshuai Li, Hang Li · 27 Aug 2020 · 51 citations

Generating Music with a Self-Correcting Non-Chronological Autoregressive Model [KELM]
Wayne Chi, Prachi Kumar, Suri Yaddanapudi, Rahul Suresh, Umut Isik · 18 Aug 2020 · 10 citations

PopMAG: Pop Music Accompaniment Generation
Yi Ren, Jinzheng He, Xu Tan, Tao Qin, Zhou Zhao, Tie-Yan Liu · 18 Aug 2020 · 115 citations

HiPPO: Recurrent Memory with Optimal Polynomial Projections
Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Ré · 17 Aug 2020 · 491 citations

Adding Recurrence to Pretrained Transformers for Improved Efficiency and Context Size [AI4CE]
Davis Yoshida, Allyson Ettinger, Kevin Gimpel · 16 Aug 2020 · 7 citations

Compression of Deep Learning Models for Text: A Survey [VLM, MedIm, AI4CE]
Manish Gupta, Puneet Agrawal · 12 Aug 2020 · 115 citations

DeLighT: Deep and Light-weight Transformer [VLM]
Sachin Mehta, Marjan Ghazvininejad, Srini Iyer, Luke Zettlemoyer, Hannaneh Hajishirzi · 03 Aug 2020 · 32 citations
The Chess Transformer: Mastering Play using Generative Language Models
David Noever, Matt Ciolino, Josh Kalin · 02 Aug 2020 · 37 citations

Neural Language Generation: Formulation, Methods, and Evaluation
Cristina Garbacea, Qiaozhu Mei · 31 Jul 2020 · 30 citations

Linear Attention Mechanism: An Efficient Attention for Semantic Segmentation [3DV]
Rui Li, Jianlin Su, Chenxi Duan, Shunyi Zheng · 29 Jul 2020 · 35 citations

TensorCoder: Dimension-Wise Attention via Tensor Representation for Natural Language Modeling
Shuai Zhang, Peng Zhang, Xindian Ma, Junqiu Wei, Ning Wang, Qun Liu · 28 Jul 2020 · 5 citations

Big Bird: Transformers for Longer Sequences [VLM]
Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed · 28 Jul 2020 · 2,023 citations

Cloud Transformers: A Universal Approach To Point Cloud Processing Tasks [3DPC]
Kirill Mazur, Victor Lempitsky · 22 Jul 2020 · 39 citations

DeepSVG: A Hierarchical Generative Network for Vector Graphics Animation
Alexandre Carlier, Martin Danelljan, Alexandre Alahi, Radu Timofte · 22 Jul 2020 · 138 citations
Conformer-Kernel with Query Term Independence for Document Retrieval
Bhaskar Mitra, Sebastian Hofstätter, Hamed Zamani, Nick Craswell · 20 Jul 2020 · 21 citations

Autoregressive Unsupervised Image Segmentation [SSL]
Yassine Ouali, Céline Hudelot, Myriam Tami · 16 Jul 2020 · 86 citations

ProtTrans: Towards Cracking the Language of Life's Code Through Self-Supervised Deep Learning and High Performance Computing [DRL]
Ahmed Elnaggar, M. Heinzinger, Christian Dallago, Ghalia Rehawi, Yu Wang, ..., Tamas B. Fehér, Christoph Angerer, Martin Steinegger, D. Bhowmik, B. Rost · 13 Jul 2020 · 917 citations

Sparse Graph to Sequence Learning for Vision Conditioned Long Textual Sequence Generation [VLM]
Aditya Mogadala, Marius Mosbach, Dietrich Klakow · 12 Jul 2020 · 0 citations

Variable Skipping for Autoregressive Range Density Estimation
Eric Liang, Zongheng Yang, Ion Stoica, Pieter Abbeel, Yan Duan, Xi Chen · 10 Jul 2020 · 4 citations

Fast Transformers with Clustered Attention
Apoorv Vyas, Angelos Katharopoulos, François Fleuret · 09 Jul 2020 · 151 citations
Data Movement Is All You Need: A Case Study on Optimizing Transformers
A. Ivanov, Nikoli Dryden, Tal Ben-Nun, Shigang Li, Torsten Hoefler · 30 Jun 2020 · 131 citations

Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret · 29 Jun 2020 · 1,674 citations

Matrix Shuffle-Exchange Networks for Hard 2D Tasks
Emīls Ozoliņš, Kārlis Freivalds, A. Sostaks · 29 Jun 2020 · 0 citations

Streaming Transformer ASR with Blockwise Synchronous Beam Search
E. Tsunoo, Yosuke Kashiwagi, Shinji Watanabe · 25 Jun 2020 · 11 citations

Locally Masked Convolution for Autoregressive Models [DiffM, OffRL]
Ajay Jain, Pieter Abbeel, Deepak Pathak · 22 Jun 2020 · 31 citations

Memory Transformer [RALM]
Andrey Kravchenko, Yuri Kuratov, Anton Peganov, Grigory V. Sapunov · 20 Jun 2020 · 64 citations

Denoising Diffusion Probabilistic Models [DiffM]
Jonathan Ho, Ajay Jain, Pieter Abbeel · 19 Jun 2020 · 17,084 citations
Sparse GPU Kernels for Deep Learning
Trevor Gale, Matei A. Zaharia, C. Young, Erich Elsen · 18 Jun 2020 · 230 citations

A Tutorial on VAEs: From Bayes' Rule to Lossless Compression [BDL]
Ronald Yu · 18 Jun 2020 · 23 citations

SEAL: Segment-wise Extractive-Abstractive Long-form Text Summarization [RALM]
Yao-Min Zhao, Mohammad Saleh, Peter J. Liu · 18 Jun 2020 · 25 citations