Compressive Transformers for Long-Range Sequence Modelling
arXiv: 1911.05507 (13 November 2019)
Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap
Tags: RALM, VLM, KELM
Papers citing "Compressive Transformers for Long-Range Sequence Modelling" (32 of 232 shown):
- Adaptive Semiparametric Language Models (04 Feb 2021). Dani Yogatama, Cyprien de Masson d'Autume, Lingpeng Kong. Tags: KELM, RALM.
- Can We Automate Scientific Reviewing? (30 Jan 2021). Weizhe Yuan, Pengfei Liu, Graham Neubig.
- Compound Word Transformer: Learning to Compose Full-Song Music over Dynamic Directed Hypergraphs (07 Jan 2021). Wen-Yi Hsiao, Jen-Yu Liu, Yin-Cheng Yeh, Yi-Hsuan Yang.
- Shortformer: Better Language Modeling using Shorter Inputs (31 Dec 2020). Ofir Press, Noah A. Smith, M. Lewis.
- ERNIE-Doc: A Retrospective Long-Document Modeling Transformer (31 Dec 2020). Siyu Ding, Junyuan Shang, Shuohuan Wang, Yu Sun, Hao Tian, Hua Wu, Haifeng Wang.
- Rethinking Document-level Neural Machine Translation (18 Oct 2020). Zewei Sun, Mingxuan Wang, Hao Zhou, Chengqi Zhao, Shujian Huang, Jiajun Chen, Lei Li. Tags: VLM.
- Memformer: A Memory-Augmented Transformer for Sequence Modeling (14 Oct 2020). Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, A. Geramifard, Zhou Yu.
- Zero-shot Entity Linking with Efficient Long Range Sequence Modeling (12 Oct 2020). Zonghai Yao, Liangliang Cao, Huapu Pan. Tags: VLM.
- Rethinking Attention with Performers (30 Sep 2020). K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller.
- Automated Source Code Generation and Auto-completion Using Deep Learning: Comparing and Discussing Current Language-Model-Related Approaches (16 Sep 2020). Juan Cruz-Benito, Sanjay Vishwakarma, Francisco Martín-Fernández, Ismael Faro (IBM Quantum).
- Efficient Transformers: A Survey (14 Sep 2020). Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. Tags: VLM.
- Cluster-Former: Clustering-based Sparse Transformer for Long-Range Dependency Encoding (13 Sep 2020). Shuohang Wang, Luowei Zhou, Zhe Gan, Yen-Chun Chen, Yuwei Fang, S. Sun, Yu Cheng, Jingjing Liu.
- HiPPO: Recurrent Memory with Optimal Polynomial Projections (17 Aug 2020). Albert Gu, Tri Dao, Stefano Ermon, Atri Rudra, Christopher Ré.
- Neural Language Generation: Formulation, Methods, and Evaluation (31 Jul 2020). Cristina Garbacea, Qiaozhu Mei.
- Big Bird: Transformers for Longer Sequences (28 Jul 2020). Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. Tags: VLM.
- Do Transformers Need Deep Long-Range Memory? (07 Jul 2020). Jack W. Rae, Ali Razavi. Tags: RALM.
- Memory Transformer (20 Jun 2020). Andrey Kravchenko, Yuri Kuratov, Anton Peganov, Grigory V. Sapunov. Tags: RALM.
- Masked Language Modeling for Proteins via Linearly Scalable Long-Context Transformers (05 Jun 2020). K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Peter Hawkins, Jared Davis, David Belanger, Lucy J. Colwell, Adrian Weller.
- GMAT: Global Memory Augmentation for Transformers (05 Jun 2020). Ankit Gupta, Jonathan Berant. Tags: RALM.
- Exploring Transformers for Large-Scale Speech Recognition (19 May 2020). Liang Lu, Changliang Liu, Jinyu Li, Jiawei Liu.
- Multi-scale Transformer Language Models (01 May 2020). Sandeep Subramanian, R. Collobert, Marc'Aurelio Ranzato, Y-Lan Boureau.
- Recipes for building an open-domain chatbot (28 Apr 2020). Stephen Roller, Emily Dinan, Naman Goyal, Da Ju, Mary Williamson, ..., Myle Ott, Kurt Shuster, Eric Michael Smith, Y-Lan Boureau, Jason Weston. Tags: ALM.
- Beyond 512 Tokens: Siamese Multi-depth Transformer-based Hierarchical Encoder for Long-Form Document Matching (26 Apr 2020). Liu Yang, Mingyang Zhang, Cheng Li, Michael Bendersky, Marc Najork.
- ETC: Encoding Long and Structured Inputs in Transformers (17 Apr 2020). Joshua Ainslie, Santiago Ontanon, Chris Alberti, Vaclav Cvicek, Zachary Kenneth Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang.
- Training with Quantization Noise for Extreme Model Compression (15 Apr 2020). Angela Fan, Pierre Stock, Benjamin Graham, Edouard Grave, Remi Gribonval, Hervé Jégou, Armand Joulin. Tags: MQ.
- Longformer: The Long-Document Transformer (10 Apr 2020). Iz Beltagy, Matthew E. Peters, Arman Cohan. Tags: RALM, VLM.
- Efficient Content-Based Sparse Attention with Routing Transformers (12 Mar 2020). Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier. Tags: MoE.
- ProGen: Language Modeling for Protein Generation (08 Mar 2020). Ali Madani, Bryan McCann, Nikhil Naik, N. Keskar, N. Anand, Raphael R. Eguchi, Po-Ssu Huang, R. Socher.
- Sparse Sinkhorn Attention (26 Feb 2020). Yi Tay, Dara Bahri, Liu Yang, Donald Metzler, Da-Cheng Juan.
- Time-aware Large Kernel Convolutions (08 Feb 2020). Vasileios Lioutas, Yuhong Guo. Tags: AI4TS.
- Improving Transformer Models by Reordering their Sublayers (10 Nov 2019). Ofir Press, Noah A. Smith, Omer Levy.
- Natural Language Processing: State of The Art, Current Trends and Challenges (17 Aug 2017). Diksha Khurana, Aditya Koli, Kiran Khatter, Sukhdev Singh.