ETC: Encoding Long and Structured Inputs in Transformers
arXiv:2004.08483 · 17 April 2020
Joshua Ainslie, Santiago Ontanon, Chris Alberti, Vaclav Cvicek, Zachary Kenneth Fisher, Philip Pham, Anirudh Ravula, Sumit Sanghai, Qifan Wang, Li Yang
Papers citing "ETC: Encoding Long and Structured Inputs in Transformers" (29 papers)

| Title | Authors | Tags | Citations | Date |
|---|---|---|---|---|
| Longformer: The Long-Document Transformer | Iz Beltagy, Matthew E. Peters, Arman Cohan | RALM, VLM | 3,996 | 10 Apr 2020 |
| Efficient Content-Based Sparse Attention with Routing Transformers | Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier | MoE | 588 | 12 Mar 2020 |
| Reformer: The Efficient Transformer | Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya | VLM | 2,279 | 13 Jan 2020 |
| Compressive Transformers for Long-Range Sequence Modelling | Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap | RALM, VLM, KELM | 636 | 13 Nov 2019 |
| BP-Transformer: Modelling Long-Range Context via Binary Partitioning | Zihao Ye, Qipeng Guo, Quan Gan, Xipeng Qiu, Zheng Zhang | | 77 | 11 Nov 2019 |
| Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer | Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu | AIMat | 19,824 | 23 Oct 2019 |
| ALBERT: A Lite BERT for Self-supervised Learning of Language Representations | Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut | SSL, AIMat | 6,420 | 26 Sep 2019 |
| RoBERTa: A Robustly Optimized BERT Pretraining Approach | Yinhan Liu, Myle Ott, Naman Goyal, Jingfei Du, Mandar Joshi, Danqi Chen, Omer Levy, M. Lewis, Luke Zettlemoyer, Veselin Stoyanov | AIMat | 24,160 | 26 Jul 2019 |
| Generating Logical Forms from Graph Representations of Text and Entities | Peter Shaw, Philip Massey, Angelica Chen, Francesco Piccinno, Yasemin Altun | GNN, AI4CE, NAI | 38 | 21 May 2019 |
| Adaptive Attention Span in Transformers | Sainbayar Sukhbaatar, Edouard Grave, Piotr Bojanowski, Armand Joulin | | 285 | 19 May 2019 |
| HIBERT: Document Level Pre-training of Hierarchical Bidirectional Transformers for Document Summarization | Xingxing Zhang, Furu Wei, M. Zhou | | 377 | 16 May 2019 |
| Generating Long Sequences with Sparse Transformers | R. Child, Scott Gray, Alec Radford, Ilya Sutskever | | 1,880 | 23 Apr 2019 |
| Large Batch Optimization for Deep Learning: Training BERT in 76 minutes | Yang You, Jing Li, Sashank J. Reddi, Jonathan Hseu, Sanjiv Kumar, Srinadh Bhojanapalli, Xiaodan Song, J. Demmel, Kurt Keutzer, Cho-Jui Hsieh | ODL | 991 | 01 Apr 2019 |
| Star-Transformer | Qipeng Guo, Xipeng Qiu, Pengfei Liu, Yunfan Shao, Xiangyang Xue, Zheng Zhang | | 264 | 25 Feb 2019 |
| A BERT Baseline for the Natural Questions | Chris Alberti, Kenton Lee, Michael Collins | ELM, AI4MH | 127 | 24 Jan 2019 |
| Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context | Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov | VLM | 3,714 | 09 Jan 2019 |
| BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding | Jacob Devlin, Ming-Wei Chang, Kenton Lee, Kristina Toutanova | VLM, SSL, SSeg | 93,936 | 11 Oct 2018 |
| HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering | Zhilin Yang, Peng Qi, Saizheng Zhang, Yoshua Bengio, William W. Cohen, Ruslan Salakhutdinov, Christopher D. Manning | RALM | 2,577 | 25 Sep 2018 |
| Document-Level Neural Machine Translation with Hierarchical Attention Networks | Lesly Miculicich, Dhananjay Ram, Nikolaos Pappas, James Henderson | AIMat | 268 | 05 Sep 2018 |
| Representation Learning with Contrastive Predictive Coding | Aaron van den Oord, Yazhe Li, Oriol Vinyals | DRL, SSL | 10,152 | 10 Jul 2018 |
| GLUE: A Multi-Task Benchmark and Analysis Platform for Natural Language Understanding | Alex Jinpeng Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, Samuel R. Bowman | ELM | 7,080 | 20 Apr 2018 |
| Bi-Directional Block Self-Attention for Fast and Memory-Efficient Sequence Modeling | Tao Shen, Dinesh Manocha, Guodong Long, Jing Jiang, Chengqi Zhang | HAI | 148 | 03 Apr 2018 |
| Self-Attention with Relative Position Representations | Peter Shaw, Jakob Uszkoreit, Ashish Vaswani | | 2,269 | 06 Mar 2018 |
| Generating Wikipedia by Summarizing Long Sequences | Peter J. Liu, Mohammad Saleh, Etienne Pot, Ben Goodrich, Ryan Sepassi, Lukasz Kaiser, Noam M. Shazeer | CVBM | 786 | 30 Jan 2018 |
| Graph Attention Networks | Petar Velickovic, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Lio, Yoshua Bengio | GNN | 19,991 | 30 Oct 2017 |
| The Reversible Residual Network: Backpropagation Without Storing Activations | Aidan Gomez, Mengye Ren, R. Urtasun, Roger C. Grosse | | 546 | 14 Jul 2017 |
| Attention Is All You Need | Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin | 3DV | 129,831 | 12 Jun 2017 |
| RACE: Large-scale ReAding Comprehension Dataset From Examinations | Guokun Lai, Qizhe Xie, Hanxiao Liu, Yiming Yang, Eduard H. Hovy | ELM | 1,329 | 15 Apr 2017 |
| SQuAD: 100,000+ Questions for Machine Comprehension of Text | Pranav Rajpurkar, Jian Zhang, Konstantin Lopyrev, Percy Liang | RALM | 8,067 | 16 Jun 2016 |