Block-Recurrent Transformers (arXiv 2203.07852)
11 March 2022
DeLesley S. Hutchins, Imanol Schlag, Yuhuai Wu, Ethan Dyer, Behnam Neyshabur

Papers citing "Block-Recurrent Transformers" (47 of 47 papers shown)

- Investigating Recurrent Transformers with Dynamic Halt. Jishnu Ray Chowdhury, Cornelia Caragea. 01 Feb 2024. 1 citation.
- Beyond the Imitation Game: Quantifying and extrapolating the capabilities of language models. Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, ..., Zhuoye Zhao, Zijian Wang, Zijie J. Wang, Zirui Wang, Ziyi Wu. 09 Jun 2022. 1,750 citations. Tags: ELM.
- SQuALITY: Building a Long-Document Summarization Dataset the Hard Way. Alex Wang, Richard Yuanzhe Pang, Angelica Chen, Jason Phang, Samuel R. Bowman. 23 May 2022. 45 citations.
- Training Compute-Optimal Large Language Models. Jordan Hoffmann, Sebastian Borgeaud, A. Mensch, Elena Buchatskaya, Trevor Cai, ..., Karen Simonyan, Erich Elsen, Jack W. Rae, Oriol Vinyals, Laurent Sifre. 29 Mar 2022. 1,946 citations. Tags: AI4TS.
- Memorizing Transformers. Yuhuai Wu, M. Rabe, DeLesley S. Hutchins, Christian Szegedy. 16 Mar 2022. 177 citations. Tags: RALM.
- Transformer Quality in Linear Time. Weizhe Hua, Zihang Dai, Hanxiao Liu, Quoc V. Le. 21 Feb 2022. 228 citations.
- General-purpose, long-context autoregressive modeling with Perceiver AR. Curtis Hawthorne, Andrew Jaegle, Cătălina Cangea, Sebastian Borgeaud, C. Nash, ..., Hannah R. Sheahan, Neil Zeghidour, Jean-Baptiste Alayrac, João Carreira, Jesse Engel. 15 Feb 2022. 65 citations.
- LaMDA: Language Models for Dialog Applications. R. Thoppilan, Daniel De Freitas, Jamie Hall, Noam M. Shazeer, Apoorv Kulshreshtha, ..., Blaise Aguera-Arcas, Claire Cui, M. Croak, Ed H. Chi, Quoc Le. 20 Jan 2022. 1,595 citations. Tags: ALM.
- SCROLLS: Standardized CompaRison Over Long Language Sequences. Uri Shaham, Elad Segal, Maor Ivgi, Avia Efrat, Ori Yoran, ..., Ankit Gupta, Wenhan Xiong, Mor Geva, Jonathan Berant, Omer Levy. 10 Jan 2022. 137 citations. Tags: RALM.
- The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization. Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber. 14 Oct 2021. 57 citations. Tags: AI4CE.
- Do Long-Range Language Models Actually Use Long-Range Context? Simeng Sun, Kalpesh Krishna, Andrew Mattarella-Micke, Mohit Iyyer. 19 Sep 2021. 84 citations. Tags: RALM.
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. Ofir Press, Noah A. Smith, M. Lewis. 27 Aug 2021. 759 citations.
- H-Transformer-1D: Fast One-Dimensional Hierarchical Attention for Sequences. Zhenhai Zhu, Radu Soricut. 25 Jul 2021. 42 citations.
- Combiner: Full Attention Transformer with Sparse Computation Cost. Hongyu Ren, H. Dai, Zihang Dai, Mengjiao Yang, J. Leskovec, Dale Schuurmans, Bo Dai. 12 Jul 2021. 80 citations.
- Going Beyond Linear Transformers with Recurrent Fast Weight Programmers. Kazuki Irie, Imanol Schlag, Róbert Csordás, Jürgen Schmidhuber. 11 Jun 2021. 62 citations.
- Staircase Attention for Recurrent Processing of Sequences. Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston. 08 Jun 2021. 11 citations.
- Not All Memories are Created Equal: Learning to Forget by Expiring. Sainbayar Sukhbaatar, Da Ju, Spencer Poff, Stephen Roller, Arthur Szlam, Jason Weston, Angela Fan. 13 May 2021. 35 citations. Tags: CLL.
- Attention is Not All You Need: Pure Attention Loses Rank Doubly Exponentially with Depth. Yihe Dong, Jean-Baptiste Cordonnier, Andreas Loukas. 05 Mar 2021. 385 citations.
- Perceiver: General Perception with Iterative Attention. Andrew Jaegle, Felix Gimeno, Andrew Brock, Andrew Zisserman, Oriol Vinyals, João Carreira. 04 Mar 2021. 1,015 citations. Tags: VLM, ViT, MDE.
- Random Feature Attention. Hao Peng, Nikolaos Pappas, Dani Yogatama, Roy Schwartz, Noah A. Smith, Lingpeng Kong. 03 Mar 2021. 358 citations.
- When Attention Meets Fast Recurrence: Training Language Models with Reduced Compute. Tao Lei. 24 Feb 2021. 49 citations. Tags: RALM, VLM.
- Linear Transformers Are Secretly Fast Weight Programmers. Imanol Schlag, Kazuki Irie, Jürgen Schmidhuber. 22 Feb 2021. 247 citations.
- Long Range Arena: A Benchmark for Efficient Transformers. Yi Tay, Mostafa Dehghani, Samira Abnar, Yikang Shen, Dara Bahri, Philip Pham, J. Rao, Liu Yang, Sebastian Ruder, Donald Metzler. 08 Nov 2020. 720 citations.
- Query-Key Normalization for Transformers. Alex Henry, Prudhvi Raj Dachapally, S. Pawar, Yuxuan Chen. 08 Oct 2020. 85 citations.
- Rethinking Attention with Performers. K. Choromanski, Valerii Likhosherstov, David Dohan, Xingyou Song, Andreea Gane, ..., Afroz Mohiuddin, Lukasz Kaiser, David Belanger, Lucy J. Colwell, Adrian Weller. 30 Sep 2020. 1,585 citations.
- Efficient Transformers: A Survey. Yi Tay, Mostafa Dehghani, Dara Bahri, Donald Metzler. 14 Sep 2020. 1,123 citations. Tags: VLM.
- Big Bird: Transformers for Longer Sequences. Manzil Zaheer, Guru Guruganesh, Kumar Avinava Dubey, Joshua Ainslie, Chris Alberti, ..., Philip Pham, Anirudh Ravula, Qifan Wang, Li Yang, Amr Ahmed. 28 Jul 2020. 2,086 citations. Tags: VLM.
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention. Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret. 29 Jun 2020. 1,765 citations.
- Linformer: Self-Attention with Linear Complexity. Sinong Wang, Belinda Z. Li, Madian Khabsa, Han Fang, Hao Ma. 08 Jun 2020. 1,706 citations.
- Funnel-Transformer: Filtering out Sequential Redundancy for Efficient Language Processing. Zihang Dai, Guokun Lai, Yiming Yang, Quoc V. Le. 05 Jun 2020. 233 citations.
- Language Models are Few-Shot Learners. Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared Kaplan, ..., Christopher Berner, Sam McCandlish, Alec Radford, Ilya Sutskever, Dario Amodei. 28 May 2020. 42,055 citations. Tags: BDL.
- Longformer: The Long-Document Transformer. Iz Beltagy, Matthew E. Peters, Arman Cohan. 10 Apr 2020. 4,071 citations. Tags: RALM, VLM.
- Efficient Content-Based Sparse Attention with Routing Transformers. Aurko Roy, M. Saffar, Ashish Vaswani, David Grangier. 12 Mar 2020. 597 citations. Tags: MoE.
- Towards a Human-like Open-Domain Chatbot. Daniel De Freitas, Minh-Thang Luong, David R. So, Jamie Hall, Noah Fiedel, ..., Zi Yang, Apoorv Kulshreshtha, Gaurav Nemade, Yifeng Lu, Quoc V. Le. 27 Jan 2020. 938 citations.
- Reformer: The Efficient Transformer. Nikita Kitaev, Lukasz Kaiser, Anselm Levskaya. 13 Jan 2020. 2,320 citations. Tags: VLM.
- Compressive Transformers for Long-Range Sequence Modelling. Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap. 13 Nov 2019. 648 citations. Tags: RALM, VLM, KELM.
- Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. Colin Raffel, Noam M. Shazeer, Adam Roberts, Katherine Lee, Sharan Narang, Michael Matena, Yanqi Zhou, Wei Li, Peter J. Liu. 23 Oct 2019. 20,181 citations. Tags: AIMat.
- R-Transformer: Recurrent Neural Network Enhanced Transformer. Z. Wang, Yao Ma, Zitao Liu, Jiliang Tang. 12 Jul 2019. 106 citations. Tags: ViT.
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context. Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov. 09 Jan 2019. 3,728 citations. Tags: VLM.
- SentencePiece: A simple and language independent subword tokenizer and detokenizer for Neural Text Processing. Taku Kudo, John Richardson. 19 Aug 2018. 3,520 citations.
- Sharp Nearby, Fuzzy Far Away: How Neural Language Models Use Context. Urvashi Khandelwal, He He, Peng Qi, Dan Jurafsky. 12 May 2018. 295 citations. Tags: RALM.
- The Best of Both Worlds: Combining Recent Advances in Neural Machine Translation. Mia Xu Chen, Orhan Firat, Ankur Bapna, Melvin Johnson, Wolfgang Macherey, ..., Niki Parmar, M. Schuster, Zhifeng Chen, Yonghui Wu, Macduff Hughes. 26 Apr 2018. 457 citations. Tags: AIMat.
- Adafactor: Adaptive Learning Rates with Sublinear Memory Cost. Noam M. Shazeer, Mitchell Stern. 11 Apr 2018. 1,048 citations. Tags: ODL.
- Simple Recurrent Units for Highly Parallelizable Recurrence. Tao Lei, Yu Zhang, Sida I. Wang, Hui Dai, Yoav Artzi. 08 Sep 2017. 275 citations. Tags: LRM.
- Attention Is All You Need. Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin. 12 Jun 2017. 131,652 citations. Tags: 3DV.
- Training Very Deep Networks. R. Srivastava, Klaus Greff, Jürgen Schmidhuber. 22 Jul 2015. 1,682 citations.
- LSTM: A Search Space Odyssey. Klaus Greff, R. Srivastava, Jan Koutník, Bas R. Steunebrink, Jürgen Schmidhuber. 13 Mar 2015. 5,302 citations. Tags: AI4TS, VLM.