arXiv:2402.00976
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury, Cornelia Caragea
1 February 2024
Papers citing "Investigating Recurrent Transformers with Dynamic Halt"
36 / 86 papers shown
- The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization · Róbert Csordás, Kazuki Irie, Jürgen Schmidhuber [AI4CE] · 14 Oct 2021
- Dynamic Inference with Neural Interpreters · Nasim Rahaman, Muhammad Waleed Gondal, S. Joshi, Peter V. Gehler, Yoshua Bengio, Francesco Locatello, Bernhard Schölkopf · 12 Oct 2021
- Saturated Transformers are Constant-Depth Threshold Circuits · William Merrill, Ashish Sabharwal, Noah A. Smith · 30 Jun 2021
- Modeling Hierarchical Structures with Continuous Recursive Neural Networks · Jishnu Ray Chowdhury, Cornelia Caragea · 10 Jun 2021
- Staircase Attention for Recurrent Processing of Sequences · Da Ju, Stephen Roller, Sainbayar Sukhbaatar, Jason Weston · 08 Jun 2021
- Consistent Accelerated Inference via Confident Adaptive Transformers · Tal Schuster, Adam Fisch, Tommi Jaakkola, Regina Barzilay [AI4TS] · 18 Apr 2021
- Transformer in Transformer · Kai Han, An Xiao, Enhua Wu, Jianyuan Guo, Chunjing Xu, Yunhe Wang [ViT] · 27 Feb 2021
- Dynamic Neural Networks: A Survey · Yizeng Han, Gao Huang, Shiji Song, Le Yang, Honghui Wang, Yulin Wang [3DH, AI4TS, AI4CE] · 09 Feb 2021
- Long Range Arena: A Benchmark for Efficient Transformers · Yi Tay, Mostafa Dehghani, Samira Abnar, Songlin Yang, Dara Bahri, Philip Pham, J. Rao, Liu Yang, Sebastian Ruder, Donald Metzler · 08 Nov 2020
- Memformer: A Memory-Augmented Transformer for Sequence Modeling · Qingyang Wu, Zhenzhong Lan, Kun Qian, Jing Gu, A. Geramifard, Zhou Yu · 14 Oct 2020
- Neurocoder: Learning General-Purpose Computation Using Stored Neural Programs · Hung Le, Svetha Venkatesh [NAI] · 24 Sep 2020
- Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention · Angelos Katharopoulos, Apoorv Vyas, Nikolaos Pappas, François Fleuret · 29 Jun 2020
- BERT Loses Patience: Fast and Robust Inference with Early Exit · Wangchunshu Zhou, Canwen Xu, Tao Ge, Julian McAuley, Ke Xu, Furu Wei · 07 Jun 2020
- MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning · Jie Lei, Liwei Wang, Yelong Shen, Dong Yu, Tamara L. Berg, Joey Tianyi Zhou · 11 May 2020
- Highway Transformer: Self-Gating Enhanced Self-Attentive Networks · Yekun Chai, Jin Shuo, Xinwen Hou · 17 Apr 2020
- DynaBERT: Dynamic BERT with Adaptive Width and Depth · Lu Hou, Zhiqi Huang, Lifeng Shang, Xin Jiang, Xiao Chen, Qun Liu [MQ] · 08 Apr 2020
- GLU Variants Improve Transformer · Noam M. Shazeer · 12 Feb 2020
- Compressive Transformers for Long-Range Sequence Modelling · Jack W. Rae, Anna Potapenko, Siddhant M. Jayakumar, Timothy Lillicrap [RALM, VLM, KELM] · 13 Nov 2019
- Ordered Memory · Songlin Yang, Shawn Tan, Seyedarian Hosseini, Zhouhan Lin, Alessandro Sordoni, Aaron Courville · 29 Oct 2019
- Depth-Adaptive Transformer · Maha Elbayad, Jiatao Gu, Edouard Grave, Michael Auli · 22 Oct 2019
- ALBERT: A Lite BERT for Self-supervised Learning of Language Representations · Zhenzhong Lan, Mingda Chen, Sebastian Goodman, Kevin Gimpel, Piyush Sharma, Radu Soricut [SSL, AIMat] · 26 Sep 2019
- Deep Equilibrium Models · Shaojie Bai, J. Zico Kolter, V. Koltun · 03 Sep 2019
- Cooperative Learning of Disjoint Syntax and Semantics · Serhii Havrylov, Germán Kruszewski, Armand Joulin · 25 Feb 2019
- Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context · Zihang Dai, Zhilin Yang, Yiming Yang, J. Carbonell, Quoc V. Le, Ruslan Salakhutdinov [VLM] · 09 Jan 2019
- Universal Transformers · Mostafa Dehghani, Stephan Gouws, Oriol Vinyals, Jakob Uszkoreit, Lukasz Kaiser · 10 Jul 2018
- ListOps: A Diagnostic Dataset for Latent Tree Learning · Nikita Nangia, Samuel R. Bowman · 17 Apr 2018
- The Importance of Being Recurrent for Modeling Hierarchical Structure · Ke M. Tran, Arianna Bisazza, Christof Monz · 09 Mar 2018
- Parallelizing Linear Recurrent Neural Nets Over Sequence Length · Eric Martin, Chris Cundy · 12 Sep 2017
- Attention Is All You Need · Ashish Vaswani, Noam M. Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan Gomez, Lukasz Kaiser, Illia Polosukhin [3DV] · 12 Jun 2017
- Language Modeling with Gated Convolutional Networks · Yann N. Dauphin, Angela Fan, Michael Auli, David Grangier · 23 Dec 2016
- Layer Normalization · Jimmy Lei Ba, J. Kiros, Geoffrey E. Hinton · 21 Jul 2016
- Adaptive Computation Time for Recurrent Neural Networks · Alex Graves · 29 Mar 2016
- Neural GPUs Learn Algorithms · Lukasz Kaiser, Ilya Sutskever · 25 Nov 2015
- Tree-structured composition in neural networks without tree-structured architectures · Samuel R. Bowman, Christopher D. Manning, Christopher Potts · 16 Jun 2015
- Neural Turing Machines · Alex Graves, Greg Wayne, Ivo Danihelka · 20 Oct 2014
- Self-Delimiting Neural Networks · Jürgen Schmidhuber · 29 Sep 2012