Investigating Recurrent Transformers with Dynamic Halt

1 February 2024
Jishnu Ray Chowdhury
Cornelia Caragea
arXiv: 2402.00976

Papers citing "Investigating Recurrent Transformers with Dynamic Halt"

36 / 86 papers shown
The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
AI4CE
72
57
0
14 Oct 2021
Dynamic Inference with Neural Interpreters
Nasim Rahaman
Muhammad Waleed Gondal
S. Joshi
Peter V. Gehler
Yoshua Bengio
Francesco Locatello
Bernhard Schölkopf
82
31
0
12 Oct 2021
Saturated Transformers are Constant-Depth Threshold Circuits
William Merrill
Ashish Sabharwal
Noah A. Smith
59
104
0
30 Jun 2021
Modeling Hierarchical Structures with Continuous Recursive Neural Networks
Jishnu Ray Chowdhury
Cornelia Caragea
49
15
0
10 Jun 2021
Staircase Attention for Recurrent Processing of Sequences
Da Ju
Stephen Roller
Sainbayar Sukhbaatar
Jason Weston
56
11
0
08 Jun 2021
Consistent Accelerated Inference via Confident Adaptive Transformers
Tal Schuster
Adam Fisch
Tommi Jaakkola
Regina Barzilay
AI4TS
225
72
0
18 Apr 2021
Transformer in Transformer
Kai Han
An Xiao
Enhua Wu
Jianyuan Guo
Chunjing Xu
Yunhe Wang
ViT
375
1,556
0
27 Feb 2021
Dynamic Neural Networks: A Survey
Yizeng Han
Gao Huang
Shiji Song
Le Yang
Honghui Wang
Yulin Wang
3DH
AI4TS
AI4CE
87
645
0
09 Feb 2021
Long Range Arena: A Benchmark for Efficient Transformers
Yi Tay
Mostafa Dehghani
Samira Abnar
Yikang Shen
Dara Bahri
Philip Pham
J. Rao
Liu Yang
Sebastian Ruder
Donald Metzler
132
717
0
08 Nov 2020
Memformer: A Memory-Augmented Transformer for Sequence Modeling
Qingyang Wu
Zhenzhong Lan
Kun Qian
Jing Gu
A. Geramifard
Zhou Yu
44
49
0
14 Oct 2020
Neurocoder: Learning General-Purpose Computation Using Stored Neural Programs
Hung Le
Svetha Venkatesh
NAI
34
5
0
24 Sep 2020
Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention
Angelos Katharopoulos
Apoorv Vyas
Nikolaos Pappas
François Fleuret
166
1,755
0
29 Jun 2020
BERT Loses Patience: Fast and Robust Inference with Early Exit
Wangchunshu Zhou
Canwen Xu
Tao Ge
Julian McAuley
Ke Xu
Furu Wei
45
341
0
07 Jun 2020
MART: Memory-Augmented Recurrent Transformer for Coherent Video Paragraph Captioning
Jie Lei
Liwei Wang
Yelong Shen
Dong Yu
Tamara L. Berg
Joey Tianyi Zhou
47
190
0
11 May 2020
Highway Transformer: Self-Gating Enhanced Self-Attentive Networks
Yekun Chai
Jin Shuo
Xinwen Hou
38
17
0
17 Apr 2020
DynaBERT: Dynamic BERT with Adaptive Width and Depth
Lu Hou
Zhiqi Huang
Lifeng Shang
Xin Jiang
Xiao Chen
Qun Liu
MQ
73
322
0
08 Apr 2020
GLU Variants Improve Transformer
Noam M. Shazeer
118
989
0
12 Feb 2020
Compressive Transformers for Long-Range Sequence Modelling
Jack W. Rae
Anna Potapenko
Siddhant M. Jayakumar
Timothy Lillicrap
RALM
VLM
KELM
62
646
0
13 Nov 2019
Ordered Memory
Yikang Shen
Shawn Tan
Seyedarian Hosseini
Zhouhan Lin
Alessandro Sordoni
Aaron Courville
44
23
0
29 Oct 2019
Depth-Adaptive Transformer
Maha Elbayad
Jiatao Gu
Edouard Grave
Michael Auli
83
190
0
22 Oct 2019
ALBERT: A Lite BERT for Self-supervised Learning of Language Representations
Zhenzhong Lan
Mingda Chen
Sebastian Goodman
Kevin Gimpel
Piyush Sharma
Radu Soricut
SSL
AIMat
330
6,441
0
26 Sep 2019
Deep Equilibrium Models
Shaojie Bai
J. Zico Kolter
V. Koltun
78
665
0
03 Sep 2019
Cooperative Learning of Disjoint Syntax and Semantics
Serhii Havrylov
Germán Kruszewski
Armand Joulin
45
48
0
25 Feb 2019
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context
Zihang Dai
Zhilin Yang
Yiming Yang
J. Carbonell
Quoc V. Le
Ruslan Salakhutdinov
VLM
198
3,724
0
09 Jan 2019
Universal Transformers
Mostafa Dehghani
Stephan Gouws
Oriol Vinyals
Jakob Uszkoreit
Lukasz Kaiser
80
752
0
10 Jul 2018
ListOps: A Diagnostic Dataset for Latent Tree Learning
Nikita Nangia
Samuel R. Bowman
45
137
0
17 Apr 2018
The Importance of Being Recurrent for Modeling Hierarchical Structure
Ke M. Tran
Arianna Bisazza
Christof Monz
63
150
0
09 Mar 2018
Parallelizing Linear Recurrent Neural Nets Over Sequence Length
Eric Martin
Chris Cundy
49
99
0
12 Sep 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
640
130,942
0
12 Jun 2017
Language Modeling with Gated Convolutional Networks
Yann N. Dauphin
Angela Fan
Michael Auli
David Grangier
212
2,391
0
23 Dec 2016
Layer Normalization
Jimmy Lei Ba
J. Kiros
Geoffrey E. Hinton
344
10,467
0
21 Jul 2016
Adaptive Computation Time for Recurrent Neural Networks
Alex Graves
90
546
0
29 Mar 2016
Neural GPUs Learn Algorithms
Lukasz Kaiser
Ilya Sutskever
77
369
0
25 Nov 2015
Tree-structured composition in neural networks without tree-structured architectures
Samuel R. Bowman
Christopher D. Manning
Christopher Potts
63
75
0
16 Jun 2015
Neural Turing Machines
Alex Graves
Greg Wayne
Ivo Danihelka
95
2,325
0
20 Oct 2014
Self-Delimiting Neural Networks
Jürgen Schmidhuber
76
37
0
29 Sep 2012