The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization

14 October 2021
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
    AI4CE

Papers citing "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization"

47 / 47 papers shown
Exploring Compositional Generalization (in ReCOGS_pos) by Transformers using Restricted Access Sequence Processing (RASP)
William Bruns
38
0
0
21 Apr 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper
Roland Fernandez
P. Smolensky
Jianfeng Gao
46
0
0
29 Mar 2025
The Role of Sparsity for Length Generalization in Transformers
Noah Golowich
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
37
0
0
24 Feb 2025
Int2Int: a framework for mathematics with transformers
François Charton
ViT
41
0
0
22 Feb 2025
Stick-breaking Attention
Shawn Tan
Yikang Shen
Songlin Yang
Aaron C. Courville
Rameswar Panda
30
4
0
23 Oct 2024
Neural networks that overcome classic challenges through practice
Kazuki Irie
Brenden M. Lake
34
4
0
14 Oct 2024
On the Design Space Between Transformers and Recursive Neural Nets
Jishnu Ray Chowdhury
Cornelia Caragea
32
0
0
03 Sep 2024
Block-Operations: Using Modular Routing to Improve Compositional Generalization
Florian Dietz
Dietrich Klakow
AI4CE
19
0
0
01 Aug 2024
Transformer Normalisation Layers and the Independence of Semantic Subspaces
S. Menary
Samuel Kaski
Andre Freitas
44
2
0
25 Jun 2024
Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
ReLM
LRM
45
3
0
05 Jun 2024
Attention-based Iterative Decomposition for Tensor Product Representation
Taewon Park
Inchul Choi
Minho Lee
26
1
0
03 Jun 2024
MoEUT: Mixture-of-Experts Universal Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
Christopher Potts
Christopher D. Manning
MoE
45
5
0
25 May 2024
Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing
Zhongwang Zhang
Pengxiao Lin
Zhiwei Wang
Yaoyu Zhang
Z. Xu
39
3
0
08 May 2024
A Neural Rewriting System to Solve Algorithmic Problems
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
NAI
39
0
0
27 Feb 2024
Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
ELM
38
7
0
27 Feb 2024
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury
Cornelia Caragea
39
1
0
01 Feb 2024
Carrying over algorithm in transformers
J. Kruthoff
24
0
0
15 Jan 2024
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Róbert Csordás
Piotr Piekos
Kazuki Irie
Jürgen Schmidhuber
MoE
28
14
0
13 Dec 2023
Positional Description Matters for Transformers Arithmetic
Ruoqi Shen
Sébastien Bubeck
Ronen Eldan
Yin Tat Lee
Yuanzhi Li
Yi Zhang
29
37
0
22 Nov 2023
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh
Ekdeep Singh Lubana
Mikail Khona
Robert P. Dick
Hidenori Tanaka
CoGe
33
6
0
21 Nov 2023
Attribute Diversity Determines the Systematicity Gap in VQA
Ian Berlot-Attwell
Kumar Krishna Agrawal
A. M. Carrell
Yash Sharma
Naomi Saphra
29
1
0
15 Nov 2023
From Interpolation to Extrapolation: Complete Length Generalization for Arithmetic Transformers
Shaoxiong Duan
Yining Shi
Wei Xu
28
8
0
18 Oct 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
MoE
22
18
0
16 Oct 2023
When can transformers reason with abstract symbols?
Enric Boix-Adserà
Omid Saremi
Emmanuel Abbe
Samy Bengio
Etai Littwin
Josh Susskind
LRM
NAI
31
12
0
15 Oct 2023
Efficient Beam Tree Recursion
Jishnu Ray Chowdhury
Cornelia Caragea
29
3
0
20 Jul 2023
A Hybrid System for Systematic Generalization in Simple Arithmetic Problems
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
AIMat
LRM
37
1
0
29 Jun 2023
SALSA VERDE: a machine learning attack on Learning With Errors with sparse small secrets
Cathy Li
Emily Wenger
Zeyuan Allen-Zhu
François Charton
Kristin E. Lauter
AAML
25
10
0
20 Jun 2023
ModuleFormer: Modularity Emerges from Mixture-of-Experts
Yikang Shen
Zheyu Zhang
Tianyou Cao
Shawn Tan
Zhenfang Chen
Chuang Gan
KELM
MoE
25
6
0
07 Jun 2023
Monotonic Location Attention for Length Generalization
Jishnu Ray Chowdhury
Cornelia Caragea
LLMAG
19
8
0
31 May 2023
Beam Tree Recursive Cells
Jishnu Ray Chowdhury
Cornelia Caragea
28
6
0
31 May 2023
Randomized Positional Encodings Boost Length Generalization of Transformers
Anian Ruoss
Grégoire Delétang
Tim Genewein
Jordi Grau-Moya
Róbert Csordás
Mehdi Abbana Bennani
Shane Legg
J. Veness
LLMAG
36
99
0
26 May 2023
Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation
Ta-Chung Chi
Ting-Han Fan
Alexander I. Rudnicky
Peter J. Ramadge
LRM
14
12
0
05 May 2023
Approximating CKY with Transformers
Ghazal Khalighinejad
Ollie Liu
Sam Wiseman
52
2
0
03 May 2023
SALSA PICANTE: a machine learning attack on LWE with binary secrets
Cathy Li
Jana Sotáková
Emily Wenger
Mohamed Malhou
Evrard Garcelon
François Charton
Kristin E. Lauter
AAML
30
14
0
07 Mar 2023
The Construction of Reality in an AI: A Review
J. W. Johnston
3DV
13
1
0
03 Feb 2023
Annotated History of Modern AI and Deep Learning
Juergen Schmidhuber
MLAU
AI4TS
AI4CE
28
22
0
21 Dec 2022
CTL++: Evaluating Generalization on Never-Seen Compositional Patterns of Known Functions, and Compatibility of Neural Representations
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
NAI
19
11
0
12 Oct 2022
Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks
Yuxuan Li
James L. McClelland
39
17
0
02 Oct 2022
A Generalist Neural Algorithmic Learner
Borja Ibarz
Vitaly Kurin
George Papamakarios
Kyriacos Nikiforou
Mehdi Abbana Bennani
...
Andreea Deac
Beatrice Bevilacqua
Yaroslav Ganin
Charles Blundell
Petar Veličković
OOD
26
53
0
22 Sep 2022
SALSA: Attacking Lattice Cryptography with Transformers
Emily Wenger
Mingjie Chen
François Charton
Kristin E. Lauter
AAML
26
35
0
11 Jul 2022
Unveiling Transformers with LEGO: a synthetic reasoning task
Yi Zhang
A. Backurs
Sébastien Bubeck
Ronen Eldan
Suriya Gunasekar
Tal Wagner
LRM
28
85
0
09 Jun 2022
Block-Recurrent Transformers
DeLesley S. Hutchins
Imanol Schlag
Yuhuai Wu
Ethan Dyer
Behnam Neyshabur
18
94
0
11 Mar 2022
A Modern Self-Referential Weight Matrix That Learns to Modify Itself
Kazuki Irie
Imanol Schlag
Róbert Csordás
Jürgen Schmidhuber
14
26
0
11 Feb 2022
Linear algebra with transformers
François Charton
AIMat
29
56
0
03 Dec 2021
PonderNet: Learning to Ponder
Andrea Banino
Jan Balaguer
Charles Blundell
PINN
AIMat
96
80
0
12 Jul 2021
Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Kazuki Irie
Imanol Schlag
Róbert Csordás
Jürgen Schmidhuber
26
57
0
11 Jun 2021
On the Binding Problem in Artificial Neural Networks
Klaus Greff
Sjoerd van Steenkiste
Jürgen Schmidhuber
OCL
224
254
0
09 Dec 2020