2110.07732
Cited By
The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization
14 October 2021
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
AI4CE
Papers citing "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization"
47 / 47 papers shown
Title
Exploring Compositional Generalization (in ReCOGS_pos) by Transformers using Restricted Access Sequence Processing (RASP)
William Bruns
43
0
0
21 Apr 2025
TRA: Better Length Generalisation with Threshold Relative Attention
Mattia Opper
Roland Fernandez
P. Smolensky
Jianfeng Gao
46
0
0
29 Mar 2025
The Role of Sparsity for Length Generalization in Transformers
Noah Golowich
Samy Jelassi
David Brandfonbrener
Sham Kakade
Eran Malach
39
0
0
24 Feb 2025
Int2Int: a framework for mathematics with transformers
François Charton
ViT
43
0
0
22 Feb 2025
Stick-breaking Attention
Shawn Tan
Yikang Shen
Songlin Yang
Aaron C. Courville
Rameswar Panda
30
4
0
23 Oct 2024
Neural networks that overcome classic challenges through practice
Kazuki Irie
Brenden M. Lake
34
4
0
14 Oct 2024
On the Design Space Between Transformers and Recursive Neural Nets
Jishnu Ray Chowdhury
Cornelia Caragea
32
0
0
03 Sep 2024
Block-Operations: Using Modular Routing to Improve Compositional Generalization
Florian Dietz
Dietrich Klakow
AI4CE
19
0
0
01 Aug 2024
Transformer Normalisation Layers and the Independence of Semantic Subspaces
S. Menary
Samuel Kaski
Andre Freitas
44
2
0
25 Jun 2024
Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
ReLM
LRM
47
3
0
05 Jun 2024
Attention-based Iterative Decomposition for Tensor Product Representation
Taewon Park
Inchul Choi
Minho Lee
28
1
0
03 Jun 2024
MoEUT: Mixture-of-Experts Universal Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
Christopher Potts
Christopher D. Manning
MoE
45
5
0
25 May 2024
Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing
Zhongwang Zhang
Pengxiao Lin
Zhiwei Wang
Yaoyu Zhang
Z. Xu
39
3
0
08 May 2024
A Neural Rewriting System to Solve Algorithmic Problems
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
NAI
39
0
0
27 Feb 2024
Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
ELM
40
7
0
27 Feb 2024
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury
Cornelia Caragea
39
1
0
01 Feb 2024
Carrying over algorithm in transformers
J. Kruthoff
24
0
0
15 Jan 2024
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention
Róbert Csordás
Piotr Piekos
Kazuki Irie
Jürgen Schmidhuber
MoE
28
14
0
13 Dec 2023
Positional Description Matters for Transformers Arithmetic
Ruoqi Shen
Sébastien Bubeck
Ronen Eldan
Yin Tat Lee
Yuanzhi Li
Yi Zhang
34
37
0
22 Nov 2023
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh
Ekdeep Singh Lubana
Mikail Khona
Robert P. Dick
Hidenori Tanaka
CoGe
33
6
0
21 Nov 2023
Attribute Diversity Determines the Systematicity Gap in VQA
Ian Berlot-Attwell
Kumar Krishna Agrawal
A. M. Carrell
Yash Sharma
Naomi Saphra
29
1
0
15 Nov 2023
From Interpolation to Extrapolation: Complete Length Generalization for Arithmetic Transformers
Shaoxiong Duan
Yining Shi
Wei Xu
28
8
0
18 Oct 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
MoE
22
18
0
16 Oct 2023
When can transformers reason with abstract symbols?
Enric Boix-Adserà
Omid Saremi
Emmanuel Abbe
Samy Bengio
Etai Littwin
Josh Susskind
LRM
NAI
31
12
0
15 Oct 2023
Efficient Beam Tree Recursion
Jishnu Ray Chowdhury
Cornelia Caragea
29
3
0
20 Jul 2023
A Hybrid System for Systematic Generalization in Simple Arithmetic Problems
Flavio Petruzzellis
Alberto Testolin
A. Sperduti
AIMat
LRM
37
1
0
29 Jun 2023
SALSA VERDE: a machine learning attack on Learning With Errors with sparse small secrets
Cathy Li
Emily Wenger
Zeyuan Allen-Zhu
François Charton
Kristin E. Lauter
AAML
25
10
0
20 Jun 2023
ModuleFormer: Modularity Emerges from Mixture-of-Experts
Yikang Shen
Zheyu Zhang
Tianyou Cao
Shawn Tan
Zhenfang Chen
Chuang Gan
KELM
MoE
25
6
0
07 Jun 2023
Monotonic Location Attention for Length Generalization
Jishnu Ray Chowdhury
Cornelia Caragea
LLMAG
19
8
0
31 May 2023
Beam Tree Recursive Cells
Jishnu Ray Chowdhury
Cornelia Caragea
28
6
0
31 May 2023
Randomized Positional Encodings Boost Length Generalization of Transformers
Anian Ruoss
Grégoire Delétang
Tim Genewein
Jordi Grau-Moya
Róbert Csordás
Mehdi Abbana Bennani
Shane Legg
J. Veness
LLMAG
36
99
0
26 May 2023
Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation
Ta-Chung Chi
Ting-Han Fan
Alexander I. Rudnicky
Peter J. Ramadge
LRM
14
12
0
05 May 2023
Approximating CKY with Transformers
Ghazal Khalighinejad
Ollie Liu
Sam Wiseman
52
2
0
03 May 2023
SALSA PICANTE: a machine learning attack on LWE with binary secrets
Cathy Li
Jana Sotáková
Emily Wenger
Mohamed Malhou
Evrard Garcelon
François Charton
Kristin E. Lauter
AAML
30
14
0
07 Mar 2023
The Construction of Reality in an AI: A Review
J. W. Johnston
3DV
13
1
0
03 Feb 2023
Annotated History of Modern AI and Deep Learning
Juergen Schmidhuber
MLAU
AI4TS
AI4CE
30
22
0
21 Dec 2022
CTL++: Evaluating Generalization on Never-Seen Compositional Patterns of Known Functions, and Compatibility of Neural Representations
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
NAI
19
11
0
12 Oct 2022
Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks
Yuxuan Li
James L. McClelland
39
17
0
02 Oct 2022
A Generalist Neural Algorithmic Learner
Borja Ibarz
Vitaly Kurin
George Papamakarios
Kyriacos Nikiforou
Mehdi Abbana Bennani
...
Andreea Deac
Beatrice Bevilacqua
Yaroslav Ganin
Charles Blundell
Petar Veličković
OOD
29
53
0
22 Sep 2022
SALSA: Attacking Lattice Cryptography with Transformers
Emily Wenger
Mingjie Chen
François Charton
Kristin E. Lauter
AAML
28
35
0
11 Jul 2022
Unveiling Transformers with LEGO: a synthetic reasoning task
Yi Zhang
A. Backurs
Sébastien Bubeck
Ronen Eldan
Suriya Gunasekar
Tal Wagner
LRM
28
85
0
09 Jun 2022
Block-Recurrent Transformers
DeLesley S. Hutchins
Imanol Schlag
Yuhuai Wu
Ethan Dyer
Behnam Neyshabur
20
94
0
11 Mar 2022
A Modern Self-Referential Weight Matrix That Learns to Modify Itself
Kazuki Irie
Imanol Schlag
Róbert Csordás
Jürgen Schmidhuber
14
26
0
11 Feb 2022
Linear algebra with transformers
François Charton
AIMat
29
56
0
03 Dec 2021
PonderNet: Learning to Ponder
Andrea Banino
Jan Balaguer
Charles Blundell
PINN
AIMat
96
80
0
12 Jul 2021
Going Beyond Linear Transformers with Recurrent Fast Weight Programmers
Kazuki Irie
Imanol Schlag
Róbert Csordás
Jürgen Schmidhuber
28
57
0
11 Jun 2021
On the Binding Problem in Artificial Neural Networks
Klaus Greff
Sjoerd van Steenkiste
Jürgen Schmidhuber
OCL
224
254
0
09 Dec 2020