The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization

14 October 2021

Papers citing "The Neural Data Router: Adaptive Control Flow in Transformers Improves Systematic Generalization"

Title
Exploring Compositional Generalization (in ReCOGS_pos) by Transformers using Restricted Access Sequence Processing (RASP) William Bruns 38 0 0 21 Apr 2025
TRA: Better Length Generalisation with Threshold Relative Attention Mattia Opper Roland Fernandez P. Smolensky Jianfeng Gao 46 0 0 29 Mar 2025
The Role of Sparsity for Length Generalization in Transformers Noah Golowich Samy Jelassi David Brandfonbrener Sham Kakade Eran Malach 37 0 0 24 Feb 2025
Int2Int: a framework for mathematics with transformers François Charton ViT 41 0 0 22 Feb 2025
Stick-breaking Attention Shawn Tan Yikang Shen Songlin Yang Aaron C. Courville Rameswar Panda 30 4 0 23 Oct 2024
Neural networks that overcome classic challenges through practice Kazuki Irie Brenden M. Lake 34 4 0 14 Oct 2024
On the Design Space Between Transformers and Recursive Neural Nets Jishnu Ray Chowdhury Cornelia Caragea 32 0 0 03 Sep 2024
Block-Operations: Using Modular Routing to Improve Compositional Generalization Florian Dietz Dietrich Klakow AI4CE 19 0 0 01 Aug 2024
Transformer Normalisation Layers and the Independence of Semantic Subspaces S. Menary Samuel Kaski Andre Freitas 44 2 0 25 Jun 2024
Assessing the Emergent Symbolic Reasoning Abilities of Llama Large Language Models Flavio Petruzzellis Alberto Testolin A. Sperduti ReLM LRM 45 3 0 05 Jun 2024
Attention-based Iterative Decomposition for Tensor Product Representation Taewon Park Inchul Choi Minho Lee 26 1 0 03 Jun 2024
MoEUT: Mixture-of-Experts Universal Transformers Róbert Csordás Kazuki Irie Jürgen Schmidhuber Christopher Potts Christopher D. Manning MoE 45 5 0 25 May 2024
Initialization is Critical to Whether Transformers Fit Composite Functions by Reasoning or Memorizing Zhongwang Zhang Pengxiao Lin Zhiwei Wang Yaoyu Zhang Z. Xu 39 3 0 08 May 2024
A Neural Rewriting System to Solve Algorithmic Problems Flavio Petruzzellis Alberto Testolin A. Sperduti NAI 39 0 0 27 Feb 2024
Benchmarking GPT-4 on Algorithmic Problems: A Systematic Evaluation of Prompting Strategies Flavio Petruzzellis Alberto Testolin A. Sperduti ELM 38 7 0 27 Feb 2024
Investigating Recurrent Transformers with Dynamic Halt Jishnu Ray Chowdhury Cornelia Caragea 39 1 0 01 Feb 2024
Carrying over algorithm in transformers J. Kruthoff 24 0 0 15 Jan 2024
SwitchHead: Accelerating Transformers with Mixture-of-Experts Attention Róbert Csordás Piotr Piekos Kazuki Irie Jürgen Schmidhuber MoE 28 14 0 13 Dec 2023
Positional Description Matters for Transformers Arithmetic Ruoqi Shen Sébastien Bubeck Ronen Eldan Yin Tat Lee Yuanzhi Li Yi Zhang 29 37 0 22 Nov 2023
Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks Rahul Ramesh Ekdeep Singh Lubana Mikail Khona Robert P. Dick Hidenori Tanaka CoGe 33 6 0 21 Nov 2023
Attribute Diversity Determines the Systematicity Gap in VQA Ian Berlot-Attwell Kumar Krishna Agrawal A. M. Carrell Yash Sharma Naomi Saphra 29 1 0 15 Nov 2023
From Interpolation to Extrapolation: Complete Length Generalization for Arithmetic Transformers Shaoxiong Duan Yining Shi Wei Xu 28 8 0 18 Oct 2023
Approximating Two-Layer Feedforward Networks for Efficient Transformers Róbert Csordás Kazuki Irie Jürgen Schmidhuber MoE 22 18 0 16 Oct 2023
When can transformers reason with abstract symbols? Enric Boix-Adserà Omid Saremi Emmanuel Abbe Samy Bengio Etai Littwin Josh Susskind LRM NAI 31 12 0 15 Oct 2023
Efficient Beam Tree Recursion Jishnu Ray Chowdhury Cornelia Caragea 29 3 0 20 Jul 2023
A Hybrid System for Systematic Generalization in Simple Arithmetic Problems Flavio Petruzzellis Alberto Testolin A. Sperduti AIMat LRM 37 1 0 29 Jun 2023
SALSA VERDE: a machine learning attack on Learning With Errors with sparse small secrets Cathy Li Emily Wenger Zeyuan Allen-Zhu François Charton Kristin E. Lauter AAML 25 10 0 20 Jun 2023
ModuleFormer: Modularity Emerges from Mixture-of-Experts Yikang Shen Zheyu Zhang Tianyou Cao Shawn Tan Zhenfang Chen Chuang Gan KELM MoE 25 6 0 07 Jun 2023
Monotonic Location Attention for Length Generalization Jishnu Ray Chowdhury Cornelia Caragea LLMAG 19 8 0 31 May 2023
Beam Tree Recursive Cells Jishnu Ray Chowdhury Cornelia Caragea 28 6 0 31 May 2023
Randomized Positional Encodings Boost Length Generalization of Transformers Anian Ruoss Grégoire Delétang Tim Genewein Jordi Grau-Moya Róbert Csordás Mehdi Abbana Bennani Shane Legg J. Veness LLMAG 36 99 0 26 May 2023
Transformer Working Memory Enables Regular Language Reasoning and Natural Language Length Extrapolation Ta-Chung Chi Ting-Han Fan Alexander I. Rudnicky Peter J. Ramadge LRM 14 12 0 05 May 2023
Approximating CKY with Transformers Ghazal Khalighinejad Ollie Liu Sam Wiseman 52 2 0 03 May 2023
SALSA PICANTE: a machine learning attack on LWE with binary secrets Cathy Li Jana Sotáková Emily Wenger Mohamed Malhou Evrard Garcelon François Charton Kristin E. Lauter AAML 30 14 0 07 Mar 2023
The Construction of Reality in an AI: A Review J. W. Johnston 3DV 13 1 0 03 Feb 2023
Annotated History of Modern AI and Deep Learning Juergen Schmidhuber MLAU AI4TS AI4CE 28 22 0 21 Dec 2022
CTL++: Evaluating Generalization on Never-Seen Compositional Patterns of Known Functions, and Compatibility of Neural Representations Róbert Csordás Kazuki Irie Jürgen Schmidhuber NAI 19 11 0 12 Oct 2022
Systematic Generalization and Emergent Structures in Transformers Trained on Structured Tasks Yuxuan Li James L. McClelland 39 17 0 02 Oct 2022
A Generalist Neural Algorithmic Learner Borja Ibarz Vitaly Kurin George Papamakarios Kyriacos Nikiforou Mehdi Abbana Bennani ... Andreea Deac Beatrice Bevilacqua Yaroslav Ganin Charles Blundell Petar Veličković OOD 26 53 0 22 Sep 2022
SALSA: Attacking Lattice Cryptography with Transformers Emily Wenger Mingjie Chen François Charton Kristin E. Lauter AAML 26 35 0 11 Jul 2022
Unveiling Transformers with LEGO: a synthetic reasoning task Yi Zhang A. Backurs Sébastien Bubeck Ronen Eldan Suriya Gunasekar Tal Wagner LRM 28 85 0 09 Jun 2022
Block-Recurrent Transformers DeLesley S. Hutchins Imanol Schlag Yuhuai Wu Ethan Dyer Behnam Neyshabur 18 94 0 11 Mar 2022
A Modern Self-Referential Weight Matrix That Learns to Modify Itself Kazuki Irie Imanol Schlag Róbert Csordás Jürgen Schmidhuber 14 26 0 11 Feb 2022
Linear algebra with transformers François Charton AIMat 29 56 0 03 Dec 2021
PonderNet: Learning to Ponder Andrea Banino Jan Balaguer Charles Blundell PINN AIMat 96 80 0 12 Jul 2021
Going Beyond Linear Transformers with Recurrent Fast Weight Programmers Kazuki Irie Imanol Schlag Róbert Csordás Jürgen Schmidhuber 26 57 0 11 Jun 2021
On the Binding Problem in Artificial Neural Networks Klaus Greff Sjoerd van Steenkiste Jürgen Schmidhuber OCL 224 254 0 09 Dec 2020