Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions

22 November 2022
S. Bhattamishra
Arkil Patel
Varun Kanade
Phil Blunsom

Papers citing "Simplicity Bias in Transformers and their Ability to Learn Sparse Boolean Functions"

36 / 36 papers shown
Transformers Can Overcome the Curse of Dimensionality: A Theoretical Study from an Approximation Perspective
Yuling Jiao
Yanming Lai
Yang Wang
Bokai Yan
39
0
0
18 Apr 2025
Language Models, Graph Searching, and Supervision Adulteration: When More Supervision is Less and How to Make More More
Arvid Frydenlund
LRM
48
0
0
13 Mar 2025
Do We Always Need the Simplicity Bias? Looking for Optimal Inductive Biases in the Wild
Damien Teney
Liangze Jiang
Florin Gogianu
Ehsan Abbasnejad
169
0
0
13 Mar 2025
A distributional simplicity bias in the learning dynamics of transformers
Riccardo Rende
Federica Gerace
A. Laio
Sebastian Goldt
79
8
0
17 Feb 2025
Lower Bounds for Chain-of-Thought Reasoning in Hard-Attention Transformers
Alireza Amiri
Xinting Huang
Mark Rofin
Michael Hahn
LRM
180
0
0
04 Feb 2025
Exploring Grokking: Experimental and Mechanistic Investigations
Hu Qiye
Zhou Hao
Yu RuoXi
79
1
0
14 Dec 2024
Training Neural Networks as Recognizers of Formal Languages
Alexandra Butoi
Ghazal Khalighinejad
Anej Svete
Josef Valvoda
Ryan Cotterell
Brian DuSell
NAI
44
2
0
11 Nov 2024
The Mystery of the Pathological Path-star Task for Language Models
Arvid Frydenlund
LRM
27
4
0
17 Oct 2024
Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition
Mohamad Amin Mohamadi
Zhiyuan Li
Lei Wu
Danica J. Sutherland
48
9
0
17 Jul 2024
Exploiting the equivalence between quantum neural networks and perceptrons
Chris Mingard
Jessica Pointing
Charles London
Yoonsoo Nam
Ard A. Louis
35
2
0
05 Jul 2024
Early learning of the optimal constant solution in neural networks and humans
Jirko Rubruck
Jan P. Bauer
Andrew M. Saxe
Christopher Summerfield
33
1
0
25 Jun 2024
Language Models Need Inductive Biases to Count Inductively
Yingshan Chang
Yonatan Bisk
LRM
32
5
0
30 May 2024
IM-Context: In-Context Learning for Imbalanced Regression Tasks
Ismail Nejjar
Faez Ahmed
Olga Fink
35
1
0
28 May 2024
A rationale from frequency perspective for grokking in training neural network
Zhangchen Zhou
Yaoyu Zhang
Z. Xu
40
2
0
24 May 2024
Learning Syntax Without Planting Trees: Understanding Hierarchical Generalization in Transformers
Kabir Ahuja
Vidhisha Balachandran
Madhur Panwar
Tianxing He
Noah A. Smith
Navin Goyal
Yulia Tsvetkov
41
8
0
25 Apr 2024
TEL'M: Test and Evaluation of Language Models
G. Cybenko
Joshua Ackerman
Paul Lintilhac
ALM
ELM
40
0
0
16 Apr 2024
IsoBench: Benchmarking Multimodal Foundation Models on Isomorphic Representations
Deqing Fu
Ghazal Khalighinejad
Ollie Liu
Bhuwan Dhingra
Dani Yogatama
Robin Jia
W. Neiswanger
33
14
0
01 Apr 2024
Neural Redshift: Random Networks are not Random Functions
Damien Teney
A. Nicolicioiu
Valentin Hartmann
Ehsan Abbasnejad
103
18
0
04 Mar 2024
Out-of-Domain Generalization in Dynamical Systems Reconstruction
Niclas Alexander Göring
Florian Hess
Manuel Brenner
Zahra Monfared
Daniel Durstewitz
AI4CE
35
10
0
28 Feb 2024
Why are Sensitive Functions Hard for Transformers?
Michael Hahn
Mark Rofin
41
25
0
15 Feb 2024
Towards Understanding Inductive Bias in Transformers: A View From Infinity
Itay Lavie
Guy Gur-Ari
Z. Ringel
34
1
0
07 Feb 2024
Investigating Recurrent Transformers with Dynamic Halt
Jishnu Ray Chowdhury
Cornelia Caragea
39
1
0
01 Feb 2024
Simplicity bias, algorithmic probability, and the random logistic map
B. Hamzi
K. Dingle
23
3
0
31 Dec 2023
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking
Kaifeng Lyu
Jikai Jin
Zhiyuan Li
Simon S. Du
Jason D. Lee
Wei Hu
AI4CE
41
32
0
30 Nov 2023
Looped Transformers are Better at Learning Learning Algorithms
Liu Yang
Kangwook Lee
Robert D. Nowak
Dimitris Papailiopoulos
24
24
0
21 Nov 2023
How are Prompts Different in Terms of Sensitivity?
Sheng Lu
Hendrik Schuff
Iryna Gurevych
37
18
0
13 Nov 2023
What Formal Languages Can Transformers Express? A Survey
Lena Strobl
William Merrill
Gail Weiss
David Chiang
Dana Angluin
AI4CE
20
48
0
01 Nov 2023
What Algorithms can Transformers Learn? A Study in Length Generalization
Hattie Zhou
Arwen Bradley
Etai Littwin
Noam Razin
Omid Saremi
Josh Susskind
Samy Bengio
Preetum Nakkiran
34
110
0
24 Oct 2023
Understanding In-Context Learning in Transformers and LLMs by Learning to Learn Discrete Functions
S. Bhattamishra
Arkil Patel
Phil Blunsom
Varun Kanade
21
41
0
04 Oct 2023
In-Context Learning through the Bayesian Prism
Madhur Panwar
Kabir Ahuja
Navin Goyal
BDL
34
38
0
08 Jun 2023
Representational Strengths and Limitations of Transformers
Clayton Sanford
Daniel J. Hsu
Matus Telgarsky
22
81
0
05 Jun 2023
MLRegTest: A Benchmark for the Machine Learning of Regular Languages
Sam van der Poel
D. Lambert
Kalina Kostyszyn
Tiantian Gao
Rahul Verma
...
Emily Peterson
C. S. Clair
Paul Fodor
Chihiro Shibata
Jeffrey Heinz
ELM
17
8
0
16 Apr 2023
Do deep neural networks have an inbuilt Occam's razor?
Chris Mingard
Henry Rees
Guillermo Valle Pérez
A. Louis
UQCV
BDL
21
16
0
13 Apr 2023
Neural Networks and the Chomsky Hierarchy
Grégoire Delétang
Anian Ruoss
Jordi Grau-Moya
Tim Genewein
L. Wenliang
...
Chris Cundy
Marcus Hutter
Shane Legg
Joel Veness
Pedro A. Ortega
UQCV
107
130
0
05 Jul 2022
Sensitivity as a Complexity Measure for Sequence Classification Tasks
Michael Hahn
Dan Jurafsky
Richard Futrell
150
22
0
21 Apr 2021
Memorisation versus Generalisation in Pre-trained Language Models
Michael Tänzer
Sebastian Ruder
Marek Rei
94
50
0
16 Apr 2021