Learning the greatest common divisor: explaining transformer predictions

29 August 2023

François Charton

Papers citing "Learning the greatest common divisor: explaining transformer predictions"

Lightweight Latent Verifiers for Efficient Meta-Generation Strategies Bartosz Piotrowski Witold Drzewakowski Konrad Staniszewski Piotr Miłoś LRM 65 0 0 23 Apr 2025
Int2Int: a framework for mathematics with transformers François Charton ViT 174 0 0 22 Feb 2025
Formal Mathematical Reasoning: A New Frontier in AI Kaiyu Yang Gabriel Poesia Jingxuan He Wenda Li Kristin Lauter Swarat Chaudhuri Dawn Song LRM AI4CE 145 36 0 20 Dec 2024
Regress, Don't Guess -- A Regression-like Loss on Number Tokens for Language Models Jonas Zausinger Lars Pennig Anamarija Kozina Sean Sdahl Julian Sikora ... Anna Ketteler Thorben Prein Vishwa Mohan Singh Michael Morris Danziger Jannis Born 75 3 0 04 Nov 2024
Transformers to Predict the Applicability of Symbolic Integration Routines Rashid Barket Uzma Shafiq Matthew England Juergen Gerhard 46 0 0 31 Oct 2024
Identifying Sub-networks in Neural Networks via Functionally Similar Representations Tian Gao Amit Dhurandhar Karthikeyan N. Ramamurthy Dennis L. Wei 104 0 0 21 Oct 2024
Emergent properties with repeated examples Francois Charton Julia Kempe AIMat 102 5 0 09 Oct 2024
Clustering and Alignment: Understanding the Training Dynamics in Modular Addition Tiberiu Musat 99 1 0 18 Aug 2024
Automated Software Vulnerability Static Code Analysis Using Generative Pre-Trained Transformer Models Elijah Pelofske Vincent Urias L. Liebrock 71 1 0 31 Jul 2024
Why Do You Grok? A Theoretical Analysis of Grokking Modular Addition Mohamad Amin Mohamadi Zhiyuan Li Lei Wu Danica J. Sutherland 107 11 0 17 Jul 2024
Acceleration of Grokking in Learning Arithmetic Operations via Kolmogorov-Arnold Representation Yeachan Park Minseok Kim Yeoneung Kim 76 1 0 26 May 2024
Transforming the Bootstrap: Using Transformers to Compute Scattering Amplitudes in Planar N = 4 Super Yang-Mills Theory Tianji Cai G. W. Merz François Charton Niklas Nolte Matthias Wilhelm K. Cranmer Lance J. Dixon 178 16 0 09 May 2024
Opening the AI black box: program synthesis via mechanistic interpretability Eric J. Michaud Isaac Liao Vedang Lad Ziming Liu Anish Mudide Chloe Loughridge Zifan Carl Guo Tara Rezaei Kheirkhah Mateja Vukelić Max Tegmark 88 13 0 07 Feb 2024
Dichotomy of Early and Late Phase Implicit Biases Can Provably Induce Grokking Kaifeng Lyu Jikai Jin Zhiyuan Li Simon S. Du Jason D. Lee Wei Hu AI4CE 92 38 0 30 Nov 2023