Learning Transformer Programs
Dan Friedman, Alexander Wettig, Danqi Chen
arXiv: 2306.01128 · 1 June 2023
Papers citing "Learning Transformer Programs" (26 of 26 papers shown)
- Understanding the Logic of Direct Preference Alignment through Logic. Kyle Richardson, Vivek Srikumar, Ashish Sabharwal. 23 Dec 2024.
- Quantifying artificial intelligence through algebraic generalization. Takuya Ito, Murray Campbell, L. Horesh, Tim Klinger, Parikshit Ram. 08 Nov 2024.
- Hypothesis Testing the Circuit Hypothesis in LLMs. Claudia Shi, Nicolas Beltran-Velez, Achille Nazaret, Carolina Zheng, Adrià Garriga-Alonso, Andrew Jesson, Maggie Makar, David M. Blei. 16 Oct 2024.
- A mechanistically interpretable neural network for regulatory genomics. Alex Tseng, Gökçen Eraslan, Tommaso Biancalani, Gabriele Scalia. 08 Oct 2024.
- Integration of Mamba and Transformer -- MAT for Long-Short Range Time Series Forecasting with Application to Weather Dynamics. Wenqing Zhang, Junming Huang, Ruotong Wang, Changsong Wei, Wenqian Huang, Yuxin Qiao. 13 Sep 2024.
- How Transformers Utilize Multi-Head Attention in In-Context Learning? A Case Study on Sparse Linear Regression. Xingwu Chen, Lei Zhao, Difan Zou. 08 Aug 2024.
- Representing Rule-based Chatbots with Transformers. Dan Friedman, Abhishek Panigrahi, Danqi Chen. 15 Jul 2024.
- Algorithmic Language Models with Neurally Compiled Libraries. Lucas Saldyt, Subbarao Kambhampati. 06 Jul 2024.
- Finding Transformer Circuits with Edge Pruning. Adithya Bhaskar, Alexander Wettig, Dan Friedman, Danqi Chen. 24 Jun 2024.
- A Philosophical Introduction to Language Models - Part II: The Way Forward. Raphael Milliere, Cameron Buckner. 06 May 2024.
- Mechanistic Interpretability for AI Safety -- A Review. Leonard Bereska, E. Gavves. 22 Apr 2024.
- What Can Transformer Learn with Varying Depth? Case Studies on Sequence Learning Tasks. Xingwu Chen, Difan Zou. 02 Apr 2024.
- Discrete Neural Algorithmic Reasoning. Gleb Rodionov, Liudmila Prokhorenkova. 18 Feb 2024.
- Towards Uncovering How Large Language Model Works: An Explainability Perspective. Haiyan Zhao, Fan Yang, Bo Shen, Himabindu Lakkaraju, Mengnan Du. 16 Feb 2024.
- PaDeLLM-NER: Parallel Decoding in Large Language Models for Named Entity Recognition. Jinghui Lu, Ziwei Yang, Yanjie Wang, Xuejing Liu, Brian Mac Namee, Can Huang. 07 Feb 2024.
- Simulation of Graph Algorithms with Looped Transformers. Artur Back de Luca, K. Fountoulakis. 02 Feb 2024.
- What Formal Languages Can Transformers Express? A Survey. Lena Strobl, William Merrill, Gail Weiss, David Chiang, Dana Angluin. 01 Nov 2023.
- Codebook Features: Sparse and Discrete Interpretability for Neural Networks. Alex Tamkin, Mohammad Taufeeque, Noah D. Goodman. 26 Oct 2023.
- Masked Hard-Attention Transformers Recognize Exactly the Star-Free Languages. Andy Yang, David Chiang, Dana Angluin. 21 Oct 2023.
- Large Language Models. Michael R Douglas. 11 Jul 2023.
- Finding Alignments Between Interpretable Causal Variables and Distributed Neural Representations. Atticus Geiger, Zhengxuan Wu, Christopher Potts, Thomas Icard, Noah D. Goodman. 05 Mar 2023.
- Tracr: Compiled Transformers as a Laboratory for Interpretability. David Lindner, János Kramár, Sebastian Farquhar, Matthew Rahtz, Tom McGrath, Vladimir Mikulik. 12 Jan 2023.
- Interpretability in the Wild: a Circuit for Indirect Object Identification in GPT-2 small. Kevin Wang, Alexandre Variengien, Arthur Conmy, Buck Shlegeris, Jacob Steinhardt. 01 Nov 2022.
- In-context Learning and Induction Heads. Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah. 24 Sep 2022.
- Train Short, Test Long: Attention with Linear Biases Enables Input Length Extrapolation. Ofir Press, Noah A. Smith, M. Lewis. 27 Aug 2021.
- Probing Classifiers: Promises, Shortcomings, and Advances. Yonatan Belinkov. 24 Feb 2021.