Explorations of Self-Repair in Language Models

23 February 2024

Papers citing "Explorations of Self-Repair in Language Models"

Title | Authors | Tags | Counts | Date
Understanding Gated Neurons in Transformers from Their Input-Output Functionality | Sebastian Gerstner, Hinrich Schütze | MILM, FAtt | 198 0 0 | 23 May 2025
Decoding Vision Transformers: the Diffusion Steering Lens | Ryota Takatsuki, Sonia Joseph, Ippei Fujisawa, Ryota Kanai | DiffM | 88 0 0 | 18 Apr 2025
The Geometry of Concepts: Sparse Autoencoder Feature Structure | Yuxiao Li, Eric J. Michaud, David D. Baek, Joshua Engels, Xiaoqing Sun, Max Tegmark | - | 97 18 0 | 10 Oct 2024
In-context Learning and Induction Heads | Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah | - | 319 524 0 | 24 Sep 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling | Leo Gao, Stella Biderman, Sid Black, Laurence Golding, Travis Hoppe, ..., Horace He, Anish Thite, Noa Nabeshima, Shawn Presser, Connor Leahy | AIMat | 455 2,113 0 | 31 Dec 2020
Understanding the Role of Individual Units in a Deep Neural Network | David Bau, Jun-Yan Zhu, Hendrik Strobelt, Àgata Lapedriza, Bolei Zhou, Antonio Torralba | GAN | 69 452 0 | 10 Sep 2020
Are Sixteen Heads Really Better than One? | Paul Michel, Omer Levy, Graham Neubig | MoE | 103 1,068 0 | 25 May 2019
Residual Connections Encourage Iterative Inference | Stanislaw Jastrzebski, Devansh Arpit, Nicolas Ballas, Vikas Verma, Tong Che, Yoshua Bengio | - | 57 155 0 | 13 Oct 2017