The Remarkable Robustness of LLMs: Stages of Inference?

The Remarkable Robustness of LLMs: Stages of Inference?

27 June 2024

Max Tegmark

Papers citing "The Remarkable Robustness of LLMs: Stages of Inference?"

19 / 19 papers shown

Title
Decoding Vision Transformers: the Diffusion Steering Lens Ryota Takatsuki Sonia Joseph Ippei Fujisawa Ryota Kanai DiffM 30 0 0 18 Apr 2025
SuperBPE: Space Travel for Language Models Alisa Liu J. Hayase Valentin Hofmann Sewoong Oh Noah A. Smith Yejin Choi 43 3 0 17 Mar 2025
Weight-based Analysis of Detokenization in Language Models: Understanding the First Stage of Inference Without Inference Go Kamoda Benjamin Heinzerling Tatsuro Inaba Keito Kudo Keisuke Sakaguchi Kentaro Inui MILM 31 0 0 27 Jan 2025
Large Language Models Share Representations of Latent Grammatical Concepts Across Typologically Diverse Languages Jannik Brinkmann Chris Wendler Christian Bartelt Aaron Mueller 51 9 0 10 Jan 2025
Decomposing The Dark Matter of Sparse Autoencoders Joshua Engels Logan Riggs Max Tegmark LLMSV 57 9 0 18 Oct 2024
From Tokens to Words: On the Inner Lexicon of LLMs Guy Kaplan Matanel Oren Yuval Reif Roy Schwartz 48 12 0 08 Oct 2024
Racing Thoughts: Explaining Contextualization Errors in Large Language Models Michael A. Lepori Michael Mozer Asma Ghandeharioun LRM 85 1 0 02 Oct 2024
Residual Stream Analysis with Multi-Layer SAEs Tim Lawson Lucy Farnik Conor Houghton Laurence Aitchison 26 3 0 06 Sep 2024
Programming Refusal with Conditional Activation Steering Bruce W. Lee Inkit Padhi K. Ramamurthy Erik Miehling Pierre L. Dognin Manish Nagireddy Amit Dhurandhar LLMSV 99 13 0 06 Sep 2024
Transformer Layers as Painters Qi Sun Marc Pickett Aakash Kumar Nain Llion Jones AI4CE 31 13 0 12 Jul 2024
Information Flow Routes: Automatically Interpreting Language Models at Scale Javier Ferrando Elena Voita 46 34 0 27 Feb 2024
Do Llamas Work in English? On the Latent Language of Multilingual Transformers Chris Wendler V. Veselovsky Giovanni Monea Robert West 56 95 0 16 Feb 2024
Universal Neurons in GPT2 Language Models Wes Gurnee Theo Horsley Zifan Carl Guo Tara Rezaei Kheirkhah Qinyi Sun Will Hathaway Neel Nanda Dimitris Bertsimas MILM 99 37 0 22 Jan 2024
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets Samuel Marks Max Tegmark HILM 102 169 0 10 Oct 2023
Finding Neurons in a Haystack: Case Studies with Sparse Probing Wes Gurnee Neel Nanda Matthew Pauly Katherine Harvey Dmitrii Troitskii Dimitris Bertsimas MILM 160 186 0 02 May 2023
Dissecting Recall of Factual Associations in Auto-Regressive Language Models Mor Geva Jasmijn Bastings Katja Filippova Amir Globerson KELM 191 261 0 28 Apr 2023
In-context Learning and Induction Heads Catherine Olsson Nelson Elhage Neel Nanda Nicholas Joseph Nova Dassarma ... Tom B. Brown Jack Clark Jared Kaplan Sam McCandlish C. Olah 250 460 0 24 Sep 2022
Probing Classifiers: Promises, Shortcomings, and Advances Yonatan Belinkov 226 405 0 24 Feb 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling Leo Gao Stella Biderman Sid Black Laurence Golding Travis Hoppe ... Horace He Anish Thite Noa Nabeshima Shawn Presser Connor Leahy AIMat 253 1,989 0 31 Dec 2020