The Linear Representation Hypothesis and the Geometry of Large Language Models

7 November 2023

Papers citing "The Linear Representation Hypothesis and the Geometry of Large Language Models"

50 / 128 papers shown

Emergent Specialization: Rare Token Neurons in Language Models Jing Liu Haozheng Wang Yueheng Li MILM LRM 5 0 0 19 May 2025
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors Jing Huang Junyi Tao Thomas F. Icard Diyi Yang Christopher Potts OODD 16 0 0 17 May 2025
LAMP: Extracting Locally Linear Decision Surfaces from LLM World Models Ryan Chen Youngmin Ko Zeyu Zhang Catherine Cho Sunny Chung M. Giuffré Dennis L. Shung Bradly C. Stadie 2 0 0 17 May 2025
On the Geometry of Semantics in Next-token Prediction Yize Zhao Christos Thrampoulidis 23 0 0 13 May 2025
Understanding In-context Learning of Addition via Activation Subspaces Xinyan Hu Kayo Yin Michael I. Jordan Jacob Steinhardt Lijie Chen 53 0 0 08 May 2025
The Dual Power of Interpretable Token Embeddings: Jailbreaking Attacks and Defenses for Diffusion Model Unlearning Siyi Chen Yimeng Zhang Sijia Liu Q. Qu AAML 150 0 0 30 Apr 2025
Representation Learning on a Random Lattice Aryeh Brill OOD FAtt AI4CE 73 0 0 28 Apr 2025
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control Hannah Cyberey David E. Evans LLMSV 76 0 0 23 Apr 2025
The Geometry of Self-Verification in a Task-Specific Reasoning Model Andrew Lee Lihao Sun Chris Wendler Fernanda Viégas Martin Wattenberg LRM 34 0 0 19 Apr 2025
An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research Patrik Reizinger Randall Balestriero David Klindt Wieland Brendel 40 0 0 17 Apr 2025
On Linear Representations and Pretraining Data Frequency in Language Models Jack Merullo Noah A. Smith Sarah Wiegreffe Yanai Elazar 40 0 0 16 Apr 2025
Steering Prosocial AI Agents: Computational Basis of LLM's Decision Making in Social Simulation Ji Ma 40 0 0 16 Apr 2025
Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries Neil He Jiahong Liu Buze Zhang N. Bui Ali Maatouk Menglin Yang Irwin King Melanie Weber Rex Ying 29 0 0 11 Apr 2025
ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning Zijian Wang Chang Xu LRM 30 1 0 09 Apr 2025
On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions Dang Nguyen Chenhao Tan 32 0 0 07 Apr 2025
Language Models Are Implicitly Continuous Samuele Marro Davide Evangelista X. A. Huang Emanuele La Malfa M. Lombardi Michael Wooldridge 33 0 0 04 Apr 2025
From Tokens to Lattices: Emergent Lattice Structures in Language Models Bo Xiong Steffen Staab LRM 24 0 0 04 Apr 2025
LLM Social Simulations Are a Promising Research Method Jacy Reese Anthis Ryan Liu Sean M. Richardson Austin C. Kozlowski Bernard Koch James A. Evans Erik Brynjolfsson Michael S. Bernstein ALM 51 5 0 03 Apr 2025
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots Erfan Shayegani G M Shahariar Sara Abdali Lei Yu Nael B. Abu-Ghazaleh Yue Dong AAML 78 0 0 01 Apr 2025
Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality Sewoong Lee Adam Davies Marc E. Canby J. Hockenmaier LLMSV 67 0 0 31 Mar 2025
Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts Youxiang Zhu Ruochen Li Danqing Wang Daniel Haehn Xiaohui Liang LRM 63 1 0 30 Mar 2025
Shared Global and Local Geometry of Language Model Embeddings Andrew Lee Melanie Weber F. Viégas Martin Wattenberg FedML 79 3 0 27 Mar 2025
Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations Ziwei Ji L. Yu Yeskendir Koishekenov Yejin Bang Anthony Hartshorn Alan Schelten Cheng Zhang Pascale Fung Nicola Cancedda 53 1 0 18 Mar 2025
Cognitive Activation and Chaotic Dynamics in Large Language Models: A Quasi-Lyapunov Analysis of Reasoning Mechanisms Xiaojian Li Yongkang Leng Ruiqing Ding Hangjie Mo Shanlin Yang LRM 52 0 0 15 Mar 2025
Combining Causal Models for More Accurate Abstractions of Neural Networks Theodora-Mara Pîslar Sara Magliacane Atticus Geiger AI4CE 52 0 0 14 Mar 2025
C^2 ATTACK: Towards Representation Backdoor on CLIP via Concept Confusion Lijie Hu Junchi Liao Weimin Lyu Shaopeng Fu Tianhao Huang Shu Yang Guimin Hu Di Wang AAML 67 0 0 12 Mar 2025
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data? Yuhang Liu Dong Gong Erdun Gao Zhen Zhang Biwei Huang Anton van den Hengel Javen Qinfeng Shi 160 0 0 12 Mar 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models Thomas Winninger Boussad Addad Katarzyna Kapusta AAML 68 0 0 08 Mar 2025
Bayesian Fields: Task-driven Open-Set Semantic Gaussian Splatting Dominic Maggio Luca Carlone 153 0 0 07 Mar 2025
How can representation dimension dominate structurally pruned LLMs? Mingxue Xu Lisa Alazraki Danilo Mandic 56 0 0 06 Mar 2025
Linear Representations of Political Perspective Emerge in Large Language Models Junsol Kim James Evans Aaron Schein 77 2 0 03 Mar 2025
Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning Tianci Liu R. Li Yunzhe Qi Hui Liu Xianfeng Tang ... Qingyu Yin Monica Cheng Jun Huan Haoyu Wang Jing Gao KELM 46 2 0 01 Mar 2025
Enhancing Gradient-based Discrete Sampling via Parallel Tempering Luxu Liang Yuhang Jia Feng Zhou 60 0 0 26 Feb 2025
Nexus-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision Che Liu Yingji Zhang D. Zhang Weijie Zhang Chenggong Gong ... André Freitas Qifan Wang Z. Xu Rongjuncheng Zhang Yong Dai AuLLM 76 0 0 26 Feb 2025
Mind the Gap: Bridging the Divide Between AI Aspirations and the Reality of Autonomous Characterization Grace Guinan Addison Salvador Michelle A. Smeaton Andrew Glaws Hilary Egan Brian C. Wyatt Babak Anasori K. Fiedler M. Olszta Steven Spurgeon 76 0 0 25 Feb 2025
The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence Tom Wollschlager Jannes Elstner Simon Geisler Vincent Cohen-Addad Stephan Günnemann Johannes Gasteiger LLMSV 64 0 0 24 Feb 2025
Is Free Self-Alignment Possible? Dyah Adila Changho Shin Yijing Zhang Frederic Sala MoMe 118 2 0 24 Feb 2025
Activation Steering in Neural Theorem Provers Shashank Kirtania LLMSV 166 0 0 21 Feb 2025
Understanding and Rectifying Safety Perception Distortion in VLMs Xiaohan Zou Jian Kang George Kesidis Lu Lin 181 1 0 18 Feb 2025
LUNAR: LLM Unlearning via Neural Activation Redirection William F. Shen Xinchi Qiu Meghdad Kurmanji Alex Iacob Lorenzo Sani Yihong Chen Nicola Cancedda Nicholas D. Lane MU 56 1 0 11 Feb 2025
Constrained belief updates explain geometric structures in transformer representations Mateusz Piotrowski P. Riechers Daniel Filan A. Shai 76 0 0 04 Feb 2025
The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking Yuchun Miao Sen Zhang Liang Ding Yuqi Zhang L. Zhang Dacheng Tao 81 3 0 31 Jan 2025
Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment Pegah Khayatan Mustafa Shukor Jayneel Parekh Matthieu Cord LLMSV 41 1 0 06 Jan 2025
Representation in large language models Cameron C. Yetman 41 1 0 03 Jan 2025
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation Weilong Dong Xinwei Wu Renren Jin Shaoyang Xu Deyi Xiong 65 7 0 31 Dec 2024
Out-of-distribution generalization via composition: a lens through induction heads in Transformers Jiajun Song Zhuoyan Xu Yiqiao Zhong 88 4 0 31 Dec 2024
ICLR: In-Context Learning of Representations Core Francisco Park Andrew Lee Ekdeep Singh Lubana Yongyi Yang Maya Okawa Kento Nishi Martin Wattenberg Hidenori Tanaka AIFin 120 3 0 29 Dec 2024
Tracking the Feature Dynamics in LLM Training: A Mechanistic Study Yang Xu Yue Wang Hao Wang 117 1 0 23 Dec 2024
Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models Konstantin Donhauser Kristina Ulicna Gemma Elyse Moran Aditya Ravuri Kian Kenyon-Dean Cian Eastwood Jason Hartford 81 0 0 20 Dec 2024
Does Representation Matter? Exploring Intermediate Layers in Large Language Models Oscar Skean Md Rifat Arefin Yann LeCun Ravid Shwartz-Ziv 81 7 0 12 Dec 2024