2311.03658
Cited By
The Linear Representation Hypothesis and the Geometry of Large Language Models
7 November 2023
Kiho Park
Yo Joong Choe
Victor Veitch
LLMSV
MILM
Papers citing
"The Linear Representation Hypothesis and the Geometry of Large Language Models"
50 / 128 papers shown
Emergent Specialization: Rare Token Neurons in Language Models
Jing Liu
Haozheng Wang
Yueheng Li
MILM
LRM
5
0
0
19 May 2025
Internal Causal Mechanisms Robustly Predict Language Model Out-of-Distribution Behaviors
Jing Huang
Junyi Tao
Thomas F. Icard
Diyi Yang
Christopher Potts
OODD
16
0
0
17 May 2025
LAMP: Extracting Locally Linear Decision Surfaces from LLM World Models
Ryan Chen
Youngmin Ko
Zeyu Zhang
Catherine Cho
Sunny Chung
M. Giuffré
Dennis L. Shung
Bradly C. Stadie
2
0
0
17 May 2025
On the Geometry of Semantics in Next-token Prediction
Yize Zhao
Christos Thrampoulidis
23
0
0
13 May 2025
Understanding In-context Learning of Addition via Activation Subspaces
Xinyan Hu
Kayo Yin
Michael I. Jordan
Jacob Steinhardt
Lijie Chen
53
0
0
08 May 2025
The Dual Power of Interpretable Token Embeddings: Jailbreaking Attacks and Defenses for Diffusion Model Unlearning
Siyi Chen
Yimeng Zhang
Sijia Liu
Q. Qu
AAML
150
0
0
30 Apr 2025
Representation Learning on a Random Lattice
Aryeh Brill
OOD
FAtt
AI4CE
73
0
0
28 Apr 2025
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control
Hannah Cyberey
David E. Evans
LLMSV
76
0
0
23 Apr 2025
The Geometry of Self-Verification in a Task-Specific Reasoning Model
Andrew Lee
Lihao Sun
Chris Wendler
Fernanda Viégas
Martin Wattenberg
LRM
34
0
0
19 Apr 2025
An Empirically Grounded Identifiability Theory Will Accelerate Self-Supervised Learning Research
Patrik Reizinger
Randall Balestriero
David Klindt
Wieland Brendel
40
0
0
17 Apr 2025
On Linear Representations and Pretraining Data Frequency in Language Models
Jack Merullo
Noah A. Smith
Sarah Wiegreffe
Yanai Elazar
40
0
0
16 Apr 2025
Steering Prosocial AI Agents: Computational Basis of LLM's Decision Making in Social Simulation
Ji Ma
40
0
0
16 Apr 2025
Position: Beyond Euclidean -- Foundation Models Should Embrace Non-Euclidean Geometries
Neil He
Jiahong Liu
Buze Zhang
N. Bui
Ali Maatouk
Menglin Yang
Irwin King
Melanie Weber
Rex Ying
29
0
0
11 Apr 2025
ThoughtProbe: Classifier-Guided Thought Space Exploration Leveraging LLM Intrinsic Reasoning
Zijian Wang
Chang Xu
LRM
30
1
0
09 Apr 2025
On the Effectiveness and Generalization of Race Representations for Debiasing High-Stakes Decisions
Dang Nguyen
Chenhao Tan
32
0
0
07 Apr 2025
Language Models Are Implicitly Continuous
Samuele Marro
Davide Evangelista
X. A. Huang
Emanuele La Malfa
M. Lombardi
Michael Wooldridge
33
0
0
04 Apr 2025
From Tokens to Lattices: Emergent Lattice Structures in Language Models
Bo Xiong
Steffen Staab
LRM
24
0
0
04 Apr 2025
LLM Social Simulations Are a Promising Research Method
Jacy Reese Anthis
Ryan Liu
Sean M. Richardson
Austin C. Kozlowski
Bernard Koch
James A. Evans
Erik Brynjolfsson
Michael S. Bernstein
ALM
51
5
0
03 Apr 2025
Misaligned Roles, Misplaced Images: Structural Input Perturbations Expose Multimodal Alignment Blind Spots
Erfan Shayegani
G M Shahariar
Sara Abdali
Lei Yu
Nael B. Abu-Ghazaleh
Yue Dong
AAML
78
0
0
01 Apr 2025
Evaluating and Designing Sparse Autoencoders by Approximating Quasi-Orthogonality
Sewoong Lee
Adam Davies
Marc E. Canby
J. Hockenmaier
LLMSV
67
0
0
31 Mar 2025
Focus Directions Make Your Language Models Pay More Attention to Relevant Contexts
Youxiang Zhu
Ruochen Li
Danqing Wang
Daniel Haehn
Xiaohui Liang
LRM
63
1
0
30 Mar 2025
Shared Global and Local Geometry of Language Model Embeddings
Andrew Lee
Melanie Weber
F. Viégas
Martin Wattenberg
FedML
79
3
0
27 Mar 2025
Calibrating Verbal Uncertainty as a Linear Feature to Reduce Hallucinations
Ziwei Ji
L. Yu
Yeskendir Koishekenov
Yejin Bang
Anthony Hartshorn
Alan Schelten
Cheng Zhang
Pascale Fung
Nicola Cancedda
53
1
0
18 Mar 2025
Cognitive Activation and Chaotic Dynamics in Large Language Models: A Quasi-Lyapunov Analysis of Reasoning Mechanisms
Xiaojian Li
Yongkang Leng
Ruiqing Ding
Hangjie Mo
Shanlin Yang
LRM
52
0
0
15 Mar 2025
Combining Causal Models for More Accurate Abstractions of Neural Networks
Theodora-Mara Pîslar
Sara Magliacane
Atticus Geiger
AI4CE
52
0
0
14 Mar 2025
C^2 ATTACK: Towards Representation Backdoor on CLIP via Concept Confusion
Lijie Hu
Junchi Liao
Weimin Lyu
Shaopeng Fu
Tianhao Huang
Shu Yang
Guimin Hu
Di Wang
AAML
67
0
0
12 Mar 2025
I Predict Therefore I Am: Is Next Token Prediction Enough to Learn Human-Interpretable Concepts from Data?
Yuhang Liu
Dong Gong
Erdun Gao
Zhen Zhang
Biwei Huang
Anton van den Hengel
Javen Qinfeng Shi
160
0
0
12 Mar 2025
Using Mechanistic Interpretability to Craft Adversarial Attacks against Large Language Models
Thomas Winninger
Boussad Addad
Katarzyna Kapusta
AAML
68
0
0
08 Mar 2025
Bayesian Fields: Task-driven Open-Set Semantic Gaussian Splatting
Dominic Maggio
Luca Carlone
153
0
0
07 Mar 2025
How can representation dimension dominate structurally pruned LLMs?
Mingxue Xu
Lisa Alazraki
Danilo Mandic
56
0
0
06 Mar 2025
Linear Representations of Political Perspective Emerge in Large Language Models
Junsol Kim
James Evans
Aaron Schein
77
2
0
03 Mar 2025
Unlocking Efficient, Scalable, and Continual Knowledge Editing with Basis-Level Representation Fine-Tuning
Tianci Liu
R. Li
Yunzhe Qi
Hui Liu
Xianfeng Tang
...
Qingyu Yin
Monica Cheng
Jun Huan
Haoyu Wang
Jing Gao
KELM
46
2
0
01 Mar 2025
Enhancing Gradient-based Discrete Sampling via Parallel Tempering
Luxu Liang
Yuhang Jia
Feng Zhou
60
0
0
26 Feb 2025
Nexus-O: An Omni-Perceptive And -Interactive Model for Language, Audio, And Vision
Che Liu
Yingji Zhang
D. Zhang
Weijie Zhang
Chenggong Gong
...
André Freitas
Qifan Wang
Z. Xu
Rongjuncheng Zhang
Yong Dai
AuLLM
76
0
0
26 Feb 2025
Mind the Gap: Bridging the Divide Between AI Aspirations and the Reality of Autonomous Characterization
Grace Guinan
Addison Salvador
Michelle A. Smeaton
Andrew Glaws
Hilary Egan
Brian C. Wyatt
Babak Anasori
K. Fiedler
M. Olszta
Steven Spurgeon
76
0
0
25 Feb 2025
The Geometry of Refusal in Large Language Models: Concept Cones and Representational Independence
Tom Wollschlager
Jannes Elstner
Simon Geisler
Vincent Cohen-Addad
Stephan Günnemann
Johannes Gasteiger
LLMSV
64
0
0
24 Feb 2025
Is Free Self-Alignment Possible?
Dyah Adila
Changho Shin
Yijing Zhang
Frederic Sala
MoMe
118
2
0
24 Feb 2025
Activation Steering in Neural Theorem Provers
Shashank Kirtania
LLMSV
166
0
0
21 Feb 2025
Understanding and Rectifying Safety Perception Distortion in VLMs
Xiaohan Zou
Jian Kang
George Kesidis
Lu Lin
181
1
0
18 Feb 2025
LUNAR: LLM Unlearning via Neural Activation Redirection
William F. Shen
Xinchi Qiu
Meghdad Kurmanji
Alex Iacob
Lorenzo Sani
Yihong Chen
Nicola Cancedda
Nicholas D. Lane
MU
56
1
0
11 Feb 2025
Constrained belief updates explain geometric structures in transformer representations
Mateusz Piotrowski
P. Riechers
Daniel Filan
A. Shai
76
0
0
04 Feb 2025
The Energy Loss Phenomenon in RLHF: A New Perspective on Mitigating Reward Hacking
Yuchun Miao
Sen Zhang
Liang Ding
Yuqi Zhang
L. Zhang
Dacheng Tao
81
3
0
31 Jan 2025
Analyzing Fine-tuning Representation Shift for Multimodal LLMs Steering alignment
Pegah Khayatan
Mustafa Shukor
Jayneel Parekh
Matthieu Cord
LLMSV
41
1
0
06 Jan 2025
Representation in large language models
Cameron C. Yetman
41
1
0
03 Jan 2025
ConTrans: Weak-to-Strong Alignment Engineering via Concept Transplantation
Weilong Dong
Xinwei Wu
Renren Jin
Shaoyang Xu
Deyi Xiong
65
7
0
31 Dec 2024
Out-of-distribution generalization via composition: a lens through induction heads in Transformers
Jiajun Song
Zhuoyan Xu
Yiqiao Zhong
88
4
0
31 Dec 2024
ICLR: In-Context Learning of Representations
Core Francisco Park
Andrew Lee
Ekdeep Singh Lubana
Yongyi Yang
Maya Okawa
Kento Nishi
Martin Wattenberg
Hidenori Tanaka
AIFin
120
3
0
29 Dec 2024
Tracking the Feature Dynamics in LLM Training: A Mechanistic Study
Yang Xu
Yue Wang
Hao Wang
117
1
0
23 Dec 2024
Towards scientific discovery with dictionary learning: Extracting biological concepts from microscopy foundation models
Konstantin Donhauser
Kristina Ulicna
Gemma Elyse Moran
Aditya Ravuri
Kian Kenyon-Dean
Cian Eastwood
Jason Hartford
81
0
0
20 Dec 2024
Does Representation Matter? Exploring Intermediate Layers in Large Language Models
Oscar Skean
Md Rifat Arefin
Yann LeCun
Ravid Shwartz-Ziv
81
7
0
12 Dec 2024