Linear Representations of Political Perspective Emerge in Large Language Models

3 March 2025

Papers citing "Linear Representations of Political Perspective Emerge in Large Language Models"

LegiGPT: Party Politics and Transport Policy with Large Language Model. Hyunsoo Yun, Eun Hak Lee. 20 Jun 2025.
Dissecting Bias in LLMs: A Mechanistic Interpretability Perspective. Bhavik Chandna, Zubair Bashir, Procheta Sen. 05 Jun 2025.
From Directions to Cones: Exploring Multidimensional Representations of Propositional Facts in LLMs. Stanley Yu, Vaidehi Bulusu, Oscar Yasunaga, Clayton Lau, Cole Blondin, Sean O'Brien, Kevin Zhu, Vasu Sharma. 27 May 2025.
EuroCon: Benchmarking Parliament Deliberation for Political Consensus Finding. Zhaowei Zhang, Minghua Yi, Mengmeng Wang, Fengshuo Bai, Zilong Zheng, Yipeng Kang, Yaodong Yang. 26 May 2025.
LLM Social Simulations Are a Promising Research Method. Jacy Reese Anthis, Ryan Liu, Sean M. Richardson, Austin C. Kozlowski, Bernard Koch, James A. Evans, Erik Brynjolfsson, Michael S. Bernstein. 03 Apr 2025.
Generative Agent Simulations of 1,000 People. Joon Sung Park, Carolyn Q. Zou, Aaron Shaw, Benjamin Mako Hill, Carrie J. Cai, Meredith Ringel Morris, Robb Willer, Percy Liang, Michael S. Bernstein. 15 Nov 2024.
Hidden Persuaders: LLMs' Political Leaning and Their Influence on Voters. Yujin Potter, Shiyang Lai, Junsol Kim, James Evans, Basel Alomair. 31 Oct 2024.
Refusal in Language Models Is Mediated by a Single Direction. Andy Arditi, Oscar Obeso, Aaquib Syed, Daniel Paleka, Nina Panickssery, Wes Gurnee, Neel Nanda. 17 Jun 2024.
Dishonesty in Helpful and Harmless Alignment. Youcheng Huang, Jingkun Tang, Duanyu Feng, Zheng Zhang, Wenqiang Lei, Jiancheng Lv, Anthony G. Cohn. 04 Jun 2024.
Measuring Political Bias in Large Language Models: What Is Said and How It Is Said. Yejin Bang, Delong Chen, Nayeon Lee, Pascale Fung. 27 Mar 2024.
Political Compass or Spinning Arrow? Towards More Meaningful Evaluations for Values and Opinions in Large Language Models. Paul Röttger, Valentin Hofmann, Valentina Pyatkin, Musashi Hinck, Hannah Rose Kirk, Hinrich Schütze, Dirk Hovy. 26 Feb 2024.
A Language Model's Guide Through Latent Space. Dimitri von Rutte, Sotiris Anagnostidis, Gregor Bachmann, Thomas Hofmann. 22 Feb 2024.
Large Language Models Empowered Agent-based Modeling and Simulation: A Survey and Perspectives. Chen Gao, Xiaochong Lan, Nian Li, Yuan Yuan, Jingtao Ding, Zhilun Zhou, Fengli Xu, Yong Li. 19 Dec 2023.
Measurement in the Age of LLMs: An Application to Ideological Scaling. Sean O'Hagan, Aaron Schein. 14 Dec 2023.
The Linear Representation Hypothesis and the Geometry of Large Language Models. Kiho Park, Yo Joong Choe, Victor Veitch. 07 Nov 2023.
Linear Representations of Sentiment in Large Language Models. Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, Neel Nanda. 23 Oct 2023.
Towards Understanding Sycophancy in Language Models. Mrinank Sharma, Meg Tong, Tomasz Korbak, David Duvenaud, Amanda Askell, ..., Oliver Rausch, Nicholas Schiefer, Da Yan, Miranda Zhang, Ethan Perez. 20 Oct 2023.
Concept-Guided Chain-of-Thought Prompting for Pairwise Comparison Scoring of Texts with Large Language Models. Patrick Y. Wu, Jonathan Nagler, Joshua A. Tucker, Solomon Messing. 18 Oct 2023.
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets. Samuel Marks, Max Tegmark. 10 Oct 2023.
Simulating Social Media Using Large Language Models to Evaluate Alternative News Feed Algorithms. Petter Törnberg, D. Valeeva, J. Uitermark, Christopher Bail. 05 Oct 2023.
Language Models Represent Space and Time. Wes Gurnee, Max Tegmark. 03 Oct 2023.
Emergent Linear Representations in World Models of Self-Supervised Sequence Models. Neel Nanda, Andrew Lee, Martin Wattenberg. 02 Sep 2023.
Llama 2: Open Foundation and Fine-Tuned Chat Models. Hugo Touvron, Louis Martin, Kevin R. Stone, Peter Albert, Amjad Almahairi, ..., Sharan Narang, Aurelien Rodriguez, Robert Stojnic, Sergey Edunov, Thomas Scialom. 18 Jul 2023.
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model. Kenneth Li, Oam Patel, Fernanda Viégas, Hanspeter Pfister, Martin Wattenberg. 06 Jun 2023.
AI-Augmented Surveys: Leveraging Large Language Models and Surveys for Opinion Prediction. Junsol Kim, Byungkyu Lee. 16 May 2023.
From Pretraining Data to Language Models to Downstream Tasks: Tracking the Trails of Political Biases Leading to Unfair NLP Models. Shangbin Feng, Chan Young Park, Yuhan Liu, Yulia Tsvetkov. 15 May 2023.
Generative Agents: Interactive Simulacra of Human Behavior. J. Park, Joseph C. O'Brien, Carrie J. Cai, Meredith Ringel Morris, Percy Liang, Michael S. Bernstein. 07 Apr 2023.
Whose Opinions Do Language Models Reflect? Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, Tatsunori Hashimoto. 30 Mar 2023.
Large Language Models Can Be Used to Estimate the Latent Positions of Politicians. Patrick Y. Wu, Jonathan Nagler, Joshua A. Tucker, Solomon Messing. 21 Mar 2023.
Language Models as Agent Models. Jacob Andreas. 03 Dec 2022.
Toy Models of Superposition. Nelson Elhage, Tristan Hume, Catherine Olsson, Nicholas Schiefer, T. Henighan, ..., Sam McCandlish, Jared Kaplan, Dario Amodei, Martin Wattenberg, C. Olah. 21 Sep 2022.
CommunityLM: Probing Partisan Worldviews from Language Models. Hang Jiang, Doug Beeferman, Brandon Roy, Dwaipayan Roy. 15 Sep 2022.
Out of One, Many: Using Language Models to Simulate Human Samples. Lisa P. Argyle, Ethan C. Busby, Nancy Fulda, Joshua R. Gubler, Christopher Rytting, David Wingate. 14 Sep 2022.
Assessing Political Prudence of Open-domain Chatbots. Yejin Bang, Nayeon Lee, Etsuko Ishii, Andrea Madotto, Pascale Fung. 11 Jun 2021.
Probing Classifiers: Promises, Shortcomings, and Advances. Yonatan Belinkov. 24 Feb 2021.
Intrinsic Bias Metrics Do Not Correlate with Application Bias. Seraphina Goldfarb-Tarrant, Rebecca Marchant, Ricardo Muñoz Sánchez, Mugdha Pandya, Adam Lopez. 31 Dec 2020.
Language (Technology) is Power: A Critical Survey of "Bias" in NLP. Su Lin Blodgett, Solon Barocas, Hal Daumé, Hanna M. Wallach. 28 May 2020.
Measurement and Fairness. Abigail Z. Jacobs, Hanna M. Wallach. 11 Dec 2019.
Are Sixteen Heads Really Better than One? Paul Michel, Omer Levy, Graham Neubig. 25 May 2019.
The Geometry of Culture: Analyzing Meaning through Word Embeddings. Austin C. Kozlowski, Matt Taddy, James A. Evans. 25 Mar 2018.
Understanding intermediate layers using linear classifier probes. Guillaume Alain, Yoshua Bengio. 05 Oct 2016.
Probabilistic Archetypal Analysis. S. Seth, M. Eugster. 29 Dec 2013.