Unintended Impacts of LLM Alignment on Global Representation

Unintended Impacts of LLM Alignment on Global Representation

22 February 2024

Michael Joseph Ryan

William B. Held

Diyi Yang

Papers citing "Unintended Impacts of LLM Alignment on Global Representation"

11 / 11 papers shown

Title
Steering the CensorShip: Uncovering Representation Vectors for LLM "Thought" Control Hannah Cyberey David E. Evans LLMSV 76 0 0 23 Apr 2025
CARE: Aligning Language Models for Regional Cultural Awareness Geyang Guo Tarek Naous Hiromi Wakaki Yukiko Nishimura Yuki Mitsufuji Alan Ritter Wei-ping Xu 52 0 0 07 Apr 2025
AI as a deliberative partner fosters intercultural empathy for Americans but fails for Latin American participants Isabel Villanueva Tara Bobinac Binwei Yao Junjie Hu Kaiping Chen 29 0 0 04 Apr 2025
The Call for Socially Aware Language Technologies Diyi Yang Dirk Hovy David Jurgens Barbara Plank VLM 61 11 0 24 Feb 2025
Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance Manon Reusens Philipp Borchert Jochen De Weerdt Bart Baesens 39 0 0 25 Jun 2024
From Distributional to Overton Pluralism: Investigating Large Language Model Alignment Thom Lake Eunsol Choi Greg Durrett 44 9 0 25 Jun 2024
Crosslingual Capabilities and Knowledge Barriers in Multilingual Large Language Models Lynn Chua Badih Ghazi Yangsibo Huang Pritish Kamath Ravi Kumar Pasin Manurangsi Amer Sinha Chulin Xie Chiyuan Zhang 63 1 0 23 Jun 2024
Conformal Alignment: Knowing When to Trust Foundation Models with Guarantees Yu Gui Ying Jin Zhimei Ren MedIm 38 18 0 16 May 2024
Understanding the Effects of RLHF on LLM Generalisation and Diversity Robert Kirk Ishita Mediratta Christoforos Nalmpantis Jelena Luketina Eric Hambro Edward Grefenstette Roberta Raileanu AI4CE ALM 106 121 0 10 Oct 2023
Training language models to follow instructions with human feedback Long Ouyang Jeff Wu Xu Jiang Diogo Almeida Carroll L. Wainwright ... Amanda Askell Peter Welinder Paul Christiano Jan Leike Ryan J. Lowe OSLM ALM 313 11,953 0 04 Mar 2022
The Pile: An 800GB Dataset of Diverse Text for Language Modeling Leo Gao Stella Biderman Sid Black Laurence Golding Travis Hoppe ... Horace He Anish Thite Noa Nabeshima Shawn Presser Connor Leahy AIMat 253 1,989 0 31 Dec 2020