Word Embeddings Are Steers for Language Models

22 May 2023

Tarek F. Abdelzaher

Heng Ji

Papers citing "Word Embeddings Are Steers for Language Models"

14 / 14 papers shown

Title | Authors | Tags | Counts | Date
CLASH: Evaluating Language Models on Judging High-Stakes Dilemmas from Multiple Perspectives | Ayoung Lee, Ryan Sungmo Kwon, Peter Railton, Lu Wang | ELM | 51 0 0 | 15 Apr 2025
Personalize Your LLM: Fake it then Align it | Yijing Zhang, Dyah Adila, Changho Shin, Frederic Sala | | 88 0 0 | 02 Mar 2025
PropaInsight: Toward Deeper Understanding of Propaganda in Terms of Techniques, Appeals, and Intent | Jiateng Liu, Lin Ai, Zizhou Liu, Payam Karisani, Zheng Hui, May Fung, Preslav Nakov, Julia Hirschberg, Heng Ji | DiffM | 90 4 0 | 17 Feb 2025
Evaluating the Prompt Steerability of Large Language Models | Erik Miehling, Michael Desmond, K. Ramamurthy, Elizabeth M. Daly, Pierre L. Dognin, Jesus Rios, Djallel Bouneffouf, Miao Liu | LLMSV | 89 3 0 | 19 Nov 2024
Focus On This, Not That! Steering LLMs With Adaptive Feature Specification | Tom A. Lamb, Adam Davies, Alasdair Paren, Philip Torr, Francesco Pinto | | 52 0 0 | 30 Oct 2024
Programming Refusal with Conditional Activation Steering | Bruce W. Lee, Inkit Padhi, K. Ramamurthy, Erik Miehling, Pierre L. Dognin, Manish Nagireddy, Amit Dhurandhar | LLMSV | 105 14 0 | 06 Sep 2024
Continuous Language Model Interpolation for Dynamic and Controllable Text Generation | Sara Kangaslahti, David Alvarez-Melis | KELM | 34 0 0 | 10 Apr 2024
Controlled Text Generation with Natural Language Instructions | Wangchunshu Zhou, Yuchen Eleanor Jiang, Ethan Gotlieb Wilcox, Ryan Cotterell, Mrinmaya Sachan | | 160 84 0 | 27 Apr 2023
NormSAGE: Multi-Lingual Multi-Cultural Norm Discovery from Conversations On-the-Fly | Yi R. Fung, Tuhin Chakraborty, Hao Guo, Owen Rambow, Smaranda Muresan, Heng Ji | | 21 39 0 | 16 Oct 2022
Large Language Models are Zero-Shot Reasoners | Takeshi Kojima, S. Gu, Machel Reid, Yutaka Matsuo, Yusuke Iwasawa | ReLM, LRM | 328 4,077 0 | 24 May 2022
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP | Timo Schick, Sahana Udupa, Hinrich Schütze | | 259 374 0 | 28 Feb 2021
Debiasing Pre-trained Contextualised Embeddings | Masahiro Kaneko, Danushka Bollegala | | 218 138 0 | 23 Jan 2021
The Woman Worked as a Babysitter: On Biases in Language Generation | Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng | | 223 618 0 | 03 Sep 2019
Efficient Estimation of Word Representations in Vector Space | Tomáš Mikolov, Kai Chen, G. Corrado, J. Dean | 3DV | 281 31,267 0 | 16 Jan 2013