Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.20809
Cited By
Improved Representation Steering for Language Models
27 May 2025
Zhengxuan Wu
Qinan Yu
Aryaman Arora
Christopher D. Manning
Christopher Potts
LLMSV
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Improved Representation Steering for Language Models"
40 / 40 papers shown
Title
Predicting vs. Acting: A Trade-off Between World Modeling & Agent Modeling
Margaret Li
Weijia Shi
Artidoro Pagnoni
Peter West
Ari Holtzman
58
8
0
02 Jul 2024
Optimizing Instructions and Demonstrations for Multi-Stage Language Model Programs
Krista Opsahl-Ong
Michael J Ryan
Josh Purtell
David Broman
Christopher Potts
Matei A. Zaharia
Omar Khattab
55
32
0
17 Jun 2024
Personalized Steering of Large Language Models: Versatile Steering Vectors Through Bi-directional Preference Optimization
Yuanpu Cao
Tianrong Zhang
Bochuan Cao
Ziyi Yin
Lu Lin
Fenglong Ma
Jinghui Chen
LLMSV
43
27
0
28 May 2024
LoRA-Switch: Boosting the Efficiency of Dynamic LLM Adapters via System-Algorithm Co-design
Rui Kong
Qiyang Li
Xinyu Fang
Qingtian Feng
Qingfeng He
Yazhu Dong
Weijun Wang
Yuanchun Li
Linghe Kong
Yunxin Liu
MoE
71
7
0
28 May 2024
MoEUT: Mixture-of-Experts Universal Transformers
Róbert Csordás
Kazuki Irie
Jürgen Schmidhuber
Christopher Potts
Christopher D. Manning
MoE
51
10
0
25 May 2024
SimPO: Simple Preference Optimization with a Reference-Free Reward
Yu Meng
Mengzhou Xia
Danqi Chen
88
425
0
23 May 2024
LoRA Land: 310 Fine-tuned LLMs that Rival GPT-4, A Technical Report
Justin Zhao
Timothy Wang
Wael Abid
Geoffrey Angus
Arnav Garg
Jeffery Kinnison
Alex Sherstinsky
Piero Molino
Travis Addair
Devvret Rishi
ALM
59
29
0
29 Apr 2024
ReFT: Representation Finetuning for Language Models
Zhengxuan Wu
Aryaman Arora
Zheng Wang
Atticus Geiger
Daniel Jurafsky
Christopher D. Manning
Christopher Potts
OffRL
85
67
0
04 Apr 2024
Efficient Prompting Methods for Large Language Models: A Survey
Kaiyan Chang
Songcheng Xu
Chenglong Wang
Yingfeng Luo
Tong Xiao
Jingbo Zhu
LRM
79
34
0
01 Apr 2024
Comparing Bad Apples to Good Oranges: Aligning Large Language Models via Joint Preference Optimization
Hritik Bansal
Ashima Suvarna
Gantavya Bhatt
Nanyun Peng
Kai-Wei Chang
Aditya Grover
ALM
95
10
0
31 Mar 2024
Parameter-Efficient Fine-Tuning for Large Models: A Comprehensive Survey
Zeyu Han
Chao Gao
Jinyang Liu
Jeff Zhang
Sai Qian Zhang
189
363
0
21 Mar 2024
AutoLoRA: Automatically Tuning Matrix Ranks in Low-Rank Adaptation Based on Meta Learning
Ruiyi Zhang
Rushi Qiang
Sai Ashish Somayajula
Pengtao Xie
90
17
0
14 Mar 2024
Extending Activation Steering to Broad Skills and Multiple Behaviours
Teun van der Weij
Massimo Poesio
Nandi Schoots
LLMSV
61
17
0
09 Mar 2024
DoRA: Weight-Decomposed Low-Rank Adaptation
Shih-yang Liu
Chien-Yi Wang
Hongxu Yin
Pavlo Molchanov
Yu-Chiang Frank Wang
Kwang-Ting Cheng
Min-Hung Chen
66
374
0
14 Feb 2024
Steering Llama 2 via Contrastive Activation Addition
Nina Rimsky
Nick Gabrieli
Julian Schulz
Meg Tong
Evan Hubinger
Alexander Matt Turner
LLMSV
43
188
0
09 Dec 2023
Instruction-Following Evaluation for Large Language Models
Jeffrey Zhou
Tianjian Lu
Swaroop Mishra
Siddhartha Brahma
Sujoy Basu
Yi Luan
Denny Zhou
Le Hou
ELM
ALM
LRM
41
262
0
14 Nov 2023
In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Sheng Liu
Haotian Ye
Lei Xing
James Y. Zou
46
97
0
11 Nov 2023
The Linear Representation Hypothesis and the Geometry of Large Language Models
Kiho Park
Yo Joong Choe
Victor Veitch
LLMSV
MILM
82
162
0
07 Nov 2023
Jailbreaking Black Box Large Language Models in Twenty Queries
Patrick Chao
Alexander Robey
Yan Sun
Hamed Hassani
George J. Pappas
Eric Wong
AAML
72
642
0
12 Oct 2023
Understanding and Controlling a Maze-Solving Policy Network
Ulisse Mini
Peli Grietzer
Mrinank Sharma
Austin Meek
M. MacDiarmid
Alexander Matt Turner
29
16
0
12 Oct 2023
The Geometry of Truth: Emergent Linear Structure in Large Language Model Representations of True/False Datasets
Samuel Marks
Max Tegmark
HILM
112
199
0
10 Oct 2023
DSPy: Compiling Declarative Language Model Calls into Self-Improving Pipelines
Omar Khattab
Arnav Singhvi
Paridhi Maheshwari
Zhiyuan Zhang
Keshav Santhanam
...
Thomas T. Joshi
Hanna Moazam
Heather Miller
Matei A. Zaharia
Christopher Potts
RALM
54
248
0
05 Oct 2023
Composing Parameter-Efficient Modules with Arithmetic Operations
Jinghan Zhang
Shiqi Chen
Junteng Liu
Junxian He
KELM
MoMe
55
117
0
26 Jun 2023
Inference-Time Intervention: Eliciting Truthful Answers from a Language Model
Kenneth Li
Oam Patel
Fernanda Viégas
Hanspeter Pfister
Martin Wattenberg
KELM
HILM
83
528
0
06 Jun 2023
LEACE: Perfect linear concept erasure in closed form
Nora Belrose
David Schneider-Joseph
Shauli Ravfogel
Ryan Cotterell
Edward Raff
Stella Biderman
KELM
MU
58
107
0
06 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
295
3,712
0
29 May 2023
AdaLoRA: Adaptive Budget Allocation for Parameter-Efficient Fine-Tuning
Qingru Zhang
Minshuo Chen
Alexander Bukharin
Nikos Karampatziakis
Pengcheng He
Yu Cheng
Weizhu Chen
Tuo Zhao
47
117
0
18 Mar 2023
AdaMix: Mixture-of-Adaptations for Parameter-efficient Model Tuning
Yaqing Wang
Sahaj Agarwal
Subhabrata Mukherjee
Xiaodong Liu
Jing Gao
Ahmed Hassan Awadallah
Jianfeng Gao
MoE
50
126
0
31 Oct 2022
DyLoRA: Parameter Efficient Tuning of Pre-trained Models using Dynamic Search-Free Low-Rank Adaptation
Mojtaba Valipour
Mehdi Rezagholizadeh
I. Kobyzev
A. Ghodsi
52
170
0
14 Oct 2022
Extracting Latent Steering Vectors from Pretrained Language Models
Nishant Subramani
Nivedita Suresh
Matthew E. Peters
LLMSV
62
88
0
10 May 2022
Linear Adversarial Concept Erasure
Shauli Ravfogel
Michael Twiton
Yoav Goldberg
Ryan Cotterell
KELM
91
61
0
28 Jan 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
207
4,175
0
27 Oct 2021
Towards a Unified View of Parameter-Efficient Transfer Learning
Junxian He
Chunting Zhou
Xuezhe Ma
Taylor Berg-Kirkpatrick
Graham Neubig
AAML
85
918
0
08 Oct 2021
BitFit: Simple Parameter-efficient Fine-tuning for Transformer-based Masked Language-models
Elad Ben-Zaken
Shauli Ravfogel
Yoav Goldberg
145
1,191
0
18 Jun 2021
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
273
10,099
0
17 Jun 2021
The Power of Scale for Parameter-Efficient Prompt Tuning
Brian Lester
Rami Al-Rfou
Noah Constant
VPVLM
474
3,952
0
18 Apr 2021
Prefix-Tuning: Optimizing Continuous Prompts for Generation
Xiang Lisa Li
Percy Liang
182
4,209
0
01 Jan 2021
Parameter-Efficient Transfer Learning for NLP
N. Houlsby
A. Giurgiu
Stanislaw Jastrzebski
Bruna Morrone
Quentin de Laroussilhe
Andrea Gesmundo
Mona Attariyan
Sylvain Gelly
201
4,368
0
02 Feb 2019
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
248
18,685
0
20 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
501
129,831
0
12 Jun 2017
1