Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2410.12877
Cited By
Improving Instruction-Following in Language Models through Activation Steering
15 October 2024
Alessandro Stolfo
Vidhisha Balachandran
Safoora Yousefi
Eric Horvitz
Besmira Nushi
LLMSV
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Improving Instruction-Following in Language Models through Activation Steering"
27 / 77 papers shown
Title
Inspecting and Editing Knowledge Representations in Language Models
Evan Hernandez
Belinda Z. Li
Jacob Andreas
KELM
67
87
0
03 Apr 2023
Large Language Model Instruction Following: A Survey of Progresses and Challenges
Renze Lou
Kai Zhang
Wenpeng Yin
ALM
LRM
66
23
0
18 Mar 2023
The Flan Collection: Designing Data and Methods for Effective Instruction Tuning
Shayne Longpre
Le Hou
Tu Vu
Albert Webson
Hyung Won Chung
...
Denny Zhou
Quoc V. Le
Barret Zoph
Jason W. Wei
Adam Roberts
ALM
98
669
0
31 Jan 2023
Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
49
398
0
19 Dec 2022
Editing Models with Task Arithmetic
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELM
MoMe
MU
177
487
0
08 Dec 2022
Discovering Latent Knowledge in Language Models Without Supervision
Collin Burns
Haotian Ye
Dan Klein
Jacob Steinhardt
122
370
0
07 Dec 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
181
3,117
0
20 Oct 2022
InstructDial: Improving Zero and Few-shot Generalization in Dialogue through Instruction Tuning
Prakhar Gupta
Cathy Jiao
Yi-Ting Yeh
Shikib Mehri
M. Eskénazi
Jeffrey P. Bigham
ALM
91
47
0
25 May 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
507
4,077
0
24 May 2022
Extracting Latent Steering Vectors from Pretrained Language Models
Nishant Subramani
Nivedita Suresh
Matthew E. Peters
LLMSV
68
94
0
10 May 2022
What Makes Instruction Learning Hard? An Investigation and a New Challenge in a Synthetic Environment
Matthew Finlayson
Kyle Richardson
Ashish Sabharwal
Peter Clark
48
13
0
19 Apr 2022
Super-NaturalInstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks
Yizhong Wang
Swaroop Mishra
Pegah Alipoormolabashi
Yeganeh Kordi
Amirreza Mirzaei
...
Chitta Baral
Yejin Choi
Noah A. Smith
Hannaneh Hajishirzi
Daniel Khashabi
ELM
111
842
0
16 Apr 2022
Transformer Feed-Forward Layers Build Predictions by Promoting Concepts in the Vocabulary Space
Mor Geva
Avi Caciularu
Ke Wang
Yoav Goldberg
KELM
109
380
0
28 Mar 2022
Gold Doesn't Always Glitter: Spectral Removal of Linear and Nonlinear Guarded Attribute Information
Shun Shao
Yftah Ziser
Shay B. Cohen
AAML
39
27
0
15 Mar 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
871
12,916
0
04 Mar 2022
A Survey of Controllable Text Generation using Transformer-based Pre-trained Language Models
Hanqing Zhang
Haolin Song
Shaoyu Li
Ming Zhou
Dawei Song
75
223
0
14 Jan 2022
A General Language Assistant as a Laboratory for Alignment
Amanda Askell
Yuntao Bai
Anna Chen
Dawn Drain
Deep Ganguli
...
Tom B. Brown
Jack Clark
Sam McCandlish
C. Olah
Jared Kaplan
ALM
118
777
0
01 Dec 2021
Multitask Prompted Training Enables Zero-Shot Task Generalization
Victor Sanh
Albert Webson
Colin Raffel
Stephen H. Bach
Lintang Sutawika
...
T. Bers
Stella Biderman
Leo Gao
Thomas Wolf
Alexander M. Rush
LRM
342
1,701
0
15 Oct 2021
Finetuned Language Models Are Zero-Shot Learners
Jason W. Wei
Maarten Bosma
Vincent Zhao
Kelvin Guu
Adams Wei Yu
Brian Lester
Nan Du
Andrew M. Dai
Quoc V. Le
ALM
UQCV
178
3,743
0
03 Sep 2021
CrossFit: A Few-shot Learning Challenge for Cross-task Generalization in NLP
Qinyuan Ye
Bill Yuchen Lin
Xiang Ren
283
184
0
18 Apr 2021
Cross-Task Generalization via Natural Language Crowdsourcing Instructions
Swaroop Mishra
Daniel Khashabi
Chitta Baral
Hannaneh Hajishirzi
LRM
138
746
0
18 Apr 2021
Array Programming with NumPy
Charles R. Harris
K. Millman
S. Walt
R. Gommers
Pauli Virtanen
...
Tyler Reddy
Warren Weckesser
Hameer Abbasi
C. Gohlke
T. Oliphant
147
14,953
0
18 Jun 2020
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
Shauli Ravfogel
Yanai Elazar
Hila Gonen
Michael Twiton
Yoav Goldberg
130
381
0
16 Apr 2020
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
Sumanth Dathathri
Andrea Madotto
Janice Lan
Jane Hung
Eric Frank
Piero Molino
J. Yosinski
Rosanne Liu
KELM
127
969
0
04 Dec 2019
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
484
42,407
0
03 Dec 2019
Man is to Computer Programmer as Woman is to Homemaker? Debiasing Word Embeddings
Tolga Bolukbasi
Kai-Wei Chang
James Zou
Venkatesh Saligrama
Adam Kalai
CVBM
FaML
107
3,135
0
21 Jul 2016
Distributed Representations of Words and Phrases and their Compositionality
Tomas Mikolov
Ilya Sutskever
Kai Chen
G. Corrado
J. Dean
NAI
OCL
392
33,521
0
16 Oct 2013
Previous
1
2