Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.11717
Cited By
v1
v2 (latest)
Refusal in Language Models Is Mediated by a Single Direction
17 June 2024
Andy Arditi
Oscar Obeso
Aaquib Syed
Daniel Paleka
Nina Panickssery
Wes Gurnee
Neel Nanda
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Refusal in Language Models Is Mediated by a Single Direction"
5 / 55 papers shown
Title
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
474
10,367
0
17 Jun 2021
Implicit Representations of Meaning in Neural Language Models
Belinda Z. Li
Maxwell Nye
Jacob Andreas
NAI
MILM
60
176
0
01 Jun 2021
The Pile: An 800GB Dataset of Diverse Text for Language Modeling
Leo Gao
Stella Biderman
Sid Black
Laurence Golding
Travis Hoppe
...
Horace He
Anish Thite
Noa Nabeshima
Shawn Presser
Connor Leahy
AIMat
450
2,096
0
31 Dec 2020
Null It Out: Guarding Protected Attributes by Iterative Nullspace Projection
Shauli Ravfogel
Yanai Elazar
Hila Gonen
Michael Twiton
Yoav Goldberg
138
387
0
16 Apr 2020
PyTorch: An Imperative Style, High-Performance Deep Learning Library
Adam Paszke
Sam Gross
Francisco Massa
Adam Lerer
James Bradbury
...
Sasank Chilamkurthy
Benoit Steiner
Lu Fang
Junjie Bai
Soumith Chintala
ODL
514
42,449
0
03 Dec 2019
Previous
1
2