2405.17345
Cited By
Exploring and steering the moral compass of Large Language Models
27 May 2024
Alejandro Tlaie
LLMSV
Papers citing "Exploring and steering the moral compass of Large Language Models"
3 / 3 papers shown
Title
Programming Refusal with Conditional Activation Steering
Bruce W. Lee, Inkit Padhi, Karthikeyan N. Ramamurthy, Erik Miehling, Pierre Dognin, Manish Nagireddy, Amit Dhurandhar
LLMSV
130 20 0
06 Sep 2024
Towards Best Practices of Activation Patching in Language Models: Metrics and Methods
Fred Zhang, Neel Nanda
LLMSV
85 104 0
27 Sep 2023
In-context Learning and Induction Heads
Catherine Olsson, Nelson Elhage, Neel Nanda, Nicholas Joseph, Nova Dassarma, ..., Tom B. Brown, Jack Clark, Jared Kaplan, Sam McCandlish, C. Olah
296 494 0
24 Sep 2022