arXiv: 2505.17712
Understanding How Value Neurons Shape the Generation of Specified Values in LLMs
23 May 2025
Yi Su
Jiayi Zhang
Shu Yang
Xinhai Wang
Lijie Hu
Di Wang
OffRL
Papers citing
"Understanding How Value Neurons Shape the Generation of Specified Values in LLMs"
27 / 27 papers shown
The Compositional Architecture of Regret in Large Language Models
Xiangxiang Cui
Shu Yang
Tianjin Huang
Wanyu Lin
Lijie Hu
Di Wang
22
0
0
18 Jun 2025
Flattery in Motion: Benchmarking and Analyzing Sycophancy in Video-LLMs
Wenrui Zhou
Shu Yang
Qingsong Yang
Zikun Guo
Lijie Hu
Di Wang
17
0
0
08 Jun 2025
Towards User-level Private Reinforcement Learning with Human Feedback
Jing Zhang
Mingxi Lei
Meng Ding
Mengdi Li
Zihang Xiang
Difei Xu
Jinhui Xu
Di Wang
107
3
0
22 Feb 2025
Fraud-R1 : A Multi-Round Benchmark for Assessing the Robustness of LLM Against Augmented Fraud and Phishing Inducements
Shu Yang
Shenzhe Zhu
Zeyu Wu
Keyu Wang
Junchi Yao
Junchao Wu
Lijie Hu
Mengdi Li
Derek F. Wong
Di Wang
71
9
0
18 Feb 2025
Mechanistic Unveiling of Transformer Circuits: Self-Influence as a Key to Model Reasoning
Lefei Zhang
Lijie Hu
Di Wang
LRM
194
5
0
17 Feb 2025
EAP-GP: Mitigating Saturation Effect in Gradient-based Automated Circuit Identification
Li Zhang
Wenshuo Dong
Zhuoran Zhang
Shu Yang
Lijie Hu
Ninghao Liu
Pan Zhou
Di Wang
99
4
0
07 Feb 2025
Understanding and Mitigating Gender Bias in LLMs via Interpretable Neuron Editing
Zeping Yu
Sophia Ananiadou
KELM
114
3
0
24 Jan 2025
A Survey on Responsible LLMs: Inherent Risk, Malicious Use, and Mitigation Strategy
Huandong Wang
Wenjie Fu
Yingzhou Tang
Zhilong Chen
Yanhua Huang
J. Piao
Chen Gao
Fengli Xu
Tao Jiang
Yongqian Li
PILM
99
10
0
17 Jan 2025
Locate-then-edit for Multi-hop Factual Recall under Knowledge Editing
Zhuoran Zhang
Yongqian Li
Zijian Kan
Keyuan Cheng
Lijie Hu
Di Wang
KELM
83
13
0
08 Oct 2024
Interpreting Arithmetic Mechanism in Large Language Models through Comparative Neuron Analysis
Zeping Yu
Sophia Ananiadou
LRM
MILM
109
14
0
21 Sep 2024
Towards Understanding Multi-Task Learning (Generalization) of LLMs via Detecting and Exploring Task-Specific Neurons
Yongqi Leng
Deyi Xiong
116
8
0
09 Jul 2024
Language-Specific Neurons: The Key to Multilingual Capabilities in Large Language Models
Tianyi Tang
Wenyang Luo
Haoyang Huang
Dongdong Zhang
Xiaolei Wang
Xin Zhao
Furu Wei
Ji-Rong Wen
105
60
0
26 Feb 2024
KorNAT: LLM Alignment Benchmark for Korean Social Values and Common Knowledge
Jiyoung Lee
Minwoo Kim
Seungho Kim
Junghwan Kim
Seunghyun Won
Hwaran Lee
Edward Choi
ALM
127
17
0
21 Feb 2024
MoRAL: MoE Augmented LoRA for LLMs' Lifelong Learning
Shu Yang
Muhammad Asif Ali
Cheng-Long Wang
Lijie Hu
Di Wang
CLL
MoE
114
46
0
17 Feb 2024
MoCa: Measuring Human-Language Model Alignment on Causal and Moral Judgment Tasks
Allen Nie
Yuhui Zhang
Atharva Amdekar
Chris Piech
Tatsunori Hashimoto
Tobias Gerstenberg
80
40
0
30 Oct 2023
Evaluating the Moral Beliefs Encoded in LLMs
Nino Scherrer
Claudia Shi
Amir Feder
David M. Blei
89
139
0
26 Jul 2023
Enhancing Chat Language Models by Scaling High-quality Instructional Conversations
Ning Ding
Yulin Chen
Bokai Xu
Yujia Qin
Zhi Zheng
Shengding Hu
Zhiyuan Liu
Maosong Sun
Bowen Zhou
ALM
150
554
0
23 May 2023
Toxicity in ChatGPT: Analyzing Persona-assigned Language Models
Ameet Deshpande
Vishvak Murahari
Tanmay Rajpurohit
Ashwin Kalyan
Karthik Narasimhan
LM&MA
LLMAG
81
371
0
11 Apr 2023
Generative Agents: Interactive Simulacra of Human Behavior
J. Park
Joseph C. O'Brien
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
LM&Ro
AI4CE
422
1,989
0
07 Apr 2023
Assessing Cross-Cultural Alignment between ChatGPT and Human Societies: An Empirical Study
Yong Cao
Li Zhou
Seolhwa Lee
Laura Cabello
Min Chen
Daniel Hershcovich
95
185
0
30 Mar 2023
G-Eval: NLG Evaluation using GPT-4 with Better Human Alignment
Yang Liu
Dan Iter
Yichong Xu
Shuohang Wang
Ruochen Xu
Chenguang Zhu
ELM
ALM
LM&MA
217
1,213
0
29 Mar 2023
Mass-Editing Memory in a Transformer
Kevin Meng
Arnab Sen Sharma
A. Andonian
Yonatan Belinkov
David Bau
KELM
VLM
154
599
0
13 Oct 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
910
13,249
0
04 Mar 2022
Locating and Editing Factual Associations in GPT
Kevin Meng
David Bau
A. Andonian
Yonatan Belinkov
KELM
255
1,392
0
10 Feb 2022
Alignment of Language Agents
Zachary Kenton
Tom Everitt
Laura Weidinger
Iason Gabriel
Vladimir Mikulik
G. Irving
85
166
0
26 Mar 2021
Transformer Feed-Forward Layers Are Key-Value Memories
Mor Geva
R. Schuster
Jonathan Berant
Omer Levy
KELM
195
849
0
29 Dec 2020
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
847
132,963
0
12 Jun 2017