2402.01920
Cited By
Preference Poisoning Attacks on Reward Model Learning
2 February 2024
Junlin Wu
Jiong Wang
Chaowei Xiao
Chenguang Wang
Ning Zhang
Yevgeniy Vorobeychik
AAML
Papers citing "Preference Poisoning Attacks on Reward Model Learning"
4 / 4 papers shown
Title
SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment
Qin Liu
Fei Wang
Chaowei Xiao
Muhao Chen
151
0
0
18 Oct 2024
Poisoning Language Models During Instruction Tuning
Alexander Wan
Eric Wallace
Sheng Shen
Dan Klein
SILM
92
124
0
01 May 2023
Poisoning the Unlabeled Dataset of Semi-Supervised Learning
Nicholas Carlini
AAML
149
68
0
04 May 2021
Backdooring and Poisoning Neural Networks with Image-Scaling Attacks
Erwin Quiring
Konrad Rieck
AAML
51
70
0
19 Mar 2020