Preference Poisoning Attacks on Reward Model Learning

arXiv:2402.01920 · 2 February 2024
Junlin Wu, Jiong Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik
AAML

Papers citing "Preference Poisoning Attacks on Reward Model Learning"

4 papers shown
SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment
Qin Liu, Fei Wang, Chaowei Xiao, Muhao Chen
18 Oct 2024

Poisoning Language Models During Instruction Tuning
Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein
SILM
01 May 2023

Poisoning the Unlabeled Dataset of Semi-Supervised Learning
Nicholas Carlini
AAML
04 May 2021

Backdooring and Poisoning Neural Networks with Image-Scaling Attacks
Erwin Quiring, Konrad Rieck
AAML
19 Mar 2020