Preference Poisoning Attacks on Reward Model Learning

arXiv:2402.01920 · 2 February 2024
Junlin Wu, Jiong Wang, Chaowei Xiao, Chenguang Wang, Ning Zhang, Yevgeniy Vorobeychik
AAML

Papers citing "Preference Poisoning Attacks on Reward Model Learning"

4 papers shown
SudoLM: Learning Access Control of Parametric Knowledge with Authorization Alignment
Qin Liu, Fei Wang, Chaowei Xiao, Muhao Chen
18 Oct 2024

Poisoning Language Models During Instruction Tuning
Alexander Wan, Eric Wallace, Sheng Shen, Dan Klein
SILM
01 May 2023

Poisoning the Unlabeled Dataset of Semi-Supervised Learning
Nicholas Carlini
AAML
04 May 2021

Backdooring and Poisoning Neural Networks with Image-Scaling Attacks
Erwin Quiring, Konrad Rieck
AAML
19 Mar 2020