From Demonstrations to Rewards: Alignment Without Explicit Human Preferences

15 March 2025
Siliang Zeng, Yao Liu, Huzefa Rangwala, George Karypis, Mingyi Hong, Rasool Fakoor
ArXiv · PDF · HTML

Papers citing "From Demonstrations to Rewards: Alignment Without Explicit Human Preferences"

1 of 1 papers shown

DMRL: Data- and Model-aware Reward Learning for Data Extraction
Zhiqiang Wang, Ruoxi Cheng
07 May 2025