AutoRule: Reasoning Chain-of-thought Extracted Rule-based Rewards Improve Preference Learning

18 June 2025

Tevin Wang

Chenyan Xiong
