arXiv: 2501.10799
Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback
18 January 2025
Yen-Ting Lin, Di Jin, Tengyu Xu, Tianhao Wu, Sainbayar Sukhbaatar, Chen Zhu, Yun He, Yun-Nung Chen, Jason Weston, Yuandong Tian, Arash Rahnama, Sinong Wang, Hao Ma, Han Fang
LRM

Papers citing "Step-KTO: Optimizing Mathematical Reasoning through Stepwise Binary Feedback" (2 papers)

SWEET-RL: Training Multi-Turn LLM Agents on Collaborative Reasoning Tasks
Yifei Zhou, Song Jiang, Yuandong Tian, Jason Weston, Sergey Levine, Sainbayar Sukhbaatar, Xian Li
LLMAG
LRM
19 Mar 2025
PIPA: Preference Alignment as Prior-Informed Statistical Estimation
Junbo Li, Zhangyang Wang, Qiang Liu
OffRL
09 Feb 2025