Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2406.18629
Cited By
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
26 June 2024
Xin Lai
Zhuotao Tian
Yukang Chen
Senqiao Yang
Xiangru Peng
Jiaya Jia
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs"
50 / 72 papers shown
Title
Shadow-FT: Tuning Instruct via Base
Taiqiang Wu
Runming Yang
Jiayi Li
Pengfei Hu
Ngai Wong
Yujiu Yang
7
0
0
19 May 2025
Beyond Theorem Proving: Formulation, Framework and Benchmark for Formal Problem-Solving
Qi Liu
Xinhao Zheng
Renqiu Xia
Xingzhi Qi
Qinxiang Cao
Junchi Yan
AIMat
52
0
0
07 May 2025
RM-R1: Reward Modeling as Reasoning
Xiusi Chen
Gaotang Li
Zehua Wang
Bowen Jin
Cheng Qian
...
Y. Zhang
D. Zhang
Tong Zhang
Hanghang Tong
Heng Ji
ReLM
OffRL
LRM
171
1
0
05 May 2025
A Survey on Progress in LLM Alignment from the Perspective of Reward Design
Miaomiao Ji
Yanqiu Wu
Zhibin Wu
Shoujin Wang
Jian Yang
Mark Dras
Usman Naseem
39
0
0
05 May 2025
Sailing AI by the Stars: A Survey of Learning from Rewards in Post-Training and Test-Time Scaling of Large Language Models
Xiaobao Wu
LRM
72
1
0
05 May 2025
DeepCritic: Deliberate Critique with Large Language Models
Wenkai Yang
Jingwen Chen
Yankai Lin
Ji-Rong Wen
ALM
LRM
30
0
0
01 May 2025
Iterative Tool Usage Exploration for Multimodal Agents via Step-wise Preference Tuning
Pengxiang Li
Zhi Gao
Bofei Zhang
Yapeng Mi
Xiaojian Ma
...
Tao Yuan
Yuwei Wu
Yunde Jia
Song-Chun Zhu
Qing Li
LLMAG
72
0
0
30 Apr 2025
Beyond the Last Answer: Your Reasoning Trace Uncovers More than You Think
Hasan Hammoud
Hani Itani
Guohao Li
ReLM
LRM
80
1
0
29 Apr 2025
Unsupervised Visual Chain-of-Thought Reasoning via Preference Optimization
Kesen Zhao
B. Zhu
Qianru Sun
Hanwang Zhang
MLLM
LRM
86
0
0
25 Apr 2025
A Framework for Benchmarking and Aligning Task-Planning Safety in LLM-Based Embodied Agents
Yuting Huang
Leilei Ding
ZhiPeng Tang
Tianfu Wang
Xinrui Lin
Weinan Zhang
Mingxiao Ma
Yanyong Zhang
LLMAG
40
0
0
20 Apr 2025
Exploring Expert Failures Improves LLM Agent Tuning
Li-Cheng Lan
Andrew Bai
Minhao Cheng
Ruochen Wang
Cho-Jui Hsieh
LRM
165
0
0
17 Apr 2025
ReTool: Reinforcement Learning for Strategic Tool Use in LLMs
Jiazhan Feng
Shijue Huang
Xingwei Qu
Ge Zhang
Yujia Qin
Baoquan Zhong
Chengquan Jiang
Jinxin Chi
Wanjun Zhong
OffRL
ReLM
SyDa
KELM
LRM
56
7
0
15 Apr 2025
REWARD CONSISTENCY: Improving Multi-Objective Alignment from a Data-Centric Perspective
Zhihao Xu
Yongqi Tong
Xin Zhang
Jun Zhou
Xiting Wang
37
0
0
15 Apr 2025
FingER: Content Aware Fine-grained Evaluation with Reasoning for AI-Generated Videos
Rui Chen
Lei Sun
Jing Tang
Geng Li
Xiangxiang Chu
LRM
29
0
0
14 Apr 2025
SaRO: Enhancing LLM Safety through Reasoning-based Alignment
Yutao Mou
Yuxiao Luo
Shikun Zhang
Wei Ye
LLMSV
LRM
36
0
0
13 Apr 2025
SoTA with Less: MCTS-Guided Sample Selection for Data-Efficient Visual Reasoning Self-Improvement
Qing Guo
Zhengyuan Yang
Chao Feng
Hongjin Lu
Linjie Li
Chung-Ching Lin
Kevin Qinghong Lin
Furong Huang
Lijuan Wang
OODD
ReLM
VLM
LRM
69
1
0
10 Apr 2025
Supervised Optimism Correction: Be Confident When LLMs Are Sure
Jianwei Zhang
Rushuai Yang
Shunyu Liu
Ting-En Lin
Fei Huang
Yi Chen
Yongqian Li
Dacheng Tao
OffRL
24
0
0
10 Apr 2025
LearNAT: Learning NL2SQL with AST-guided Task Decomposition for Large Language Models
Weibin Liao
Xin Gao
Tianyu Jia
Rihong Qiu
Yifan Zhu
Yang Lin
Xu Chu
Junfeng Zhao
Yasha Wang
35
0
0
03 Apr 2025
AdPO: Enhancing the Adversarial Robustness of Large Vision-Language Models with Preference Optimization
Chaohu Liu
Tianyi Gui
Yu Liu
Linli Xu
VLM
AAML
68
1
0
02 Apr 2025
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
Shuo Ren
Pu Jian
Zhenjiang Ren
Chunlin Leng
Can Xie
Jiajun Zhang
LLMAG
AI4CE
61
1
0
31 Mar 2025
CoRanking: Collaborative Ranking with Small and Large Ranking Agents
Wenhan Liu
Xinyu Ma
Yichen Zhu
Lixin Su
S. Wang
Dawei Yin
Zhicheng Dou
ALM
44
0
0
30 Mar 2025
Controlling Large Language Model with Latent Actions
Chengxing Jia
Ziniu Li
Pengyuan Wang
Yi-Chen Li
Zhenyu Hou
Yuxiao Dong
Y. Yu
56
0
0
27 Mar 2025
Boosting Virtual Agent Learning and Reasoning: A Step-wise, Multi-dimensional, and Generalist Reward Model with Benchmark
Bingchen Miao
Y. Wu
Minghe Gao
Qifan Yu
Wendong Bu
Wenqiao Zhang
Yunfei Li
Siliang Tang
Tat-Seng Chua
Juncheng Billy Li
LLMAG
LRM
58
0
0
24 Mar 2025
Teaching LLMs for Step-Level Automatic Math Correction via Reinforcement Learning
J. Li
Jie Zhou
Yutao Yang
Bihao Zhan
Qianjun Pan
Yuyang Ding
Qin Chen
Jiang Bo
Xin Lin
Liang He
LRM
59
0
0
24 Mar 2025
A Survey on Mathematical Reasoning and Optimization with Large Language Models
Ali Forootani
OffRL
LRM
AI4CE
45
0
0
22 Mar 2025
MedAgent-Pro: Towards Multi-modal Evidence-based Medical Diagnosis via Reasoning Agentic Workflow
Ziyue Wang
Junde Wu
Chang Han Low
Yueming Jin
LRM
57
2
0
21 Mar 2025
VisualPRM: An Effective Process Reward Model for Multimodal Reasoning
Weiyun Wang
Zhangwei Gao
L. Chen
Zhe Chen
Jinguo Zhu
...
Lewei Lu
Haodong Duan
Yu Qiao
Jifeng Dai
Wenhai Wang
LRM
65
11
0
13 Mar 2025
Thinking Machines: A Survey of LLM based Reasoning Strategies
Dibyanayan Bandyopadhyay
Soham Bhattacharjee
Asif Ekbal
LRM
ELM
48
5
0
13 Mar 2025
Local Look-Ahead Guidance via Verifier-in-the-Loop for Automated Theorem Proving
Sara Rajaee
Kumar Pratik
Gabriele Cesa
Arash Behboodi
OffRL
LRM
61
0
0
12 Mar 2025
VisRL: Intention-Driven Visual Perception via Reinforced Reasoning
Zhangquan Chen
Xufang Luo
Dongsheng Li
OffRL
LRM
72
3
0
10 Mar 2025
Vision-R1: Incentivizing Reasoning Capability in Multimodal Large Language Models
Wenxuan Huang
Bohan Jia
Zijie Zhai
Shaosheng Cao
Zheyu Ye
Fei Zhao
Zhe Xu
Yao Hu
Shaohui Lin
MU
OffRL
LRM
MLLM
ReLM
VLM
59
45
0
09 Mar 2025
Process-based Self-Rewarding Language Models
Shimao Zhang
Xiao Liu
Xin Zhang
Junxiao Liu
Zheheng Luo
Shujian Huang
Yeyun Gong
ReLM
SyDa
LRM
95
2
0
05 Mar 2025
IterPref: Focal Preference Learning for Code Generation via Iterative Debugging
Jie Wu
Haoling Li
Xin Zhang
Jianwen Luo
Yangyu Huang
Ruihang Chu
Yi Yang
Scarlett Li
75
0
0
04 Mar 2025
Self-Evolved Preference Optimization for Enhancing Mathematical Reasoning in Small Language Models
Joykirat Singh
Tanmoy Chakraborty
A. Nambi
AI4Cl
LRM
ReLM
60
1
0
04 Mar 2025
Generate, Discriminate, Evolve: Enhancing Context Faithfulness via Fine-Grained Sentence-Level Self-Evolution
Keliang Li
Tianhua Zhang
Yunxiang Li
Hongyin Luo
Abdalla Moustafa
Xixin Wu
James Glass
Helen Meng
66
0
0
03 Mar 2025
Self-Training Elicits Concise Reasoning in Large Language Models
Tergel Munkhbat
Namgyu Ho
S. Kim
Yongjin Yang
Yujin Kim
Se-Young Yun
ReLM
LRM
64
12
0
27 Feb 2025
Self-rewarding correction for mathematical reasoning
Wei Xiong
Hanning Zhang
Chenlu Ye
Lichang Chen
Nan Jiang
Tong Zhang
ReLM
KELM
LRM
75
9
0
26 Feb 2025
Two Heads Are Better Than One: Dual-Model Verbal Reflection at Inference-Time
Jiazheng Li
Yuxiang Zhou
Junru Lu
Gladys Tyen
Lin Gui
Cesare Aloisi
Yulan He
LRM
39
2
0
26 Feb 2025
CuDIP: Enhancing Theorem Proving in LLMs via Curriculum Learning-based Direct Preference Optimization
Shuming Shi
Ruobing Zuo
Gaolei He
Jianlin Wang
Chenyang Xu
Zhengfeng Yang
65
0
0
25 Feb 2025
Towards Thinking-Optimal Scaling of Test-Time Compute for LLM Reasoning
Wenkai Yang
Shuming Ma
Yankai Lin
Furu Wei
LRM
53
24
0
25 Feb 2025
Multimodal Large Language Models for Text-rich Image Understanding: A Comprehensive Review
Pei Fu
Tongkun Guan
Zining Wang
Zhentao Guo
Chen Duan
...
Boming Chen
Jiayao Ma
Qianyi Jiang
Kai Zhou
Junfeng Luo
VLM
62
0
0
23 Feb 2025
A Survey on Feedback-based Multi-step Reasoning for Large Language Models on Mathematics
Ting-Ruen Wei
Haowei Liu
Xuyang Wu
Yi Fang
LRM
AI4CE
ReLM
KELM
208
1
0
21 Feb 2025
Multi-Step Alignment as Markov Games: An Optimistic Online Gradient Descent Approach with Convergence Guarantees
Yongtao Wu
Luca Viano
Yihang Chen
Zhenyu Zhu
Kimon Antonakopoulos
Quanquan Gu
V. Cevher
54
0
0
18 Feb 2025
Preference Optimization for Reasoning with Pseudo Feedback
Fangkai Jiao
Geyang Guo
Xingxing Zhang
Nancy F. Chen
Shafiq R. Joty
Furu Wei
LRM
99
9
0
17 Feb 2025
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
L. Yang
Zhaochen Yu
Tengjiao Wang
Mengdi Wang
ReLM
LRM
AI4CE
98
10
0
10 Feb 2025
PIPA: Preference Alignment as Prior-Informed Statistical Estimation
Junbo Li
Zhangyang Wang
Qiang Liu
OffRL
104
0
0
09 Feb 2025
Think Smarter not Harder: Adaptive Reasoning with Inference Aware Optimization
Zishun Yu
Tengyu Xu
Di Jin
Karthik Abinav Sankararaman
Yun He
...
Eryk Helenowski
Chen Zhu
Sinong Wang
Hao Ma
Han Fang
LRM
54
4
0
29 Jan 2025
VidChain: Chain-of-Tasks with Metric-based Direct Preference Optimization for Dense Video Captioning
Ji Soo Lee
Jongha Kim
Jeehye Na
Jinyoung Park
H. Kim
VGen
40
0
0
12 Jan 2025
Enhancing Reasoning through Process Supervision with Monte Carlo Tree Search
Shuangtao Li
Shuaihao Dong
Kexin Luan
Xinhan Di
Chaofan Ding
LRM
43
2
0
02 Jan 2025
Mars-PO: Multi-Agent Reasoning System Preference Optimization
Xiaoxuan Lou
Chaojie Wang
Bo An
LLMAG
LRM
72
0
0
28 Nov 2024
1
2
Next