ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.22304
  4. Cited By
Flow-DPO: Improving LLM Mathematical Reasoning through Online
  Multi-Agent Learning

Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning

29 October 2024
Yihe Deng
Paul Mineiro
    LRM
ArXiv (abs)PDFHTML

Papers citing "Flow-DPO: Improving LLM Mathematical Reasoning through Online Multi-Agent Learning"

12 / 12 papers shown
Title
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
Towards Scientific Intelligence: A Survey of LLM-based Scientific Agents
Shuo Ren
Pu Jian
Zhenjiang Ren
Chunlin Leng
Can Xie
Jiajun Zhang
LLMAGAI4CE
140
4
0
31 Mar 2025
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large
  Language Models
Omni-MATH: A Universal Olympiad Level Mathematic Benchmark For Large Language Models
Bofei Gao
Feifan Song
Zhiyong Yang
Zefan Cai
Yibo Miao
...
Lei Sha
Yichang Zhang
Xuancheng Ren
Tianyu Liu
Baobao Chang
ELMLRM
88
66
0
10 Oct 2024
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of
  LLMs
Step-DPO: Step-wise Preference Optimization for Long-chain Reasoning of LLMs
Xin Lai
Zhuotao Tian
Yukang Chen
Senqiao Yang
Xiangru Peng
Jiaya Jia
LRM
153
126
0
26 Jun 2024
Online Joint Fine-tuning of Multi-Agent Flows
Online Joint Fine-tuning of Multi-Agent Flows
Paul Mineiro
60
2
0
06 Jun 2024
Iterative Reasoning Preference Optimization
Iterative Reasoning Preference Optimization
Richard Yuanzhe Pang
Weizhe Yuan
Kyunghyun Cho
He He
Sainbayar Sukhbaatar
Jason Weston
LRM
127
137
0
30 Apr 2024
Quiet-STaR: Language Models Can Teach Themselves to Think Before
  Speaking
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
E. Zelikman
Georges Harik
Yijia Shao
Varuna Jayasiri
Nick Haber
Noah D. Goodman
LLMAGReLMLRM
125
151
0
14 Mar 2024
V-STaR: Training Verifiers for Self-Taught Reasoners
V-STaR: Training Verifiers for Self-Taught Reasoners
Arian Hosseini
Xingdi Yuan
Nikolay Malkin
Rameswar Panda
Alessandro Sordoni
Rishabh Agarwal
ReLMLRM
93
137
0
09 Feb 2024
MathVista: Evaluating Mathematical Reasoning of Foundation Models in
  Visual Contexts
MathVista: Evaluating Mathematical Reasoning of Foundation Models in Visual Contexts
Pan Lu
Hritik Bansal
Tony Xia
Jiacheng Liu
Chun-yue Li
Hannaneh Hajishirzi
Hao Cheng
Kai-Wei Chang
Michel Galley
Jianfeng Gao
LRMMLLM
126
665
0
03 Oct 2023
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language
  Models
MetaMath: Bootstrap Your Own Mathematical Questions for Large Language Models
L. Yu
Weisen Jiang
Han Shi
Jincheng Yu
Zhengying Liu
Yu Zhang
James T. Kwok
Zheng Li
Adrian Weller
Weiyang Liu
OSLMLRM
106
395
0
21 Sep 2023
Let's Verify Step by Step
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALMOffRLLRM
198
1,240
0
31 May 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward
  Model
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
389
4,163
0
29 May 2023
Training Verifiers to Solve Math Word Problems
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLMOffRLLRM
350
4,596
0
27 Oct 2021
1