ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2412.14135
  4. Cited By
Scaling of Search and Learning: A Roadmap to Reproduce o1 from
  Reinforcement Learning Perspective

Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective

18 December 2024
Zhiyuan Zeng
Qinyuan Cheng
Zhangyue Yin
Bo Wang
Shimin Li
Yunhua Zhou
Qipeng Guo
Xuanjing Huang
Xipeng Qiu
    ELMAI4TSLRM
ArXiv (abs)PDFHTML

Papers citing "Scaling of Search and Learning: A Roadmap to Reproduce o1 from Reinforcement Learning Perspective"

13 / 13 papers shown
Title
Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks
Direct Reasoning Optimization: LLMs Can Reward And Refine Their Own Reasoning for Open-Ended Tasks
Yifei Xu
Tusher Chakraborty
Srinagesh Sharma
Leonardo Nunes
Emre Kıcıman
Songwu Lu
Ranveer Chandra
OffRLLRM
69
1
0
16 Jun 2025
TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence
TimeHC-RL: Temporal-aware Hierarchical Cognitive Reinforcement Learning for Enhancing LLMs' Social Intelligence
Guiyang Hou
Xing Gao
Yuchuan Wu
Xiang Huang
Wenqi Zhang
...
Yongliang Shen
Jialu Du
Fei Huang
Yongbin Li
Weiming Lu
53
0
0
30 May 2025
Pretraining Language Models to Ponder in Continuous Space
Pretraining Language Models to Ponder in Continuous Space
Boyi Zeng
Shixiang Song
Siyuan Huang
Yixuan Wang
He Li
Ziwei He
Xinbing Wang
Zhiyu Li
Zhouhan Lin
LRM
100
0
0
27 May 2025
Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL
Beyond Distillation: Pushing the Limits of Medical LLM Reasoning with Minimalist Rule-Based RL
Che Liu
Haozhe Wang
J. Pan
Zhongwei Wan
Yong Dai
Fangzhen Lin
Wenjia Bai
Daniel Rueckert
Rossella Arcucci
OffRLLRMELM
118
1
0
23 May 2025
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
J4R: Learning to Judge with Equivalent Initial State Group Relative Policy Optimization
Austin Xu
Yilun Zhou
Xuan-Phi Nguyen
Caiming Xiong
Shafiq Joty
ELMLRM
161
0
0
19 May 2025
MARFT: Multi-Agent Reinforcement Fine-Tuning
MARFT: Multi-Agent Reinforcement Fine-Tuning
Junwei Liao
Muning Wen
Jun Wang
Weinan Zhang
OffRL
167
5
0
21 Apr 2025
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
Two Heads are Better Than One: Test-time Scaling of Multi-agent Collaborative Reasoning
Can Jin
Hongwu Peng
Qixin Zhang
Yujin Tang
Dimitris N. Metaxas
Tong Che
LLMAGLRM
439
9
0
14 Apr 2025
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
InftyThink: Breaking the Length Limits of Long-Context Reasoning in Large Language Models
Yuchen Yan
Yongliang Shen
Yuhang Liu
Jin Jiang
Hao Fei
Jian Shao
Yueting Zhuang
LRMReLM
146
10
0
09 Mar 2025
DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models
DAST: Difficulty-Adaptive Slow-Thinking for Large Reasoning Models
Yi Shen
Jing Zhang
Jieyun Huang
Shuming Shi
Wenjing Zhang
Jiangze Yan
Rongjia Du
Ning Wang
Kai Wang
Shiguo Lian
LRM
141
54
0
06 Mar 2025
RLTHF: Targeted Human Feedback for LLM Alignment
RLTHF: Targeted Human Feedback for LLM Alignment
Yifei Xu
Tusher Chakraborty
Emre Kıcıman
Bibek Aryal
Eduardo Rodrigues
...
Rafael Padilha
Leonardo Nunes
Shobana Balakrishnan
Songwu Lu
Ranveer Chandra
172
2
0
24 Feb 2025
DISC: DISC: Dynamic Decomposition Improves LLM Inference Scaling
DISC: DISC: Dynamic Decomposition Improves LLM Inference Scaling
Jonathan Light
Wei Cheng
Benjamin Rivière
Wu Yue
Masafumi Oyamada
Mengdi Wang
Yisong Yue
Santiago Paternain
Haifeng Chen
ReLMLRM
129
2
0
23 Feb 2025
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
Revisiting the Test-Time Scaling of o1-like Models: Do they Truly Possess Test-Time Scaling Capabilities?
Zhiyuan Zeng
Qinyuan Cheng
Zhangyue Yin
Yunhua Zhou
Xipeng Qiu
LRM
183
20
0
17 Feb 2025
Iterative Deepening Sampling as Efficient Test-Time Scaling
Iterative Deepening Sampling as Efficient Test-Time Scaling
Weizhe Chen
Sven Koenig
B. Dilkina
LRMReLM
156
1
0
08 Feb 2025
1