ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.15028
  4. Cited By
Don't throw away your value model! Generating more preferable text with
  Value-Guided Monte-Carlo Tree Search decoding

Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding

26 September 2023
Jiacheng Liu
Andrew Cohen
Ramakanth Pasunuru
Yejin Choi
Hannaneh Hajishirzi
Asli Celikyilmaz
ArXivPDFHTML

Papers citing "Don't throw away your value model! Generating more preferable text with Value-Guided Monte-Carlo Tree Search decoding"

18 / 18 papers shown
Title
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
VisuoThink: Empowering LVLM Reasoning with Multimodal Tree Search
Yikun Wang
Siyin Wang
Qinyuan Cheng
Zhaoye Fei
Liang Ding
Qipeng Guo
Dacheng Tao
Xipeng Qiu
LRM
27
0
0
12 Apr 2025
A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models
A Survey on Personalized and Pluralistic Preference Alignment in Large Language Models
Zhouhang Xie
Junda Wu
Yiran Shen
Yu Xia
Xintong Li
...
Sachin Kumar
Bodhisattwa Prasad Majumder
Jingbo Shang
Prithviraj Ammanabrolu
Julian McAuley
42
0
0
09 Apr 2025
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Sample, Don't Search: Rethinking Test-Time Alignment for Language Models
Gonçalo Faria
Noah A. Smith
34
0
0
04 Apr 2025
Entropy-Aware Branching for Improved Mathematical Reasoning
Entropy-Aware Branching for Improved Mathematical Reasoning
Xianzhi Li
Ethan Callanan
Xiaodan Zhu
Mathieu Sibue
Antony Papadimitriou
Mahmoud Mahfouz
Zhiqiang Ma
Xiaomo Liu
LRM
42
0
0
27 Mar 2025
Language Models can Self-Improve at State-Value Estimation for Better Search
Ethan Mendes
Alan Ritter
LRM
62
3
0
04 Mar 2025
Streaming Looking Ahead with Token-level Self-reward
H. Zhang
Ruixin Hong
Dong Yu
44
1
0
24 Feb 2025
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving
Teaching LLMs According to Their Aptitude: Adaptive Reasoning for Mathematical Problem Solving
Xin Xu
Yan Xu
Tianhao Chen
Yuchen Yan
Chengwu Liu
...
Yansen Wang
Yichun Yin
Yijiao Wang
Lifeng Shang
Qiang Liu
LRM
75
2
0
17 Feb 2025
Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study
Investigating Inference-time Scaling for Chain of Multi-modal Thought: A Preliminary Study
Yujie Lin
Ante Wang
Moye Chen
Jingyao Liu
Hao Liu
Jinsong Su
Xinyan Xiao
LRM
50
2
0
17 Feb 2025
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task
MathFimer: Enhancing Mathematical Reasoning by Expanding Reasoning Steps through Fill-in-the-Middle Task
Yuchen Yan
Yongliang Shen
Yang Liu
Jin Jiang
Xin Xu
Hao Fei
Jian Shao
Yueting Zhuang
ReLM
LRM
53
2
0
17 Feb 2025
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
ReasonFlux: Hierarchical LLM Reasoning via Scaling Thought Templates
L. Yang
Zhaochen Yu
Bin Cui
Mengdi Wang
ReLM
LRM
AI4CE
98
10
0
10 Feb 2025
COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models
COS(M+O)S: Curiosity and RL-Enhanced MCTS for Exploring Story Space via Language Models
Tobias Materzok
LRM
69
0
0
28 Jan 2025
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search
  and Best-of-N Sampling
TreeBoN: Enhancing Inference-Time Alignment with Speculative Tree-Search and Best-of-N Sampling
Jiahao Qiu
Yifu Lu
Yifan Zeng
Jiacheng Guo
Jiayi Geng
Huazheng Wang
Kaixuan Huang
Yue Wu
Mengdi Wang
42
22
0
18 Oct 2024
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation
Learning How Hard to Think: Input-Adaptive Allocation of LM Computation
Mehul Damani
Idan Shenfeld
Andi Peng
Andreea Bobu
Jacob Andreas
39
16
0
07 Oct 2024
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level
  Mathematical Reasoning
LLaMA-Berry: Pairwise Optimization for O1-like Olympiad-Level Mathematical Reasoning
Di Zhang
Jianbo Wu
Jingdi Lei
Tong Che
Jiatong Li
...
Shufei Zhang
Marco Pavone
Yuqiang Li
Wanli Ouyang
Dongzhan Zhou
LRM
35
43
0
03 Oct 2024
On the Transformations across Reward Model, Parameter Update, and
  In-Context Prompt
On the Transformations across Reward Model, Parameter Update, and In-Context Prompt
Deng Cai
Huayang Li
Tingchen Fu
Siheng Li
Weiwen Xu
...
Leyang Cui
Yan Wang
Lemao Liu
Taro Watanabe
Shuming Shi
KELM
30
2
0
24 Jun 2024
KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level
  Hallucination Detection
KCTS: Knowledge-Constrained Tree Search Decoding with Token-Level Hallucination Detection
Sehyun Choi
Tianqing Fang
Zhaowei Wang
Yangqiu Song
35
32
0
13 Oct 2023
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
333
11,953
0
04 Mar 2022
Fine-Tuning Language Models from Human Preferences
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
292
1,595
0
18 Sep 2019
1