ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2309.08589
  4. Cited By
Chain-of-Thought Reasoning is a Policy Improvement Operator

Chain-of-Thought Reasoning is a Policy Improvement Operator

15 September 2023
Hugh Zhang
David C. Parkes
    ReLM
    LM&Ro
    LRM
ArXivPDFHTML

Papers citing "Chain-of-Thought Reasoning is a Policy Improvement Operator"

13 / 13 papers shown
Title
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
Ignore the KL Penalty! Boosting Exploration on Critical Tokens to Enhance RL Fine-Tuning
Jean Vassoyan
Nathanaël Beau
Roman Plaud
OffRL
106
1
0
10 Feb 2025
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Self-Improving Transformers Overcome Easy-to-Hard and Length Generalization Challenges
Nayoung Lee
Ziyang Cai
Avi Schwarzschild
Kangwook Lee
Dimitris Papailiopoulos
ReLM
VLM
LRM
AI4CE
83
4
0
03 Feb 2025
Argumentative Large Language Models for Explainable and Contestable Claim Verification
Argumentative Large Language Models for Explainable and Contestable Claim Verification
Gabriel Freedman
Adam Dejl
Deniz Gorur
Xiang Yin
Antonio Rago
Francesca Toni
32
7
0
03 May 2024
Quiet-STaR: Language Models Can Teach Themselves to Think Before
  Speaking
Quiet-STaR: Language Models Can Teach Themselves to Think Before Speaking
E. Zelikman
Georges Harik
Yijia Shao
Varuna Jayasiri
Nick Haber
Noah D. Goodman
LLMAG
ReLM
LRM
55
113
0
14 Mar 2024
Unleashing the potential of prompt engineering in Large Language Models:
  a comprehensive review
Unleashing the potential of prompt engineering in Large Language Models: a comprehensive review
Banghao Chen
Zhaofeng Zhang
Nicolas Langrené
Shengxin Zhu
LLMAG
31
89
0
23 Oct 2023
Navigate through Enigmatic Labyrinth A Survey of Chain of Thought
  Reasoning: Advances, Frontiers and Future
Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Zheng Chu
Jingchang Chen
Qianglong Chen
Weijiang Yu
Tao He
Haotian Wang
Weihua Peng
Ming-Yu Liu
Bing Qin
Ting Liu
LRM
AI4CE
37
153
0
27 Sep 2023
How Language Model Hallucinations Can Snowball
How Language Model Hallucinations Can Snowball
Muru Zhang
Ofir Press
William Merrill
Alisa Liu
Noah A. Smith
HILM
LRM
88
257
0
22 May 2023
Will we run out of data? Limits of LLM scaling based on human-generated
  data
Will we run out of data? Limits of LLM scaling based on human-generated data
Pablo Villalobos
A. Ho
J. Sevilla
T. Besiroglu
Lennart Heim
Marius Hobbhahn
ALM
38
111
0
26 Oct 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
314
3,273
0
21 Mar 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
339
12,003
0
04 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
398
8,559
0
28 Jan 2022
Scaling Laws for Neural Language Models
Scaling Laws for Neural Language Models
Jared Kaplan
Sam McCandlish
T. Henighan
Tom B. Brown
B. Chess
R. Child
Scott Gray
Alec Radford
Jeff Wu
Dario Amodei
264
4,489
0
23 Jan 2020
Fine-Tuning Language Models from Human Preferences
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
298
1,610
0
18 Sep 2019
1