2310.05074
DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models
8 October 2023
Chengcheng Han
Xiaowei Du
Che Zhang
Yixin Lian
Xiang Li
Ming Gao
Baoyuan Wang
LRM
Papers citing
"DialCoT Meets PPO: Decomposing and Exploring Reasoning Paths in Smaller Language Models"
11 / 11 papers shown
Concise Reasoning via Reinforcement Learning
Mehdi Fatemi
Banafsheh Rafiee
Mingjie Tang
Kartik Talamadupula
ReLM
OffRL
LRM
52
6
0
07 Apr 2025
Thinking Machines: A Survey of LLM based Reasoning Strategies
Dibyanayan Bandyopadhyay
Soham Bhattacharjee
Asif Ekbal
LRM
ELM
48
4
0
13 Mar 2025
Investigating Mysteries of CoT-Augmented Distillation
Somin Wadhwa
Silvio Amir
Byron C. Wallace
ReLM
LRM
27
8
0
20 Jun 2024
Aggregation of Reasoning: A Hierarchical Framework for Enhancing Answer Selection in Large Language Models
Zhangyue Yin
Qiushi Sun
Qipeng Guo
Zhiyuan Zeng
Xiaonan Li
...
Qinyuan Cheng
Ding Wang
Xiaofeng Mou
Xipeng Qiu
XuanJing Huang
LRM
46
4
0
21 May 2024
Learning to Check: Unleashing Potentials for Self-Correction in Large Language Models
Che Zhang
Zhenyang Xiao
Chengcheng Han
Yixin Lian
Yuejian Fang
LRM
27
0
0
20 Feb 2024
Navigate through Enigmatic Labyrinth A Survey of Chain of Thought Reasoning: Advances, Frontiers and Future
Zheng Chu
Jingchang Chen
Qianglong Chen
Weijiang Yu
Tao He
Haotian Wang
Weihua Peng
Ming-Yu Liu
Bing Qin
Ting Liu
LRM
AI4CE
31
151
0
27 Sep 2023
Compositional Semantic Parsing with Large Language Models
Andrew Drozdov
Nathanael Schärli
Ekin Akyürek
Nathan Scales
Xinying Song
Xinyun Chen
Olivier Bousquet
Denny Zhou
ReLM
LRM
200
92
0
29 Sep 2022
Is a Question Decomposition Unit All We Need?
Pruthvi H. Patel
Swaroop Mishra
Mihir Parmar
Chitta Baral
ReLM
158
51
0
25 May 2022
Large Language Models are Zero-Shot Reasoners
Takeshi Kojima
S. Gu
Machel Reid
Yutaka Matsuo
Yusuke Iwasawa
ReLM
LRM
325
4,077
0
24 May 2022
Self-Consistency Improves Chain of Thought Reasoning in Language Models
Xuezhi Wang
Jason W. Wei
Dale Schuurmans
Quoc Le
Ed H. Chi
Sharan Narang
Aakanksha Chowdhery
Denny Zhou
ReLM
BDL
LRM
AI4CE
314
3,248
0
21 Mar 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
367
8,495
0
28 Jan 2022