Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2505.14999
Cited By
Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision
21 May 2025
Eric Hanchen Jiang
Haozheng Luo
Shengyuan Pang
Xiaomin Li
Zhenting Qi
Hengli Li
Cheng Yang
Zongyu Lin
Xinfeng Li
Hao Xu
Kai-Wei Chang
Ying Nian Wu
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Learning to Rank Chain-of-Thought: An Energy-Based Approach with Outcome Supervision"
16 / 16 papers shown
Title
Chain-of-Action: Faithful and Multimodal Question Answering through Large Language Models
Zhenyu Pan
Haozheng Luo
Manling Li
Han Liu
LRM
82
16
0
24 Feb 2025
QLASS: Boosting Language Agent Inference via Q-Guided Stepwise Search
Zongyu Lin
Yao Tang
Xingcheng Yao
Da Yin
Ziniu Hu
Ningyu Zhang
Kai-Wei Chang
LRM
92
5
0
04 Feb 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
251
1,503
0
22 Jan 2025
Qwen Technical Report
Jinze Bai
Shuai Bai
Yunfei Chu
Zeyu Cui
Kai Dang
...
Zhenru Zhang
Chang Zhou
Jingren Zhou
Xiaohuan Zhou
Tianhang Zhu
OSLM
163
1,756
0
28 Sep 2023
WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct
Haipeng Luo
Qingfeng Sun
Can Xu
Pu Zhao
Jian-Guang Lou
...
Xiubo Geng
Qingwei Lin
Shifeng Chen
Yansong Tang
Dongmei Zhang
LRM
OSLM
153
439
0
18 Aug 2023
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
255
4,186
0
09 Jun 2023
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
156
1,583
0
15 Dec 2022
Automatic Chain of Thought Prompting in Large Language Models
Zhuosheng Zhang
Aston Zhang
Mu Li
Alexander J. Smola
ReLM
LRM
113
610
0
07 Oct 2022
Measuring and Narrowing the Compositionality Gap in Language Models
Ofir Press
Muru Zhang
Sewon Min
Ludwig Schmidt
Noah A. Smith
M. Lewis
ReLM
KELM
LRM
113
595
0
07 Oct 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
596
9,009
0
28 Jan 2022
Energy-based Out-of-distribution Detection
Weitang Liu
Xiaoyun Wang
John Douglas Owens
Yixuan Li
OODD
215
1,332
0
08 Oct 2020
Your Classifier is Secretly an Energy Based Model and You Should Treat it Like One
Will Grathwohl
Kuan-Chieh Wang
J. Jacobsen
David Duvenaud
Mohammad Norouzi
Kevin Swersky
VLM
74
536
0
06 Dec 2019
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
429
1,664
0
18 Sep 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.1K
93,936
0
11 Oct 2018
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
245
18,685
0
20 Jul 2017
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
499
129,831
0
12 Jun 2017
1