2504.09802
Cited By
Training Small Reasoning LLMs with Cognitive Preference Alignment
14 April 2025
Wenrui Cai
Chengyu Wang
Junbing Yan
Jun Huang
Xiangzhong Fang
LRM
Papers citing
"Training Small Reasoning LLMs with Cognitive Preference Alignment"
11 / 11 papers shown
Title
EasyDistill: A Comprehensive Toolkit for Effective Knowledge Distillation of Large Language Models
Chengyu Wang
Junbing Yan
Wenrui Cai
Yuanhao Yue
Jun Huang
VLM
19
0
0
27 May 2025
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning
DeepSeek-AI
Daya Guo
Dejian Yang
Haowei Zhang
Junxiao Song
...
Shiyu Wang
S. Yu
Shunfeng Zhou
Shuting Pan
S.S. Li
ReLM
VLM
OffRL
AI4TS
LRM
318
1,641
0
22 Jan 2025
What is the Role of Small Models in the LLM Era: A Survey
Lihu Chen
Gaël Varoquaux
ALM
161
29
0
10 Sep 2024
Self-Play Preference Optimization for Language Model Alignment
Yue Wu
Zhiqing Sun
Huizhuo Yuan
Kaixuan Ji
Yiming Yang
Quanquan Gu
80
137
0
01 May 2024
Advancing LLM Reasoning Generalists with Preference Trees
Lifan Yuan
Ganqu Cui
Hanbin Wang
Ning Ding
Xingyao Wang
...
Zhenghao Liu
Bowen Zhou
Hao Peng
Zhiyuan Liu
Maosong Sun
LRM
110
117
0
02 Apr 2024
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
238
532
0
02 Feb 2024
Let's Verify Step by Step
Hunter Lightman
V. Kosaraju
Yura Burda
Harrison Edwards
Bowen Baker
Teddy Lee
Jan Leike
John Schulman
Ilya Sutskever
K. Cobbe
ALM
OffRL
LRM
141
1,140
0
31 May 2023
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
780
12,893
0
04 Mar 2022
Training Verifiers to Solve Math Word Problems
K. Cobbe
V. Kosaraju
Mohammad Bavarian
Mark Chen
Heewoo Jun
...
Jerry Tworek
Jacob Hilton
Reiichiro Nakano
Christopher Hesse
John Schulman
ReLM
OffRL
LRM
227
4,392
0
27 Oct 2021
NeuralLog: Natural Language Inference with Joint Neural and Logical Reasoning
Zeming Chen
Qiyue Gao
Lawrence S. Moss
FedML
NAI
41
42
0
29 May 2021
Unsupervised Commonsense Question Answering with Self-Talk
Vered Shwartz
Peter West
Ronan Le Bras
Chandra Bhagavatula
Yejin Choi
ReLM
SSL
AI4MH
LRM
61
262
0
11 Apr 2020