Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2404.02078
Cited By
Advancing LLM Reasoning Generalists with Preference Trees
2 April 2024
Lifan Yuan
Ganqu Cui
Hanbin Wang
Ning Ding
Xingyao Wang
Jia Deng
Boji Shan
Huimin Chen
Ruobing Xie
Yankai Lin
Zhenghao Liu
Bowen Zhou
Hao Peng
Zhiyuan Liu
Maosong Sun
LRM
Re-assign community
ArXiv
PDF
HTML
Papers citing
"Advancing LLM Reasoning Generalists with Preference Trees"
33 / 83 papers shown
Title
NVLM: Open Frontier-Class Multimodal LLMs
Wenliang Dai
Nayeon Lee
Wei Ping
Zhuoling Yang
Zihan Liu
Jon Barker
Tuomas Rintamaki
M. Shoeybi
Bryan Catanzaro
Ming-Yu Liu
MLLM
VLM
LRM
45
54
0
17 Sep 2024
Semi-Supervised Reward Modeling via Iterative Self-Training
Yifei He
Haoxiang Wang
Ziyan Jiang
Alexandros Papangelis
Han Zhao
OffRL
44
2
0
10 Sep 2024
Critique-out-Loud Reward Models
Zachary Ankner
Mansheej Paul
Brandon Cui
Jonathan D. Chang
Prithviraj Ammanabrolu
ALM
LRM
43
30
0
21 Aug 2024
EAGLE: Elevating Geometric Reasoning through LLM-empowered Visual Instruction Tuning
Zhihao Li
Yao Du
Yang Liu
Yan Zhang
Yufang Liu
Hao Fei
Xunliang Cai
LRM
35
7
0
21 Aug 2024
ProFuser: Progressive Fusion of Large Language Models
Tianyuan Shi
Fanqi Wan
Canbin Huang
Xiaojun Quan
Chenliang Li
Ming Yan
Ji Zhang
MoMe
33
2
0
09 Aug 2024
COAST: Enhancing the Code Debugging Ability of LLMs through Communicative Agent Based Data Synthesis
Weiqing Yang
Hanbin Wang
Zhenghao Liu
Xinze Li
Yukun Yan
Shuo Wang
Yu Gu
Minghe Yu
Zhiyuan Liu
Ge Yu
50
2
0
09 Aug 2024
Weak-to-Strong Reasoning
Yuqing Yang
Yan Ma
Pengfei Liu
LRM
37
13
0
18 Jul 2024
A Single Transformer for Scalable Vision-Language Modeling
Yangyi Chen
Xingyao Wang
Hao Peng
Heng Ji
LRM
42
15
0
08 Jul 2024
DogeRM: Equipping Reward Models with Domain Knowledge through Model Merging
Tzu-Han Lin
Chen An Li
Hung-yi Lee
Yun-Nung Chen
VLM
ALM
26
4
0
01 Jul 2024
Understanding and Mitigating Language Confusion in LLMs
Kelly Marchisio
Wei-Yin Ko
Alexandre Berard
Théo Dehaze
Sebastian Ruder
58
23
0
28 Jun 2024
RL on Incorrect Synthetic Data Scales the Efficiency of LLM Math Reasoning by Eight-Fold
Amrith Rajagopal Setlur
Saurabh Garg
Xinyang Geng
Naman Garg
Virginia Smith
Aviral Kumar
42
48
0
20 Jun 2024
Learn Beyond The Answer: Training Language Models with Reflection for Mathematical Reasoning
Zhihan Zhang
Zhenwen Liang
Wenhao Yu
Dian Yu
Mengzhao Jia
Dong Yu
Meng Jiang
AIMat
RALM
LRM
ReLM
35
13
0
17 Jun 2024
Meta Reasoning for Large Language Models
Peizhong Gao
Ao Xie
Shaoguang Mao
Wenshan Wu
Yan Xia
Haipeng Mi
Furu Wei
ReLM
LLMAG
LRM
62
7
0
17 Jun 2024
Watch Every Step! LLM Agent Learning via Iterative Step-Level Process Refinement
Weimin Xiong
Yifan Song
Xiutian Zhao
Wenhao Wu
Xun Wang
Ke Wang
Cheng Li
Wei Peng
Sujian Li
49
26
0
17 Jun 2024
Self-Evolution Fine-Tuning for Policy Optimization
Ruijun Chen
Jiehao Liang
Shiping Gao
Fanqi Wan
Xiaojun Quan
54
0
0
16 Jun 2024
3D-Properties: Identifying Challenges in DPO and Charting a Path Forward
Yuzi Yan
Yibo Miao
J. Li
Yipin Zhang
Jian Xie
Zhijie Deng
Dong Yan
57
11
0
11 Jun 2024
UltraMedical: Building Specialized Generalists in Biomedicine
Kaiyan Zhang
Sihang Zeng
Ermo Hua
Ning Ding
Zhang-Ren Chen
...
Xuekai Zhu
Xingtai Lv
Hu Jinfang
Zhiyuan Liu
Bowen Zhou
LM&MA
43
22
0
06 Jun 2024
JiuZhang3.0: Efficiently Improving Mathematical Reasoning by Training Small Data Synthesis Models
Kun Zhou
Beichen Zhang
Jiapeng Wang
Zhipeng Chen
Wayne Xin Zhao
Jing Sha
Zhichao Sheng
Shijin Wang
Ji-Rong Wen
SyDa
LRM
46
30
0
23 May 2024
Annotation-Efficient Preference Optimization for Language Model Alignment
Yuu Jinnai
Ukyo Honda
42
0
0
22 May 2024
RLHF Workflow: From Reward Modeling to Online RLHF
Hanze Dong
Wei Xiong
Bo Pang
Haoxiang Wang
Han Zhao
Yingbo Zhou
Nan Jiang
Doyen Sahoo
Caiming Xiong
Tong Zhang
OffRL
29
98
0
13 May 2024
MAmmoTH2: Scaling Instructions from the Web
Xiang Yue
Tuney Zheng
Ge Zhang
Wenhu Chen
ALM
LRM
57
87
0
06 May 2024
D2PO: Discriminator-Guided DPO with Response Evaluation Models
Prasann Singhal
Nathan Lambert
S. Niekum
Tanya Goyal
Greg Durrett
OffRL
EGVM
43
4
0
02 May 2024
Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing
Ye Tian
Baolin Peng
Linfeng Song
Lifeng Jin
Dian Yu
Haitao Mi
Dong Yu
LRM
ReLM
55
66
0
18 Apr 2024
Regularized Best-of-N Sampling with Minimum Bayes Risk Objective for Language Model Alignment
Yuu Jinnai
Tetsuro Morimura
Kaito Ariu
Kenshi Abe
69
7
0
01 Apr 2024
RewardBench: Evaluating Reward Models for Language Modeling
Nathan Lambert
Valentina Pyatkin
Jacob Morrison
Lester James V. Miranda
Bill Yuchen Lin
...
Sachin Kumar
Tom Zick
Yejin Choi
Noah A. Smith
Hanna Hajishirzi
ALM
85
218
0
20 Mar 2024
CodeUltraFeedback: An LLM-as-a-Judge Dataset for Aligning Large Language Models to Coding Preferences
Martin Weyssow
Aton Kamanda
H. Sahraoui
ALM
67
33
0
14 Mar 2024
Noise Contrastive Alignment of Language Models with Explicit Rewards
Huayu Chen
Guande He
Lifan Yuan
Ganqu Cui
Hang Su
Jun Zhu
60
44
0
08 Feb 2024
KTO: Model Alignment as Prospect Theoretic Optimization
Kawin Ethayarajh
Winnie Xu
Niklas Muennighoff
Dan Jurafsky
Douwe Kiela
182
456
0
02 Feb 2024
DeepSeek LLM: Scaling Open-Source Language Models with Longtermism
DeepSeek-AI Xiao Bi
:
Xiao Bi
Deli Chen
Guanting Chen
...
Yao Zhao
Shangyan Zhou
Shunfeng Zhou
Qihao Zhu
Yuheng Zou
LRM
ALM
139
309
0
05 Jan 2024
MINT: Evaluating LLMs in Multi-turn Interaction with Tools and Language Feedback
Xingyao Wang
Zihan Wang
Jiateng Liu
Yangyi Chen
Lifan Yuan
Hao Peng
Heng Ji
LRM
133
142
0
19 Sep 2023
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
398
8,559
0
28 Jan 2022
Measuring Coding Challenge Competence With APPS
Dan Hendrycks
Steven Basart
Saurav Kadavath
Mantas Mazeika
Akul Arora
...
Collin Burns
Samir Puranik
Horace He
D. Song
Jacob Steinhardt
ELM
AIMat
ALM
208
627
0
20 May 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
256
677
0
06 Jan 2021
Previous
1
2