arXiv:2410.23703
OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models
31 October 2024
Junda Wu
Xintong Li
Ruoyu Wang
Yu Xia
Yuxin Xiong
Jianing Wang
Tong Yu
Xiang Chen
Branislav Kveton
Lina Yao
Jingbo Shang
Julian McAuley
OffRL
LRM
Papers citing
"OCEAN: Offline Chain-of-thought Evaluation and Alignment in Large Language Models"
27 / 27 papers shown
Title
Aligning Language Models Using Follow-up Likelihood as Reward Signal
Chen Zhang
Dading Chong
Feng Jiang
Chengguang Tang
Anningzhe Gao
Guohua Tang
Haizhou Li
ALM
60
2
0
20 Sep 2024
Off-Policy Evaluation from Logged Human Feedback
Aniruddha Bhargava
Lalit P. Jain
Branislav Kveton
Ge Liu
Subhojyoti Mukherjee
OffRL
49
2
0
14 Jun 2024
Beyond Chain-of-Thought: A Survey of Chain-of-X Paradigms for LLMs
Yu Xia
Rui Wang
Xu Liu
Mingyan Li
Tong Yu
Xiang Chen
Julian McAuley
Shuai Li
LRM
83
21
0
24 Apr 2024
GraphGPT: Graph Instruction Tuning for Large Language Models
Jiabin Tang
Yuhao Yang
Wei Wei
Lei Shi
Lixin Su
Suqi Cheng
Dawei Yin
Chao Huang
91
138
0
19 Oct 2023
Off-Policy Evaluation for Human Feedback
Qitong Gao
Ge Gao
Juncheng Dong
Vahid Tarokh
Min Chi
Miroslav Pajic
OffRL
74
5
0
11 Oct 2023
Knowledge Graph Prompting for Multi-Document Question Answering
Yu Wang
Nedim Lipka
Ryan Rossi
Alexa F. Siu
Ruiyi Zhang
Hanyu Wang
RALM
58
130
0
22 Aug 2023
An Empirical Study of Catastrophic Forgetting in Large Language Models During Continual Fine-tuning
Yun Luo
Zhen Yang
Fandong Meng
Yafu Li
Jie Zhou
Yue Zhang
CLL
KELM
124
302
0
17 Aug 2023
Measuring Faithfulness in Chain-of-Thought Reasoning
Tamera Lanham
Anna Chen
Ansh Radhakrishnan
Benoit Steiner
Carson E. Denison
...
Zac Hatfield-Dodds
Jared Kaplan
J. Brauner
Sam Bowman
Ethan Perez
ReLM
LRM
61
184
0
17 Jul 2023
Deductive Verification of Chain-of-Thought Reasoning
Z. Ling
Yunhao Fang
Xuanlin Li
Zhiao Huang
Mingu Lee
Roland Memisevic
Hao Su
ReLM
LRM
58
133
0
06 Jun 2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
133
325
0
02 Jun 2023
Principle-Driven Self-Alignment of Language Models from Scratch with Minimal Human Supervision
Zhiqing Sun
Songlin Yang
Qinhong Zhou
Hongxin Zhang
Zhenfang Chen
David D. Cox
Yiming Yang
Chuang Gan
SyDa
ALM
74
330
0
04 May 2023
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Zheng Yuan
Hongyi Yuan
Chuanqi Tan
Wei Wang
Songfang Huang
Feiran Huang
ALM
140
369
0
11 Apr 2023
GPT-4 Technical Report
OpenAI
Josh Achiam
Steven Adler
Sandhini Agarwal
Lama Ahmad
...
Shengjia Zhao
Tianhao Zheng
Juntang Zhuang
William Zhuk
Barret Zoph
LLMAG
MLLM
1.0K
14,179
0
15 Mar 2023
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
168
1,603
0
15 Dec 2022
Knowledge Prompting in Pre-trained Language Model for Natural Language Understanding
Jiadong Wang
Wenkang Huang
Qiuhui Shi
Hongbin Wang
Minghui Qiu
Xiang Li
Ming Gao
KELM
VLM
65
17
0
16 Oct 2022
Chain-of-Thought Prompting Elicits Reasoning in Large Language Models
Jason W. Wei
Xuezhi Wang
Dale Schuurmans
Maarten Bosma
Brian Ichter
F. Xia
Ed H. Chi
Quoc Le
Denny Zhou
LM&Ro
LRM
AI4CE
ReLM
738
9,267
0
28 Jan 2022
LoRA: Low-Rank Adaptation of Large Language Models
J. E. Hu
Yelong Shen
Phillip Wallis
Zeyuan Allen-Zhu
Yuanzhi Li
Shean Wang
Lu Wang
Weizhu Chen
OffRL
AI4TS
AI4CE
ALM
AIMat
371
10,226
0
17 Jun 2021
Did Aristotle Use a Laptop? A Question Answering Benchmark with Implicit Reasoning Strategies
Mor Geva
Daniel Khashabi
Elad Segal
Tushar Khot
Dan Roth
Jonathan Berant
RALM
325
715
0
06 Jan 2021
PubMedQA: A Dataset for Biomedical Research Question Answering
Qiao Jin
Bhuwan Dhingra
Zhengping Liu
William W. Cohen
Xinghua Lu
353
883
0
13 Sep 2019
Reinforcement Learning in Healthcare: A Survey
Chao Yu
Jiming Liu
S. Nemati
LM&MA
OffRL
152
566
0
22 Aug 2019
BoolQ: Exploring the Surprising Difficulty of Natural Yes/No Questions
Christopher Clark
Kenton Lee
Ming-Wei Chang
Tom Kwiatkowski
Michael Collins
Kristina Toutanova
205
1,511
0
24 May 2019
HotpotQA: A Dataset for Diverse, Explainable Multi-hop Question Answering
Zhilin Yang
Peng Qi
Saizheng Zhang
Yoshua Bengio
William W. Cohen
Ruslan Salakhutdinov
Christopher D. Manning
RALM
147
2,635
0
25 Sep 2018
The Web as a Knowledge-base for Answering Complex Questions
Alon Talmor
Jonathan Berant
66
578
0
18 Mar 2018
Complex Sequential Question Answering: Towards Learning to Converse Over Linked Question Answer Pairs with a Knowledge Graph
Amrita Saha
Vardaan Pahuja
Mitesh M. Khapra
Karthik Sankaranarayanan
A. Chandar
64
200
0
31 Jan 2018
Offline A/B testing for Recommender Systems
Alexandre Gilotte
Clément Calauzènes
Thomas Nedelec
A. Abraham
Simon Dollé
OffRL
65
221
0
22 Jan 2018
Unbiased Learning-to-Rank with Biased Feedback
Thorsten Joachims
Adith Swaminathan
Tobias Schnabel
CML
73
540
0
16 Aug 2016
Unbiased Offline Evaluation of Contextual-bandit-based News Article Recommendation Algorithms
Lihong Li
Wei Chu
John Langford
Xuanhui Wang
OffRL
189
575
0
31 Mar 2010