2311.14543
Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language
24 November 2023
Di Jin
Shikib Mehri
Devamanyu Hazarika
Aishwarya Padmakumar
Sungjin Lee
Yang Liu
Mahdi Namazifar
ALM
Papers citing
"Data-Efficient Alignment of Large Language Models with Human Feedback Through Natural Language"
11 / 11 papers shown
Title
From Exploration to Mastery: Enabling LLMs to Master Tools via Self-Driven Interactions
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jun Xu
Ji-Rong Wen
92
11
0
10 Oct 2024
Judging LLM-as-a-Judge with MT-Bench and Chatbot Arena
Lianmin Zheng
Wei-Lin Chiang
Ying Sheng
Siyuan Zhuang
Zhanghao Wu
...
Dacheng Li
Eric Xing
Haotong Zhang
Joseph E. Gonzalez
Ion Stoica
ALM
OSLM
ELM
312
4,253
0
09 Jun 2023
Small Language Models Improve Giants by Rewriting Their Outputs
Giorgos Vernikos
Arthur Bražinskas
Jakub Adamek
Jonathan Mallinson
Aliaksei Severyn
Eric Malmi
BDL
LRM
52
16
0
22 May 2023
RRHF: Rank Responses to Align Language Models with Human Feedback without tears
Zheng Yuan
Hongyi Yuan
Chuanqi Tan
Wei Wang
Songfang Huang
Feiran Huang
ALM
143
369
0
11 Apr 2023
Constitutional AI: Harmlessness from AI Feedback
Yuntao Bai
Saurav Kadavath
Sandipan Kundu
Amanda Askell
John Kernion
...
Dario Amodei
Nicholas Joseph
Sam McCandlish
Tom B. Brown
Jared Kaplan
SyDa
MoMe
168
1,603
0
15 Dec 2022
Scaling Instruction-Finetuned Language Models
Hyung Won Chung
Le Hou
Shayne Longpre
Barret Zoph
Yi Tay
...
Jacob Devlin
Adam Roberts
Denny Zhou
Quoc V. Le
Jason W. Wei
ReLM
LRM
167
3,110
0
20 Oct 2022
EditEval: An Instruction-Based Benchmark for Text Improvements
Jane Dwivedi-Yu
Timo Schick
Zhengbao Jiang
Maria Lomeli
Patrick Lewis
Gautier Izacard
Edouard Grave
Sebastian Riedel
Fabio Petroni
70
27
0
27 Sep 2022
Self-critiquing models for assisting human evaluators
William Saunders
Catherine Yeh
Jeff Wu
Steven Bills
Long Ouyang
Jonathan Ward
Jan Leike
ALM
ELM
65
300
0
12 Jun 2022
Quark: Controllable Text Generation with Reinforced Unlearning
Ximing Lu
Sean Welleck
Jack Hessel
Liwei Jiang
Lianhui Qin
Peter West
Prithviraj Ammanabrolu
Yejin Choi
MU
99
215
0
26 May 2022
Training Language Models with Language Feedback
Jérémy Scheurer
Jon Ander Campos
Jun Shern Chan
Angelica Chen
Kyunghyun Cho
Ethan Perez
ALM
80
49
0
29 Apr 2022
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
444
18,931
0
20 Jul 2017