ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2410.08458
  4. Cited By
Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both
v1v2 (latest)

Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both

11 October 2024
Abhijnan Nath
Changsoo Jung
Ethan Seefried
Nikhil Krishnaswamy
ArXiv (abs)PDFHTML

Papers citing "Simultaneous Reward Distillation and Preference Learning: Get You a Language Model Who Can Do Both"

11 / 61 papers shown
Title
Advantage-Weighted Regression: Simple and Scalable Off-Policy
  Reinforcement Learning
Advantage-Weighted Regression: Simple and Scalable Off-Policy Reinforcement Learning
Xue Bin Peng
Aviral Kumar
Grace Zhang
Sergey Levine
OffRL
157
570
0
01 Oct 2019
Fine-Tuning Language Models from Human Preferences
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
491
1,770
0
18 Sep 2019
Neural Text Generation with Unlikelihood Training
Neural Text Generation with Unlikelihood Training
Sean Welleck
Ilia Kulikov
Stephen Roller
Emily Dinan
Kyunghyun Cho
Jason Weston
MU
68
583
0
12 Aug 2019
The Curious Case of Neural Text Degeneration
The Curious Case of Neural Text Degeneration
Ari Holtzman
Jan Buys
Li Du
Maxwell Forbes
Yejin Choi
207
3,213
0
22 Apr 2019
Parameter-Efficient Transfer Learning for NLP
Parameter-Efficient Transfer Learning for NLP
N. Houlsby
A. Giurgiu
Stanislaw Jastrzebski
Bruna Morrone
Quentin de Laroussilhe
Andrea Gesmundo
Mona Attariyan
Sylvain Gelly
223
4,529
0
02 Feb 2019
Decoupled Weight Decay Regularization
Decoupled Weight Decay Regularization
I. Loshchilov
Frank Hutter
OffRL
154
2,158
0
14 Nov 2017
Proximal Policy Optimization Algorithms
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
571
19,315
0
20 Jul 2017
Deep reinforcement learning from human preferences
Deep reinforcement learning from human preferences
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
218
3,377
0
12 Jun 2017
Concrete Problems in AI Safety
Concrete Problems in AI Safety
Dario Amodei
C. Olah
Jacob Steinhardt
Paul Christiano
John Schulman
Dandelion Mané
256
2,405
0
21 Jun 2016
Abstractive Text Summarization Using Sequence-to-Sequence RNNs and
  Beyond
Abstractive Text Summarization Using Sequence-to-Sequence RNNs and Beyond
Ramesh Nallapati
Bowen Zhou
Cicero Nogueira dos Santos
Çağlar Gülçehre
Bing Xiang
AIMat
294
2,568
0
19 Feb 2016
Distilling the Knowledge in a Neural Network
Distilling the Knowledge in a Neural Network
Geoffrey E. Hinton
Oriol Vinyals
J. Dean
FedML
367
19,745
0
09 Mar 2015
Previous
12