arXiv: 2409.17401
Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference

25 September 2024
Qining Zhang
Lei Ying
    OffRL

Papers citing "Zeroth-Order Policy Gradient for Reinforcement Learning from Human Feedback without Reward Inference"

7 / 57 papers shown
Title
Deep reinforcement learning from human preferences
Paul Christiano
Jan Leike
Tom B. Brown
Miljan Martic
Shane Legg
Dario Amodei
134
3,296
0
12 Jun 2017
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Tim Salimans
Jonathan Ho
Xi Chen
Szymon Sidor
Ilya Sutskever
92
1,537
0
10 Mar 2017
Linear Convergence of Gradient and Proximal-Gradient Methods Under the Polyak-Łojasiewicz Condition
Hamed Karimi
J. Nutini
Mark Schmidt
266
1,218
0
16 Aug 2016
Reducing Dueling Bandits to Cardinal Bandits
Nir Ailon
Thorsten Joachims
Zohar Karnin
144
139
0
14 May 2014
Stochastic First- and Zeroth-order Methods for Nonconvex Stochastic Programming
Saeed Ghadimi
Guanghui Lan
ODL
120
1,548
0
22 Sep 2013
On the Complexity Analysis of Randomized Block-Coordinate Descent Methods
Zhaosong Lu
Lin Xiao
80
254
0
21 May 2013
Random Utility Theory for Social Choice
Hossein Azari Soufiani
David C. Parkes
Lirong Xia
98
151
0
11 Nov 2012