Fine-Tuning Language Models with Advantage-Induced Policy Alignment

4 June 2023
Banghua Zhu
Hiteshi Sharma
Felipe Vieira Frujeri
Shi Dong
Chenguang Zhu
Michael I. Jordan
Jiantao Jiao
    OSLM

Papers citing "Fine-Tuning Language Models with Advantage-Induced Policy Alignment"

37 / 37 papers shown
SegAgent: Exploring Pixel Understanding Capabilities in MLLMs by Imitating Human Annotator Trajectories
Muzhi Zhu
Yuzhuo Tian
Hao Chen
Chunluan Zhou
Qingpei Guo
Yongxu Liu
M. Yang
Chunhua Shen
MLLM
VLM
81
0
0
11 Mar 2025
BPO: Towards Balanced Preference Optimization between Knowledge Breadth and Depth in Alignment
Sizhe Wang
Yongqi Tong
Hengyuan Zhang
Dawei Li
Xin Zhang
Tianlong Chen
87
5
0
21 Feb 2025
One fish, two fish, but not the whole sea: Alignment reduces language models' conceptual diversity
Sonia K. Murthy
Tomer Ullman
Jennifer Hu
ALM
48
11
0
07 Nov 2024
Guided Stream of Search: Learning to Better Search with Language Models via Optimal Path Guidance
Seungyong Moon
Bumsoo Park
Hyun Oh Song
RALM
AIFin
29
1
0
03 Oct 2024
Preference Alignment Improves Language Model-Based TTS
Jinchuan Tian
Chunlei Zhang
Jiatong Shi
Hao Zhang
Jianwei Yu
Shinji Watanabe
Dong Yu
32
7
0
19 Sep 2024
Policy Filtration in RLHF to Fine-Tune LLM for Code Generation
Wei Shen
Chuheng Zhang
OffRL
41
6
0
11 Sep 2024
Mission Impossible: A Statistical Perspective on Jailbreaking LLMs
Jingtong Su
Mingyu Lee
SangKeun Lee
46
8
0
02 Aug 2024
AI Safety in Generative AI Large Language Models: A Survey
Jaymari Chua
Yun Yvonna Li
Shiyi Yang
Chen Wang
Lina Yao
LM&MA
47
12
0
06 Jul 2024
Self-Evolution Fine-Tuning for Policy Optimization
Ruijun Chen
Jiehao Liang
Shiping Gao
Fanqi Wan
Xiaojun Quan
54
0
0
16 Jun 2024
Latent Logic Tree Extraction for Event Sequence Explanation from LLMs
Zitao Song
Chao Yang
Chaojie Wang
Bo An
Shuang Li
63
4
0
03 Jun 2024
Online Self-Preferring Language Models
Yuanzhao Zhai
Zhuo Zhang
Kele Xu
Hanyang Peng
Yue Yu
Dawei Feng
Cheng Yang
Bo Ding
Huaimin Wang
56
0
0
23 May 2024
REBEL: Reinforcement Learning via Regressing Relative Rewards
Zhaolin Gao
Jonathan D. Chang
Wenhao Zhan
Owen Oertell
Gokul Swamy
Kianté Brantley
Thorsten Joachims
J. Andrew Bagnell
Jason D. Lee
Wen Sun
OffRL
46
31
0
25 Apr 2024
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
Shreyas Chaudhari
Pranjal Aggarwal
Vishvak Murahari
Tanmay Rajpurohit
Ashwin Kalyan
Karthik Narasimhan
Ameet Deshpande
Bruno Castro da Silva
29
34
0
12 Apr 2024
Dataset Reset Policy Optimization for RLHF
Jonathan D. Chang
Wenhao Zhan
Owen Oertell
Kianté Brantley
Dipendra Kumar Misra
Jason D. Lee
Wen Sun
OffRL
30
21
0
12 Apr 2024
SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger
Fabio Pernisi
Bertie Vidgen
Dirk Hovy
ELM
KELM
66
32
0
08 Apr 2024
Stream of Search (SoS): Learning to Search in Language
Kanishk Gandhi
Denise Lee
Gabriel Grand
Muxin Liu
Winson Cheng
Archit Sharma
Noah D. Goodman
RALM
AIFin
LRM
52
47
0
01 Apr 2024
Scaling Data Diversity for Fine-Tuning Language Models in Human Alignment
Feifan Song
Bowen Yu
Hao Lang
Haiyang Yu
Fei Huang
Houfeng Wang
Yongbin Li
ALM
43
11
0
17 Mar 2024
Do LLMs Implicitly Determine the Suitable Text Difficulty for Users?
Seiji Gobara
Hidetaka Kamigaito
Taro Watanabe
40
4
0
22 Feb 2024
Generative AI Security: Challenges and Countermeasures
Banghua Zhu
Norman Mu
Jiantao Jiao
David Wagner
AAML
SILM
66
8
0
20 Feb 2024
Robust Prompt Optimization for Defending Language Models Against Jailbreaking Attacks
Andy Zhou
Bo Li
Haohan Wang
AAML
49
74
0
30 Jan 2024
Iterative Data Smoothing: Mitigating Reward Overfitting and Overoptimization in RLHF
Banghua Zhu
Michael I. Jordan
Jiantao Jiao
31
25
0
29 Jan 2024
I am a Strange Dataset: Metalinguistic Tests for Language Models
Tristan Thrush
Jared Moore
Miguel Monares
Christopher Potts
Douwe Kiela
27
5
0
10 Jan 2024
Uncertainty-Penalized Reinforcement Learning from Human Feedback with Diverse Reward LoRA Ensembles
Yuanzhao Zhai
Han Zhang
Yu Lei
Yue Yu
Kele Xu
Dawei Feng
Bo Ding
Huaimin Wang
AI4CE
81
32
0
30 Dec 2023
Iterative Preference Learning from Human Feedback: Bridging Theory and Practice for RLHF under KL-Constraint
Wei Xiong
Hanze Dong
Chen Ye
Ziqi Wang
Han Zhong
Heng Ji
Nan Jiang
Tong Zhang
OffRL
38
164
0
18 Dec 2023
ULMA: Unified Language Model Alignment with Human Demonstration and Point-wise Preference
Tianchi Cai
Xierui Song
Jiyan Jiang
Fei Teng
Jinjie Gu
Guannan Zhang
ALM
21
4
0
05 Dec 2023
PromptMix: A Class Boundary Augmentation Method for Large Language Model Distillation
Gaurav Sahu
Olga Vechtomova
Dzmitry Bahdanau
I. Laradji
VLM
55
24
0
22 Oct 2023
ReMax: A Simple, Effective, and Efficient Reinforcement Learning Method for Aligning Large Language Models
Ziniu Li
Tian Xu
Yushun Zhang
Zhihang Lin
Yang Yu
Ruoyu Sun
Zhimin Luo
27
52
0
16 Oct 2023
Towards the Fundamental Limits of Knowledge Transfer over Finite Domains
Qingyue Zhao
Banghua Zhu
41
4
0
11 Oct 2023
Generating and Evaluating Tests for K-12 Students with Language Model Simulations: A Case Study on Sentence Reading Efficiency
E. Zelikman
Wanjing Anya Ma
Jasmine E. Tran
Diyi Yang
Jason D. Yeatman
Nick Haber
AI4Ed
32
9
0
10 Oct 2023
Pairwise Proximal Policy Optimization: Harnessing Relative Feedback for LLM Alignment
Tianhao Wu
Banghua Zhu
Ruoyu Zhang
Zhaojin Wen
Kannan Ramchandran
Jiantao Jiao
44
55
0
30 Sep 2023
ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF
Víctor Gallego
SyDa
51
4
0
11 Aug 2023
Preference Ranking Optimization for Human Alignment
Feifan Song
Bowen Yu
Minghao Li
Haiyang Yu
Fei Huang
Yongbin Li
Houfeng Wang
ALM
28
239
0
30 Jun 2023
Leftover Lunch: Advantage-based Offline Reinforcement Learning for Language Models
Ashutosh Baheti
Ximing Lu
Faeze Brahman
Ronan Le Bras
Maarten Sap
Mark O. Riedl
38
9
0
24 May 2023
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
234
449
0
23 Aug 2022
Offline RL for Natural Language Generation with Implicit Language Q Learning
Charles Burton Snell
Ilya Kostrikov
Yi Su
Mengjiao Yang
Sergey Levine
OffRL
144
103
0
05 Jun 2022
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLM
ALM
384
12,081
0
04 Mar 2022
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
301
1,616
0
18 Sep 2019