ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2505.06274
  4. Cited By
PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model

PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model

6 May 2025
Baijiong Lin
Weisen Jiang
Yuancheng Xu
Hao Chen
Ying-Cong Chen
ArXiv (abs)PDFHTML

Papers citing "PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model"

21 / 21 papers shown
Title
Multi-objective Large Language Model Alignment with Hierarchical Experts
Multi-objective Large Language Model Alignment with Hierarchical Experts
Zhuo Li
Guodong DU
Weiyang Guo
Yigeng Zhou
Xiucheng Li
...
Fangming Liu
Yequan Wang
Deheng Ye
Min Zhang
Jing Li
ALMMoE
70
0
0
27 May 2025
Gradient-Based Multi-Objective Deep Learning: Algorithms, Theories, Applications, and Beyond
Gradient-Based Multi-Objective Deep Learning: Algorithms, Theories, Applications, and Beyond
Weiyu Chen
Xiaoyuan Zhang
Baijiong Lin
Xi Lin
Han Zhao
Qingfu Zhang
James T. Kwok
150
5
0
19 Jan 2025
Pareto Set Learning for Multi-Objective Reinforcement Learning
Pareto Set Learning for Multi-Objective Reinforcement Learning
Erlong Liu
Yu-Chang Wu
Xiaobin Huang
Chengrui Gao
Ren-Jian Wang
Ke Xue
Chao Qian
OffRL
235
2
0
12 Jan 2025
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Yuancheng Xu
Udari Madhushani Sehwag
Alec Koppel
Sicheng Zhu
Bang An
Furong Huang
Sumitra Ganesh
130
14
0
10 Oct 2024
Controllable Text Generation for Large Language Models: A Survey
Controllable Text Generation for Large Language Models: A Survey
Xun Liang
Hanyu Wang
Yezhaohui Wang
Shichao Song
Jiawei Yang
...
Jie Hu
Dan Liu
Shunyu Yao
Feiyu Xiong
Zhiyu Li
60
22
0
22 Aug 2024
Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences
Pareto Low-Rank Adapters: Efficient Multi-Task Learning with Preferences
Nikolaos Dimitriadis
Pascal Frossard
François Fleuret
MoE
247
8
0
10 Jul 2024
Cascade Reward Sampling for Efficient Decoding-Time Alignment
Cascade Reward Sampling for Efficient Decoding-Time Alignment
Bolian Li
Yifan Wang
A. Grama
Ruqi Zhang
Ruqi Zhang
AI4TS
115
15
0
24 Jun 2024
Disentangling Length from Quality in Direct Preference Optimization
Disentangling Length from Quality in Direct Preference Optimization
Ryan Park
Rafael Rafailov
Stefano Ermon
Chelsea Finn
ALM
98
145
0
28 Mar 2024
MetaAligner: Towards Generalizable Multi-Objective Alignment of Language
  Models
MetaAligner: Towards Generalizable Multi-Objective Alignment of Language Models
Kailai Yang
Zhiwei Liu
Qianqian Xie
Jimin Huang
Tianlin Zhang
Sophia Ananiadou
68
18
0
25 Mar 2024
Arithmetic Control of LLMs for Diverse User Preferences: Directional
  Preference Alignment with Multi-Objective Rewards
Arithmetic Control of LLMs for Diverse User Preferences: Directional Preference Alignment with Multi-Objective Rewards
Haoxiang Wang
Yong Lin
Wei Xiong
Rui Yang
Shizhe Diao
Shuang Qiu
Han Zhao
Tong Zhang
115
87
0
28 Feb 2024
DeAL: Decoding-time Alignment for Large Language Models
DeAL: Decoding-time Alignment for Large Language Models
James Y. Huang
Sailik Sengupta
Daniele Bonadiman
Yi-An Lai
Arshit Gupta
Nikolaos Pappas
Saab Mansour
Katrin Kirchoff
Dan Roth
122
36
0
05 Feb 2024
TinyLlama: An Open-Source Small Language Model
TinyLlama: An Open-Source Small Language Model
Peiyuan Zhang
Guangtao Zeng
Tianduo Wang
Wei Lu
ALMLRM
160
407
0
04 Jan 2024
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct
  Preference Optimization
Beyond One-Preference-Fits-All Alignment: Multi-Objective Direct Preference Optimization
Zhanhui Zhou
Jie Liu
Chao Yang
Jing Shao
Yu Liu
Xiangyu Yue
Wanli Ouyang
Yu Qiao
73
61
0
05 Oct 2023
Rewarded soups: towards Pareto-optimal alignment by interpolating
  weights fine-tuned on diverse rewards
Rewarded soups: towards Pareto-optimal alignment by interpolating weights fine-tuned on diverse rewards
Alexandre Ramé
Guillaume Couairon
Mustafa Shukor
Corentin Dancette
Jean-Baptiste Gaya
Laure Soulier
Matthieu Cord
MoMe
103
157
0
07 Jun 2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model
  Training
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
153
335
0
02 Jun 2023
Direct Preference Optimization: Your Language Model is Secretly a Reward
  Model
Direct Preference Optimization: Your Language Model is Secretly a Reward Model
Rafael Rafailov
Archit Sharma
E. Mitchell
Stefano Ermon
Christopher D. Manning
Chelsea Finn
ALM
389
4,169
0
29 May 2023
Pareto Manifold Learning: Tackling multiple tasks via ensembles of
  single-task models
Pareto Manifold Learning: Tackling multiple tasks via ensembles of single-task models
Nikolaos Dimitriadis
P. Frossard
Franccois Fleuret
82
25
0
18 Oct 2022
Training language models to follow instructions with human feedback
Training language models to follow instructions with human feedback
Long Ouyang
Jeff Wu
Xu Jiang
Diogo Almeida
Carroll L. Wainwright
...
Amanda Askell
Peter Welinder
Paul Christiano
Jan Leike
Ryan J. Lowe
OSLMALM
891
13,228
0
04 Mar 2022
Reasonable Effectiveness of Random Weighting: A Litmus Test for
  Multi-Task Learning
Reasonable Effectiveness of Random Weighting: A Litmus Test for Multi-Task Learning
Baijiong Lin
Feiyang Ye
Yu Zhang
Ivor W. Tsang
109
99
0
20 Nov 2021
Plug and Play Language Models: A Simple Approach to Controlled Text
  Generation
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
Sumanth Dathathri
Andrea Madotto
Janice Lan
Jane Hung
Eric Frank
Piero Molino
J. Yosinski
Rosanne Liu
KELM
151
979
0
04 Dec 2019
A Survey on Multi-Task Learning
A Survey on Multi-Task Learning
Yu Zhang
Qiang Yang
AIMat
607
2,247
0
25 Jul 2017
1