
GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment
Papers citing "GenARM: Reward Guided Generation with Autoregressive Reward Model for Test-time Alignment"
Title |
---|
Transfer Q Star: Principled Decoding for LLM Alignment Souradip Chakraborty Soumya Suvra Ghosal Ming Yin Dinesh Manocha Mengdi Wang Amrit Singh Bedi Furong Huang |
Controllable Preference Optimization: Toward Controllable Multi-Objective Alignment Yiju Guo Ganqu Cui Lifan Yuan Ning Ding Jiexin Wang ...Ruobing Xie Jie Zhou Yankai Lin Zhiyuan Liu Maosong Sun |