AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation

v1, v2 (latest)

4 March 2025

Papers citing "AlignDistil: Token-Level Language Model Alignment as Adaptive Policy Distillation"

4 / 4 papers shown

A Dual-Space Framework for General Knowledge Distillation of Large Language Models
Wei Wei, Songming Zhang, Yunlong Liang, Fandong Meng, Yufeng Chen, Jinan Xu, Jie Zhou
124 0 0, 15 Apr 2025

Latent Feature Mining for Predictive Model Enhancement with Large Language Models
Bingxuan Li, Pengyi Shi, Amy Ward
130 0 0, 06 Oct 2024

DPO Meets PPO: Reinforced Token Optimization for RLHF
Han Zhong, Zikang Shan, Guhao Feng, Wei Xiong, Xinle Cheng, Li Zhao, Di He, Jiang Bian, Liwei Wang
155 72 0, 29 Apr 2024

Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois, Balázs Galambosi, Percy Liang, Tatsunori Hashimoto
ALM, 171 403 0, 06 Apr 2024