ResearchTrend.AI
Distilled Self-Critique of LLMs with Synthetic Data: a Bayesian Perspective
Víctor Gallego
arXiv:2312.01957, 4 December 2023

Papers citing "Distilled Self-Critique of LLMs with Synthetic Data: a Bayesian Perspective"

6 of 6 papers shown.

Efficient Reinforcement Learning with Large Language Model Priors
Xue Yan, Yan Song, Xidong Feng, Girish A. Koushik, Haifeng Zhang, Haitham Bou Ammar, Jun Wang
Tags: OffRL
10 Oct 2024

Merging Improves Self-Critique Against Jailbreak Attacks
Víctor Gallego
Tags: AAML, MoMe
11 Jun 2024

Configurable Safety Tuning of Language Models with Synthetic Preference Data
Víctor Gallego
30 Mar 2024

Refined Direct Preference Optimization with Synthetic Data for Behavioral Alignment of LLMs
Víctor Gallego
Tags: SyDa
12 Feb 2024

ZYN: Zero-Shot Reward Models with Yes-No Questions for RLAIF
Víctor Gallego
Tags: SyDa
11 Aug 2023

Training language models to follow instructions with human feedback
Long Ouyang, Jeff Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, ..., Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, Ryan J. Lowe
Tags: OSLM, ALM
04 Mar 2022