ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2502.19312
  4. Cited By
FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users

FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users

26 February 2025
Anikait Singh
Sheryl Hsu
Kyle Hsu
E. Mitchell
Stefano Ermon
Tatsunori Hashimoto
Archit Sharma
Chelsea Finn
    SyDa
    OffRL
ArXivPDFHTML

Papers citing "FSPO: Few-Shot Preference Optimization of Synthetic Preference Data in LLMs Elicits Effective Personalization to Real Users"

17 / 17 papers shown
Title
Amulet: Putting Complex Multi-Turn Conversations on the Stand with LLM Juries
Amulet: Putting Complex Multi-Turn Conversations on the Stand with LLM Juries
Sahana Ramnath
Anurag Mudgil
Brihi Joshi
Skyler Hallinan
Xiang Ren
25
0
0
26 May 2025
LoRe: Personalizing LLMs via Low-Rank Reward Modeling
LoRe: Personalizing LLMs via Low-Rank Reward Modeling
Avinandan Bose
Zhihan Xiong
Yuejie Chi
Simon S. Du
Lin Xiao
Maryam Fazel
53
0
0
20 Apr 2025
Benchmarking Distributional Alignment of Large Language Models
Benchmarking Distributional Alignment of Large Language Models
Nicole Meister
Carlos Guestrin
Tatsunori Hashimoto
ALM
39
4
0
08 Nov 2024
Context Parallelism for Scalable Million-Token Inference
Context Parallelism for Scalable Million-Token Inference
Amy Yang
Jingyi Yang
Aya Ibrahim
Xinfeng Xie
Bangsheng Tang
Grigory Sizov
Jeremy Reizenstein
Jongsoo Park
Jianyu Huang
MoE
LRM
85
6
0
04 Nov 2024
Distributional Preference Alignment of LLMs via Optimal Transport
Distributional Preference Alignment of LLMs via Optimal Transport
Igor Melnyk
Youssef Mroueh
Brian M. Belgodere
Mattia Rigotti
Apoorva Nitsure
Mikhail Yurochkin
Kristjan Greenewald
Jirí Navrátil
Jerret Ross
69
12
0
09 Jun 2024
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Length-Controlled AlpacaEval: A Simple Way to Debias Automatic Evaluators
Yann Dubois
Balázs Galambosi
Percy Liang
Tatsunori Hashimoto
ALM
69
359
0
06 Apr 2024
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for
  Language Models
Synthetic Data (Almost) from Scratch: Generalized Instruction Tuning for Language Models
Haoran Li
Qingxiu Dong
Zhengyang Tang
Chaojun Wang
Xingxing Zhang
...
Wei Lu
Zhifang Sui
Benyou Wang
Wai Lam
Furu Wei
SyDa
65
58
0
20 Feb 2024
Self-Rewarding Language Models
Self-Rewarding Language Models
Weizhe Yuan
Richard Yuanzhe Pang
Kyunghyun Cho
Xian Li
Sainbayar Sukhbaatar
Jing Xu
Jason Weston
ReLM
SyDa
ALM
LRM
282
312
0
18 Jan 2024
Distributional Preference Learning: Understanding and Accounting for
  Hidden Context in RLHF
Distributional Preference Learning: Understanding and Accounting for Hidden Context in RLHF
Anand Siththaranjan
Cassidy Laidlaw
Dylan Hadfield-Menell
72
62
0
13 Dec 2023
A General Theoretical Paradigm to Understand Learning from Human
  Preferences
A General Theoretical Paradigm to Understand Learning from Human Preferences
M. G. Azar
Mark Rowland
Bilal Piot
Daniel Guo
Daniele Calandriello
Michal Valko
Rémi Munos
102
580
0
18 Oct 2023
Reinforced Self-Training (ReST) for Language Modeling
Reinforced Self-Training (ReST) for Language Modeling
Çağlar Gülçehre
T. Paine
S. Srinivasan
Ksenia Konyushkova
L. Weerts
...
Chenjie Gu
Wolfgang Macherey
Arnaud Doucet
Orhan Firat
Nando de Freitas
OffRL
87
291
0
17 Aug 2023
AlpacaFarm: A Simulation Framework for Methods that Learn from Human
  Feedback
AlpacaFarm: A Simulation Framework for Methods that Learn from Human Feedback
Yann Dubois
Xuechen Li
Rohan Taori
Tianyi Zhang
Ishaan Gulrajani
Jimmy Ba
Carlos Guestrin
Percy Liang
Tatsunori B. Hashimoto
ALM
88
569
0
22 May 2023
Language Models are Few-Shot Learners
Language Models are Few-Shot Learners
Tom B. Brown
Benjamin Mann
Nick Ryder
Melanie Subbiah
Jared Kaplan
...
Christopher Berner
Sam McCandlish
Alec Radford
Ilya Sutskever
Dario Amodei
BDL
467
41,106
0
28 May 2020
Meta-Learning without Memorization
Meta-Learning without Memorization
Mingzhang Yin
George Tucker
Mingyuan Zhou
Sergey Levine
Chelsea Finn
VLM
38
186
0
09 Dec 2019
Domain Randomization and Generative Models for Robotic Grasping
Domain Randomization and Generative Models for Robotic Grasping
Joshua Tobin
Lukas Biewald
Rocky Duan
Marcin Andrychowicz
Ankur Handa
...
Bob McGrew
Jonas Schneider
Peter Welinder
Wojciech Zaremba
Pieter Abbeel
OOD
64
175
0
17 Oct 2017
Proximal Policy Optimization Algorithms
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
206
18,685
0
20 Jul 2017
Learning to reinforcement learn
Learning to reinforcement learn
Jane X. Wang
Z. Kurth-Nelson
Dhruva Tirumala
Hubert Soyer
Joel Z Leibo
Rémi Munos
Charles Blundell
D. Kumaran
M. Botvinick
OffRL
67
974
0
17 Nov 2016
1