2203.10079
Simulating Bandit Learning from User Feedback for Extractive Question Answering
18 March 2022
Ge Gao
Eunsol Choi
Yoav Artzi
Papers citing
"Simulating Bandit Learning from User Feedback for Extractive Question Answering"
7 / 7 papers shown
Title
Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
Nicolas Le Roux
Marc G. Bellemare
Jonathan Lebensold
Arnaud Bergeron
Joshua Greaves
Alex Fréchette
Carolyne Pelletier
Eric Thibodeau-Laufer
Sándor Toth
Sam Work
OffRL
91
4
0
18 Mar 2025
Constructive Large Language Models Alignment with Diverse Feedback
Tianshu Yu
Ting-En Lin
Yuchuan Wu
Min Yang
Fei Huang
Yongbin Li
ALM
40
9
0
10 Oct 2023
Let Me Teach You: Pedagogical Foundations of Feedback for Language Models
Beatriz Borges
Niket Tandon
Tanja Käser
Antoine Bosselut
31
4
0
01 Jul 2023
Continually Improving Extractive QA via Human Feedback
Ge Gao
Hung-Ting Chen
Yoav Artzi
Eunsol Choi
28
12
0
21 May 2023
Robust Question Answering against Distribution Shifts with Test-Time Adaptation: An Empirical Study
Hai Ye
Yuyang Ding
Juntao Li
Hwee Tou Ng
OOD
TTA
29
9
0
09 Feb 2023
Continual Learning for Instruction Following from Realtime Feedback
Alane Suhr
Yoav Artzi
31
17
0
19 Dec 2022
Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback
Carolin (Haas) Lawrence
Stefan Riezler
OffRL
173
57
0
03 May 2018