ResearchTrend.AI

Simulating Bandit Learning from User Feedback for Extractive Question Answering

18 March 2022
Ge Gao
Eunsol Choi
Yoav Artzi

Papers citing "Simulating Bandit Learning from User Feedback for Extractive Question Answering"

7 / 7 papers shown

Tapered Off-Policy REINFORCE: Stable and efficient reinforcement learning for LLMs
  Nicolas Le Roux, Marc G. Bellemare, Jonathan Lebensold, Arnaud Bergeron, Joshua Greaves, Alex Fréchette, Carolyne Pelletier, Eric Thibodeau-Laufer, Sándor Toth, Sam Work
  OffRL | 18 Mar 2025

Constructive Large Language Models Alignment with Diverse Feedback
  Tianshu Yu, Ting-En Lin, Yuchuan Wu, Min Yang, Fei Huang, Yongbin Li
  ALM | 10 Oct 2023

Let Me Teach You: Pedagogical Foundations of Feedback for Language Models
  Beatriz Borges, Niket Tandon, Tanja Kaser, Antoine Bosselut
  01 Jul 2023

Continually Improving Extractive QA via Human Feedback
  Ge Gao, Hung-Ting Chen, Yoav Artzi, Eunsol Choi
  21 May 2023

Robust Question Answering against Distribution Shifts with Test-Time Adaptation: An Empirical Study
  Hai Ye, Yuyang Ding, Juntao Li, Hwee Tou Ng
  OOD, TTA | 09 Feb 2023

Continual Learning for Instruction Following from Realtime Feedback
  Alane Suhr, Yoav Artzi
  19 Dec 2022

Improving a Neural Semantic Parser by Counterfactual Learning from Human Bandit Feedback
  Carolin (Haas) Lawrence, Stefan Riezler
  OffRL | 03 May 2018