Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning
Julia Kreutzer, Joshua Uyheng, Stefan Riezler
27 May 2018 · arXiv:1805.10627

Papers citing "Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning" (21 / 21 papers shown)

ADHMR: Aligning Diffusion-based Human Mesh Recovery via Direct Preference Optimization
Wenhao Shen, Wanqi Yin, Xiaofeng Yang, Cheng Chen, Chaoyue Song, Zhongang Cai, Lei Yang, Hao Wang, Guosheng Lin
15 May 2025

Post-edits Are Preferences Too
Nathaniel Berger, Stefan Riezler, M. Exel, Matthias Huck
24 Feb 2025

Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
Miguel Moura Ramos, Tomás Almeida, Daniel Vareta, Filipe Azevedo, Sweta Agrawal, Patrick Fernandes, André F. T. Martins
08 Nov 2024

Your Weak LLM is Secretly a Strong Teacher for Alignment
Leitian Tao, Yixuan Li
13 Sep 2024

Advancing Tool-Augmented Large Language Models: Integrating Insights from Errors in Inference Trees
Sijia Chen, Yibo Wang, Yi-Feng Wu, Qing-Guo Chen, Zhao Xu, Weihua Luo, Kaifu Zhang, Lijun Zhang
LLMAG, LRM
11 Jun 2024

Improving Socratic Question Generation using Data Augmentation and Preference Optimization
Nischal Ashok Kumar, Andrew S. Lan
01 Mar 2024

RLHFPoison: Reward Poisoning Attack for Reinforcement Learning with Human Feedback in Large Language Models
Jiong Wang, Junlin Wu, Muhao Chen, Yevgeniy Vorobeychik, Chaowei Xiao
AAML
16 Nov 2023

LitSumm: Large language models for literature summarisation of non-coding RNAs
Andrew Green, C. Ribas, Nancy Ontiveros-Palacios, Sam Griffiths-Jones, Anton I. Petrov, Alex Bateman, Blake Sweeney
06 Nov 2023

Continually Improving Extractive QA via Human Feedback
Ge Gao, Hung-Ting Chen, Yoav Artzi, Eunsol Choi
21 May 2023

Consistency is Key: Disentangling Label Variation in Natural Language Processing with Intra-Annotator Agreement
Gavin Abercrombie, Verena Rieser, Dirk Hovy
25 Jan 2023

Continual Learning for Instruction Following from Realtime Feedback
Alane Suhr, Yoav Artzi
19 Dec 2022

Mapping the Design Space of Human-AI Interaction in Text Summarization
Ruijia Cheng, Alison Smith-Renner, Kecheng Zhang, Joel R. Tetreault, A. Jaimes
29 Jun 2022

Why is constrained neural language generation particularly challenging?
Cristina Garbacea, Qiaozhu Mei
11 Jun 2022

Continual Learning for Grounded Instruction Generation by Observing Human Following Behavior
Noriyuki Kojima, Alane Suhr, Yoav Artzi
10 Aug 2021

Interactive Learning from Activity Description
Khanh Nguyen, Dipendra Kumar Misra, Robert Schapire, Miroslav Dudík, Patrick Shafto
13 Feb 2021

Open Problems in Cooperative AI
Allan Dafoe, Edward Hughes, Yoram Bachrach, Tantum Collins, Kevin R. McKee, Joel Z Leibo, Kate Larson, T. Graepel
15 Dec 2020

Informed Machine Learning -- A Taxonomy and Survey of Integrating Knowledge into Learning Systems
Laura von Rueden, S. Mayer, Katharina Beckh, B. Georgiev, Sven Giesselbach, ..., Rajkumar Ramamurthy, Michal Walczak, Jochen Garcke, Christian Bauckhage, Jannis Schuecker
29 Mar 2019

Scalable agent alignment via reward modeling: a research direction
Jan Leike, David M. Krueger, Tom Everitt, Miljan Martic, Vishal Maini, Shane Legg
19 Nov 2018

Can Neural Machine Translation be Improved with User Feedback?
Julia Kreutzer, Shahram Khadivi, E. Matusov, Stefan Riezler
16 Apr 2018

Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu, M. Schuster, Z. Chen, Quoc V. Le, Mohammad Norouzi, ..., Alex Rudnick, Oriol Vinyals, G. Corrado, Macduff Hughes, J. Dean
AIMat
26 Sep 2016

Convolutional Neural Networks for Sentence Classification
Yoon Kim
AILaw, VLM
25 Aug 2014