Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2011.07635
Cited By
DORB: Dynamically Optimizing Multiple Rewards with Bandits
15 November 2020
Ramakanth Pasunuru
Han Guo
Mohit Bansal
OffRL
Re-assign community
ArXiv
PDF
HTML
Papers citing
"DORB: Dynamically Optimizing Multiple Rewards with Bandits"
8 / 8 papers shown
Title
LASeR: Learning to Adaptively Select Reward Models with Multi-Armed Bandits
Duy Nguyen
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
23
3
0
02 Oct 2024
Dynamic Reward Adjustment in Multi-Reward Reinforcement Learning for Counselor Reflection Generation
Do June Min
Verónica Pérez-Rosas
Kenneth Resnicow
Rada Mihalcea
OffRL
48
2
0
20 Mar 2024
Improving Few-Shot Generalization by Exploring and Exploiting Auxiliary Data
Alon Albalak
Colin Raffel
William Yang Wang
19
12
0
01 Feb 2023
Why is constrained neural language generation particularly challenging?
Cristina Garbacea
Qiaozhu Mei
59
14
0
11 Jun 2022
Recent Advances in Neural Text Generation: A Task-Agnostic Survey
Chen Tang
Frank Guerin
Chenghua Lin
AI4CE
OOD
28
19
0
06 Mar 2022
Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation
Yonghui Wu
M. Schuster
Z. Chen
Quoc V. Le
Mohammad Norouzi
...
Alex Rudnick
Oriol Vinyals
G. Corrado
Macduff Hughes
J. Dean
AIMat
716
6,743
0
26 Sep 2016
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
214
1,327
0
05 Jun 2016
Effective Approaches to Attention-based Neural Machine Translation
Thang Luong
Hieu H. Pham
Christopher D. Manning
218
7,923
0
17 Aug 2015
1