ResearchTrend.AI
End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient

7 December 2017
Li Zhou
Kevin Small
Oleg Rokhlenko
Charles Elkan
    OffRL

Papers citing "End-to-End Offline Goal-Oriented Dialog Policy Learning via Policy Gradient"

8 / 8 papers shown
Prompt-Based Length Controlled Generation with Reinforcement Learning
Renlong Jie
Xiaojun Meng
Lifeng Shang
Xin Jiang
Qun Liu
24
8
0
23 Aug 2023
KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning
Xiao Yu
Qingyang Wu
Kun Qian
Zhou Yu
OffRL
21
11
0
30 Nov 2022
Jointly Reinforced User Simulator and Task-oriented Dialog System with Simplified Generative Architecture
Abhishek Sethi
Zhijian Ou
Yi Huang
Junlan Feng
RALM
21
1
0
13 Oct 2022
Is Reinforcement Learning (Not) for Natural Language Processing: Benchmarks, Baselines, and Building Blocks for Natural Language Policy Optimization
Rajkumar Ramamurthy
Prithviraj Ammanabrolu
Kianté Brantley
Jack Hessel
R. Sifa
Christian Bauckhage
Hannaneh Hajishirzi
Yejin Choi
OffRL
31
240
0
03 Oct 2022
Reinforcement Learning of Multi-Domain Dialog Policies Via Action Embeddings
Jorge Armando Mendez Mendez
Alborz Geramifard
Mohammad Ghavamzadeh
Bing-Quan Liu
OffRL
27
6
0
01 Jul 2022
Behavioral Priors and Dynamics Models: Improving Performance and Domain Transfer in Offline RL
Catherine Cang
Aravind Rajeswaran
Pieter Abbeel
Michael Laskin
OffRL
32
29
0
16 Jun 2021
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
220
1,328
0
05 Jun 2016
Off-Policy Actor-Critic
T. Degris
Martha White
R. Sutton
OffRL
CML
163
220
0
22 May 2012