An Actor-Critic Algorithm for Sequence Prediction

24 July 2016
Dzmitry Bahdanau
Philemon Brakel
Kelvin Xu
Anirudh Goyal
Ryan J. Lowe
Joelle Pineau
Aaron Courville
Yoshua Bengio

Papers citing "An Actor-Critic Algorithm for Sequence Prediction"

50 / 362 papers shown
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang
Yekyung Kim
Michael Krumdick
Amir Zadeh
Chuan Li
Chris Tanner
Mohit Iyyer
ALM
22
0
0
16 May 2025
Reinforcement learning framework for the mechanical design of microelectronic components under multiphysics constraints
S. Nair
Timothy F. Walsh
Greg Pickrell
Fabio Semperlotti
30
0
0
23 Apr 2025
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
Junke Wang
Zhi Tian
Xinyu Wang
Xinyu Zhang
Weilin Huang
Zuxuan Wu
Yu Jiang
VGen
49
6
0
15 Apr 2025
Deep Reasoning Translation via Reinforcement Learning
Jiaan Wang
Fandong Meng
Jie Zhou
OffRL
LRM
33
0
0
14 Apr 2025
Reference-free Evaluation Metrics for Text Generation: A Survey
Takumi Ito
Kees van Deemter
Jun Suzuki
ELM
41
2
0
21 Jan 2025
Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
Miguel Moura Ramos
Tomás Almeida
Daniel Vareta
Filipe Azevedo
Sweta Agrawal
Patrick Fernandes
André F. T. Martins
31
1
0
08 Nov 2024
PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference
Kendong Liu
Zhiyu Zhu
Chuanhao Li
Hui Liu
H. Zeng
Junhui Hou
EGVM
43
2
0
29 Oct 2024
Diverse Sign Language Translation
Xin Shen
Lei Shen
Shaozu Yuan
Heming Du
Haiyang Sun
Xin Yu
SLR
43
1
0
25 Oct 2024
When Molecular GAN Meets Byte-Pair Encoding
Huidong Tang
Chen Li
Yasuhiko Morimoto
27
0
0
29 Sep 2024
Mitigating the Negative Impact of Over-association for Conversational Query Production
Ante Wang
Linfeng Song
Zijun Min
Ge Xu
Xiaoli Wang
Junfeng Yao
Jinsong Su
33
0
0
29 Sep 2024
ChronoGAN: Supervised and Embedded Generative Adversarial Networks for Time Series Generation
MohammadReza EskandariNasab
S. M. Hamdi
S. F. Boubrahimi
GAN
AI4TS
28
1
0
21 Sep 2024
LLMR: Knowledge Distillation with a Large Language Model-Induced Reward
Dongheng Li
Yongchang Hao
Lili Mou
53
1
0
19 Sep 2024
A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models
Yi-Lin Tuan
William Yang Wang
29
1
0
29 Aug 2024
RePair: Automated Program Repair with Process-based Feedback
Yuze Zhao
Zhenya Huang
Yixiao Ma
Rui Li
Kai Zhang
Hao Jiang
Qi Liu
Linbo Zhu
Yu Su
KELM
39
6
0
21 Aug 2024
LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives
Luísa Shimabucoro
Sebastian Ruder
Julia Kreutzer
Marzieh Fadaee
Sara Hooker
SyDa
36
4
0
01 Jul 2024
Combining Supervised Learning and Reinforcement Learning for Multi-Label Classification Tasks with Partial Labels
Zixia Jia
Junpeng Li
Shichuan Zhang
Guy Van den Broeck
Zilong Zheng
40
2
0
24 Jun 2024
INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness
Hung Le
Yingbo Zhou
Caiming Xiong
Silvio Savarese
Doyen Sahoo
52
2
0
23 Jun 2024
Prompt-Based Length Controlled Generation with Multiple Control Types
Renlong Jie
Xiaojun Meng
Lifeng Shang
Xin Jiang
Qun Liu
26
6
0
12 Jun 2024
Beyond MLE: Investigating SEARNN for Low-Resourced Neural Machine Translation
Chris C. Emezue
22
0
0
20 May 2024
A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds
Christopher Cui
Xiangyu Peng
Mark O. Riedl
LLMAG
OffRL
MoE
33
1
0
09 May 2024
A Network Simulation of OTC Markets with Multiple Agents
James T. Wilkinson
Jacob Kelter
John Chen
Uri Wilensky
AIFin
22
0
0
03 May 2024
Reinforcement Learning-Guided Semi-Supervised Learning
Marzi Heidari
Hanping Zhang
Yuhong Guo
OffRL
39
0
0
02 May 2024
Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation
Hao Wang
Tetsuro Morimura
Ukyo Honda
Daisuke Kawahara
19
0
0
02 May 2024
MetaRM: Shifted Distributions Alignment via Meta-Learning
Shihan Dou
Yan Liu
Enyu Zhou
Tianlong Li
Haoxiang Jia
...
Junjie Ye
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
OOD
57
2
0
01 May 2024
Efficient Sample-Specific Encoder Perturbations
Yassir Fathullah
Mark J. F. Gales
26
0
0
01 May 2024
Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning
Mathieu Rita
Florian Strub
Rahma Chaabouni
Paul Michel
Emmanuel Dupoux
Olivier Pietquin
42
8
0
30 Apr 2024
Beyond the Edge: An Advanced Exploration of Reinforcement Learning for Mobile Edge Computing, its Applications, and Future Research Trajectories
Ning Yang
Shuo Chen
Haijun Zhang
Randall Berry
OffRL
29
6
0
22 Apr 2024
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
Shreyas Chaudhari
Pranjal Aggarwal
Vishvak Murahari
Tanmay Rajpurohit
A. Kalyan
Karthik Narasimhan
A. Deshpande
Bruno Castro da Silva
29
34
0
12 Apr 2024
Polarity Calibration for Opinion Summarization
Yuanyuan Lei
Kaiqiang Song
Sangwoo Cho
Xiaoyang Wang
Ruihong Huang
Dong Yu
35
0
0
02 Apr 2024
The pitfalls of next-token prediction
Gregor Bachmann
Vaishnavh Nagarajan
37
62
0
11 Mar 2024
Attention-based Reinforcement Learning for Combinatorial Optimization: Application to Job Shop Scheduling Problem
Jaejin Lee
Seho Kee
Mani Janakiram
George Runger
OffRL
27
3
0
29 Jan 2024
Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation
Meng Cao
Lei Shu
Lei Yu
Yun Zhu
Nevan Wichers
Yinxiao Liu
Lei Meng
OffRL
ALM
27
4
0
14 Jan 2024
An Invitation to Deep Reinforcement Learning
Bernhard Jaeger
Andreas Geiger
OffRL
OOD
78
5
0
13 Dec 2023
Successor Features for Efficient Multisubject Controlled Text Generation
Mengyao Cao
Mehdi Fatemi
Jackie Chi Kit Cheung
Samira Shabanian
BDL
37
0
0
03 Nov 2023
Time-series Generation by Contrastive Imitation
Daniel Jarrett
Ioana Bica
M. Schaar
AI4TS
13
24
0
02 Nov 2023
Vanishing Gradients in Reinforcement Finetuning of Language Models
Noam Razin
Hattie Zhou
Omid Saremi
Vimal Thilak
Arwen Bradley
Preetum Nakkiran
Josh Susskind
Etai Littwin
18
7
0
31 Oct 2023
Beyond MLE: Convex Learning for Text Generation
Chenze Shao
Zhengrui Ma
Min Zhang
Yang Feng
22
3
0
26 Oct 2023
Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression
Jiduan Liu
Jiahao Liu
Qifan Wang
Jingang Wang
Xunliang Cai
Dongyan Zhao
R. Wang
Rui Yan
24
4
0
24 Oct 2023
Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback
Wei Shen
Rui Zheng
Wenyu Zhan
Jun Zhao
Shihan Dou
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
42
42
0
08 Oct 2023
Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation
Junjie Yang
Liang Ding
Li Shen
Matthieu Labeau
Yibing Zhan
Weifeng Liu
Dacheng Tao
VLM
33
4
0
28 Sep 2023
Prompt-Based Length Controlled Generation with Reinforcement Learning
Renlong Jie
Xiaojun Meng
Lifeng Shang
Xin Jiang
Qun Liu
17
8
0
23 Aug 2023
Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction
Eunseop Yoon
Hee Suk Yoon
Dhananjaya N. Gowda
Soohwan Eom
Daehyeok Kim
John Harvill
Heting Gao
M. Hasegawa-Johnson
Chanwoo Kim
Chang D. Yoo
32
1
0
16 Aug 2023
O-1: Self-training with Oracle and 1-best Hypothesis
M. Baskar
Andrew Rosenberg
Bhuvana Ramabhadran
Kartik Audhkhasi
VLM
22
0
0
14 Aug 2023
Thespian: Multi-Character Text Role-Playing Game Agents
Christopher Cui
Xiangyu Peng
Mark O. Riedl
LLMAG
AI4CE
25
4
0
03 Aug 2023
Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges
Giorgio Franceschelli
Mirco Musolesi
AI4CE
40
20
0
31 Jul 2023
DRL4Route: A Deep Reinforcement Learning Framework for Pick-up and Delivery Route Prediction
Xiaowei Mao
Haomin Wen
Hengrui Zhang
Huaiyu Wan
Lixia Wu
Jianbin Zheng
Haoyuan Hu
Youfang Lin
AI4TS
49
11
0
30 Jul 2023
FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback
Ashish Singh
Prateek R. Agarwal
Zixuan Huang
Arpita Singh
Tong Yu
Sungchul Kim
Victor S. Bursztyn
N. Vlassis
Ryan A. Rossi
36
6
0
20 Jul 2023
Improving End-to-End Speech Translation by Imitation-Based Knowledge Distillation with Synthetic Transcripts
Rebekka Hubert
Artem Sokolov
Stefan Riezler
24
1
0
17 Jul 2023
Enhancing Supervised Learning with Contrastive Markings in Neural Machine Translation Training
Nathaniel Berger
M. Exel
Matthias Huck
Stefan Riezler
23
2
0
17 Jul 2023
SARC: Soft Actor Retrospective Critic
Sukriti Verma
Ayush Chopra
J. Subramanian
Mausoom Sarkar
Nikaash Puri
Piyush B. Gupta
Balaji Krishnamurthy
10
0
0
28 Jun 2023