Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
1607.07086
Cited By
An Actor-Critic Algorithm for Sequence Prediction
24 July 2016
Dzmitry Bahdanau
Philemon Brakel
Kelvin Xu
Anirudh Goyal
Ryan J. Lowe
Joelle Pineau
Aaron Courville
Yoshua Bengio
Re-assign community
ArXiv
PDF
HTML
Papers citing
"An Actor-Critic Algorithm for Sequence Prediction"
50 / 362 papers shown
Title
BLEUBERI: BLEU is a surprisingly effective reward for instruction following
Yapei Chang
Yekyung Kim
Michael Krumdick
Amir Zadeh
Chuan Li
Chris Tanner
Mohit Iyyer
ALM
22
0
0
16 May 2025
Reinforcement learning framework for the mechanical design of microelectronic components under multiphysics constraints
S. Nair
Timothy F. Walsh
Greg Pickrell
Fabio Semperlotti
30
0
0
23 Apr 2025
SimpleAR: Pushing the Frontier of Autoregressive Visual Generation through Pretraining, SFT, and RL
Junke Wang
Zhi Tian
Xinyu Wang
Xinyu Zhang
Weilin Huang
Zuxuan Wu
Yu Jiang
VGen
49
6
0
15 Apr 2025
Deep Reasoning Translation via Reinforcement Learning
Jiaan Wang
Fandong Meng
Jie Zhou
OffRL
LRM
33
0
0
14 Apr 2025
Reference-free Evaluation Metrics for Text Generation: A Survey
Takumi Ito
Kees van Deemter
Jun Suzuki
ELM
41
2
0
21 Jan 2025
Fine-Grained Reward Optimization for Machine Translation using Error Severity Mappings
Miguel Moura Ramos
Tomás Almeida
Daniel Vareta
Filipe Azevedo
Sweta Agrawal
Patrick Fernandes
André F. T. Martins
31
1
0
08 Nov 2024
PrefPaint: Aligning Image Inpainting Diffusion Model with Human Preference
Kendong Liu
Zhiyu Zhu
Chuanhao Li
Hui Liu
H. Zeng
Junhui Hou
EGVM
43
2
0
29 Oct 2024
Diverse Sign Language Translation
Xin Shen
Lei Shen
Shaozu Yuan
Heming Du
Haiyang Sun
Xin Yu
SLR
43
1
0
25 Oct 2024
When Molecular GAN Meets Byte-Pair Encoding
Huidong Tang
Chen Li
Yasuhiko Morimoto
27
0
0
29 Sep 2024
Mitigating the Negative Impact of Over-association for Conversational Query Production
Ante Wang
Linfeng Song
Zijun Min
Ge Xu
Xiaoli Wang
Junfeng Yao
Jinsong Su
33
0
0
29 Sep 2024
ChronoGAN: Supervised and Embedded Generative Adversarial Networks for Time Series Generation
MohammadReza EskandariNasab
S. M. Hamdi
S. F. Boubrahimi
GAN
AI4TS
28
1
0
21 Sep 2024
LLMR: Knowledge Distillation with a Large Language Model-Induced Reward
Dongheng Li
Yongchang Hao
Lili Mou
53
1
0
19 Sep 2024
A Gradient Analysis Framework for Rewarding Good and Penalizing Bad Examples in Language Models
Yi-Lin Tuan
William Yang Wang
29
1
0
29 Aug 2024
RePair: Automated Program Repair with Process-based Feedback
Yuze Zhao
Zhenya Huang
Yixiao Ma
Rui Li
Kai Zhang
Hao Jiang
Qi Liu
Linbo Zhu
Yu Su
KELM
39
6
0
21 Aug 2024
LLM See, LLM Do: Guiding Data Generation to Target Non-Differentiable Objectives
Luísa Shimabucoro
Sebastian Ruder
Julia Kreutzer
Marzieh Fadaee
Sara Hooker
SyDa
36
4
0
01 Jul 2024
Combining Supervised Learning and Reinforcement Learning for Multi-Label Classification Tasks with Partial Labels
Zixia Jia
Junpeng Li
Shichuan Zhang
Guy Van den Broeck
Zilong Zheng
40
2
0
24 Jun 2024
INDICT: Code Generation with Internal Dialogues of Critiques for Both Security and Helpfulness
Hung Le
Yingbo Zhou
Caiming Xiong
Silvio Savarese
Doyen Sahoo
52
2
0
23 Jun 2024
Prompt-Based Length Controlled Generation with Multiple Control Types
Renlong Jie
Xiaojun Meng
Lifeng Shang
Xin Jiang
Qun Liu
26
6
0
12 Jun 2024
Beyond MLE: Investigating SEARNN for Low-Resourced Neural Machine Translation
Chris C. Emezue
22
0
0
20 May 2024
A Mixture-of-Experts Approach to Few-Shot Task Transfer in Open-Ended Text Worlds
Christopher Cui
Xiangyu Peng
Mark O. Riedl
LLMAG
OffRL
MoE
33
1
0
09 May 2024
A Network Simulation of OTC Markets with Multiple Agents
James T. Wilkinson
Jacob Kelter
John Chen
Uri Wilensky
AIFin
22
0
0
03 May 2024
Reinforcement Learning-Guided Semi-Supervised Learning
Marzi Heidari
Hanping Zhang
Yuhong Guo
OffRL
39
0
0
02 May 2024
Reinforcement Learning for Edit-Based Non-Autoregressive Neural Machine Translation
Hao Wang
Tetsuro Morimura
Ukyo Honda
Daisuke Kawahara
19
0
0
02 May 2024
MetaRM: Shifted Distributions Alignment via Meta-Learning
Shihan Dou
Yan Liu
Enyu Zhou
Tianlong Li
Haoxiang Jia
...
Junjie Ye
Rui Zheng
Tao Gui
Qi Zhang
Xuanjing Huang
OOD
57
2
0
01 May 2024
Efficient Sample-Specific Encoder Perturbations
Yassir Fathullah
Mark J. F. Gales
26
0
0
01 May 2024
Countering Reward Over-optimization in LLM with Demonstration-Guided Reinforcement Learning
Mathieu Rita
Florian Strub
Rahma Chaabouni
Paul Michel
Emmanuel Dupoux
Olivier Pietquin
42
8
0
30 Apr 2024
Beyond the Edge: An Advanced Exploration of Reinforcement Learning for Mobile Edge Computing, its Applications, and Future Research Trajectories
Ning Yang
Shuo Chen
Haijun Zhang
Randall Berry
OffRL
29
6
0
22 Apr 2024
RLHF Deciphered: A Critical Analysis of Reinforcement Learning from Human Feedback for LLMs
Shreyas Chaudhari
Pranjal Aggarwal
Vishvak Murahari
Tanmay Rajpurohit
A. Kalyan
Karthik Narasimhan
A. Deshpande
Bruno Castro da Silva
29
34
0
12 Apr 2024
Polarity Calibration for Opinion Summarization
Yuanyuan Lei
Kaiqiang Song
Sangwoo Cho
Xiaoyang Wang
Ruihong Huang
Dong Yu
35
0
0
02 Apr 2024
The pitfalls of next-token prediction
Gregor Bachmann
Vaishnavh Nagarajan
37
62
0
11 Mar 2024
Attention-based Reinforcement Learning for Combinatorial Optimization: Application to Job Shop Scheduling Problem
Jaejin Lee
Seho Kee
Mani Janakiram
George Runger
OffRL
27
3
0
29 Jan 2024
Beyond Sparse Rewards: Enhancing Reinforcement Learning with Language Model Critique in Text Generation
Meng Cao
Lei Shu
Lei Yu
Yun Zhu
Nevan Wichers
Yinxiao Liu
Lei Meng
OffRL
ALM
27
4
0
14 Jan 2024
An Invitation to Deep Reinforcement Learning
Bernhard Jaeger
Andreas Geiger
OffRL
OOD
78
5
0
13 Dec 2023
Successor Features for Efficient Multisubject Controlled Text Generation
Mengyao Cao
Mehdi Fatemi
Jackie Chi Kit Cheung
Samira Shabanian
BDL
37
0
0
03 Nov 2023
Time-series Generation by Contrastive Imitation
Daniel Jarrett
Ioana Bica
M. Schaar
AI4TS
13
24
0
02 Nov 2023
Vanishing Gradients in Reinforcement Finetuning of Language Models
Noam Razin
Hattie Zhou
Omid Saremi
Vimal Thilak
Arwen Bradley
Preetum Nakkiran
Josh Susskind
Etai Littwin
18
7
0
31 Oct 2023
Beyond MLE: Convex Learning for Text Generation
Chenze Shao
Zhengrui Ma
Min Zhang
Yang Feng
22
3
0
26 Oct 2023
Retrieval-based Knowledge Transfer: An Effective Approach for Extreme Large Language Model Compression
Jiduan Liu
Jiahao Liu
Qifan Wang
Jingang Wang
Xunliang Cai
Dongyan Zhao
R. Wang
Rui Yan
24
4
0
24 Oct 2023
Loose lips sink ships: Mitigating Length Bias in Reinforcement Learning from Human Feedback
Wei Shen
Rui Zheng
Wenyu Zhan
Jun Zhao
Shihan Dou
Tao Gui
Qi Zhang
Xuanjing Huang
ALM
42
42
0
08 Oct 2023
Unlikelihood Tuning on Negative Samples Amazingly Improves Zero-Shot Translation
Junjie Yang
Liang Ding
Li Shen
Matthieu Labeau
Yibing Zhan
Weifeng Liu
Dacheng Tao
VLM
33
4
0
28 Sep 2023
Prompt-Based Length Controlled Generation with Reinforcement Learning
Renlong Jie
Xiaojun Meng
Lifeng Shang
Xin Jiang
Qun Liu
17
8
0
23 Aug 2023
Mitigating the Exposure Bias in Sentence-Level Grapheme-to-Phoneme (G2P) Transduction
Eunseop Yoon
Hee Suk Yoon
Dhananjaya N. Gowda
Soohwan Eom
Daehyeok Kim
John Harvill
Heting Gao
M. Hasegawa-Johnson
Chanwoo Kim
Chang D. Yoo
32
1
0
16 Aug 2023
O-1: Self-training with Oracle and 1-best Hypothesis
M. Baskar
Andrew Rosenberg
Bhuvana Ramabhadran
Kartik Audhkhasi
VLM
22
0
0
14 Aug 2023
Thespian: Multi-Character Text Role-Playing Game Agents
Christopher Cui
Xiangyu Peng
Mark O. Riedl
LLMAG
AI4CE
25
4
0
03 Aug 2023
Reinforcement Learning for Generative AI: State of the Art, Opportunities and Open Research Challenges
Giorgio Franceschelli
Mirco Musolesi
AI4CE
40
20
0
31 Jul 2023
DRL4Route: A Deep Reinforcement Learning Framework for Pick-up and Delivery Route Prediction
Xiaowei Mao
Haomin Wen
Hengrui Zhang
Huaiyu Wan
Lixia Wu
Jianbin Zheng
Haoyuan Hu
Youfang Lin
AI4TS
49
11
0
30 Jul 2023
FigCaps-HF: A Figure-to-Caption Generative Framework and Benchmark with Human Feedback
Ashish Singh
Prateek R. Agarwal
Zixuan Huang
Arpita Singh
Tong Yu
Sungchul Kim
Victor S. Bursztyn
N. Vlassis
Ryan A. Rossi
36
6
0
20 Jul 2023
Improving End-to-End Speech Translation by Imitation-Based Knowledge Distillation with Synthetic Transcripts
Rebekka Hubert
Artem Sokolov
Stefan Riezler
24
1
0
17 Jul 2023
Enhancing Supervised Learning with Contrastive Markings in Neural Machine Translation Training
Nathaniel Berger
M. Exel
Matthias Huck
Stefan Riezler
23
2
0
17 Jul 2023
SARC: Soft Actor Retrospective Critic
Sukriti Verma
Ayush Chopra
J. Subramanian
Mausoom Sarkar
Nikaash Puri
Piyush B. Gupta
Balaji Krishnamurthy
10
0
0
28 Jun 2023
1
2
3
4
5
6
7
8
Next