ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2202.09662
  4. Cited By
Reward Modeling for Mitigating Toxicity in Transformer-based Language
  Models

Reward Modeling for Mitigating Toxicity in Transformer-based Language Models

19 February 2022
Farshid Faal
K. Schmitt
Jia Yuan Yu
ArXivPDFHTML

Papers citing "Reward Modeling for Mitigating Toxicity in Transformer-based Language Models"

33 / 33 papers shown
Title
Challenges in Detoxifying Language Models
Challenges in Detoxifying Language Models
Johannes Welbl
Amelia Glaese
J. Uesato
Sumanth Dathathri
John F. J. Mellor
Lisa Anne Hendricks
Kirsty Anderson
Pushmeet Kohli
Ben Coppin
Po-Sen Huang
LM&MA
298
196
0
15 Sep 2021
DExperts: Decoding-Time Controlled Text Generation with Experts and
  Anti-Experts
DExperts: Decoding-Time Controlled Text Generation with Experts and Anti-Experts
Alisa Liu
Maarten Sap
Ximing Lu
Swabha Swayamdipta
Chandra Bhagavatula
Noah A. Smith
Yejin Choi
MU
102
371
0
07 May 2021
Detoxifying Language Models Risks Marginalizing Minority Voices
Detoxifying Language Models Risks Marginalizing Minority Voices
Albert Xu
Eshaan Pathak
Eric Wallace
Suchin Gururangan
Maarten Sap
Dan Klein
62
128
0
13 Apr 2021
BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language
  Generation
BOLD: Dataset and Metrics for Measuring Biases in Open-Ended Language Generation
Jwala Dhamala
Tony Sun
Varun Kumar
Satyapriya Krishna
Yada Pruksachatkun
Kai-Wei Chang
Rahul Gupta
83
394
0
27 Jan 2021
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language
  Models
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
149
1,197
0
24 Sep 2020
GeDi: Generative Discriminator Guided Sequence Generation
GeDi: Generative Discriminator Guided Sequence Generation
Ben Krause
Akhilesh Deepak Gotmare
Bryan McCann
N. Keskar
Shafiq Joty
R. Socher
Nazneen Rajani
117
406
0
14 Sep 2020
Learning to summarize from human feedback
Learning to summarize from human feedback
Nisan Stiennon
Long Ouyang
Jeff Wu
Daniel M. Ziegler
Ryan J. Lowe
Chelsea Voss
Alec Radford
Dario Amodei
Paul Christiano
ALM
225
2,139
0
02 Sep 2020
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
Don't Stop Pretraining: Adapt Language Models to Domains and Tasks
Suchin Gururangan
Ana Marasović
Swabha Swayamdipta
Kyle Lo
Iz Beltagy
Doug Downey
Noah A. Smith
VLM
AI4CE
CLL
152
2,424
0
23 Apr 2020
Plug and Play Language Models: A Simple Approach to Controlled Text
  Generation
Plug and Play Language Models: A Simple Approach to Controlled Text Generation
Sumanth Dathathri
Andrea Madotto
Janice Lan
Jane Hung
Eric Frank
Piero Molino
J. Yosinski
Rosanne Liu
KELM
127
969
0
04 Dec 2019
DialoGPT: Large-Scale Generative Pre-training for Conversational
  Response Generation
DialoGPT: Large-Scale Generative Pre-training for Conversational Response Generation
Yizhe Zhang
Siqi Sun
Michel Galley
Yen-Chun Chen
Chris Brockett
Xiang Gao
Jianfeng Gao
Jingjing Liu
W. Dolan
VLM
169
1,523
0
01 Nov 2019
Exploring the Limits of Transfer Learning with a Unified Text-to-Text
  Transformer
Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer
Colin Raffel
Noam M. Shazeer
Adam Roberts
Katherine Lee
Sharan Narang
Michael Matena
Yanqi Zhou
Wei Li
Peter J. Liu
AIMat
419
20,127
0
23 Oct 2019
Fine-Tuning Language Models from Human Preferences
Fine-Tuning Language Models from Human Preferences
Daniel M. Ziegler
Nisan Stiennon
Jeff Wu
Tom B. Brown
Alec Radford
Dario Amodei
Paul Christiano
G. Irving
ALM
463
1,727
0
18 Sep 2019
The Woman Worked as a Babysitter: On Biases in Language Generation
The Woman Worked as a Babysitter: On Biases in Language Generation
Emily Sheng
Kai-Wei Chang
Premkumar Natarajan
Nanyun Peng
276
642
0
03 Sep 2019
Better Rewards Yield Better Summaries: Learning to Summarise Without
  References
Better Rewards Yield Better Summaries: Learning to Summarise Without References
F. Böhm
Yang Gao
Christian M. Meyer
Ori Shapira
Ido Dagan
Iryna Gurevych
72
107
0
03 Sep 2019
Universal Adversarial Triggers for Attacking and Analyzing NLP
Universal Adversarial Triggers for Attacking and Analyzing NLP
Eric Wallace
Shi Feng
Nikhil Kandpal
Matt Gardner
Sameer Singh
AAML
SILM
112
865
0
20 Aug 2019
Probing Neural Network Comprehension of Natural Language Arguments
Probing Neural Network Comprehension of Natural Language Arguments
Timothy Niven
Hung-Yu kao
AAML
85
454
0
17 Jul 2019
Preference-based Interactive Multi-Document Summarisation
Preference-based Interactive Multi-Document Summarisation
Yang Gao
Christian M. Meyer
Iryna Gurevych
41
27
0
07 Jun 2019
Unified Language Model Pre-training for Natural Language Understanding
  and Generation
Unified Language Model Pre-training for Natural Language Understanding and Generation
Li Dong
Nan Yang
Wenhui Wang
Furu Wei
Xiaodong Liu
Yu Wang
Jianfeng Gao
M. Zhou
H. Hon
ELM
AI4CE
220
1,555
0
08 May 2019
Towards Coherent and Engaging Spoken Dialog Response Generation Using
  Automatic Conversation Evaluators
Towards Coherent and Engaging Spoken Dialog Response Generation Using Automatic Conversation Evaluators
Sanghyun Yi
Rahul Goel
Chandra Khatri
Alessandra Cervone
Tagyoung Chung
Behnam Hedayatnia
Anu Venkatesh
Raefer Gabriel
Dilek Z. Hakkani-Tür
43
60
0
30 Apr 2019
The Curious Case of Neural Text Degeneration
The Curious Case of Neural Text Degeneration
Ari Holtzman
Jan Buys
Li Du
Maxwell Forbes
Yejin Choi
182
3,175
0
22 Apr 2019
Nuanced Metrics for Measuring Unintended Bias with Real Data for Text
  Classification
Nuanced Metrics for Measuring Unintended Bias with Real Data for Text Classification
Daniel Borkan
Lucas Dixon
Jeffrey Scott Sorensen
Nithum Thain
Lucy Vasserman
88
488
0
11 Mar 2019
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural
  Language Inference
Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
R. Thomas McCoy
Ellie Pavlick
Tal Linzen
129
1,237
0
04 Feb 2019
Multi-Task Deep Neural Networks for Natural Language Understanding
Multi-Task Deep Neural Networks for Natural Language Understanding
Xiaodong Liu
Pengcheng He
Weizhu Chen
Jianfeng Gao
AI4CE
121
1,270
0
31 Jan 2019
BERT: Pre-training of Deep Bidirectional Transformers for Language
  Understanding
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
Jacob Devlin
Ming-Wei Chang
Kenton Lee
Kristina Toutanova
VLM
SSL
SSeg
1.7K
94,770
0
11 Oct 2018
Learning to Extract Coherent Summary via Deep Reinforcement Learning
Learning to Extract Coherent Summary via Deep Reinforcement Learning
Yuxiang Wu
Baotian Hu
AI4TS
40
170
0
19 Apr 2018
Reinforcement Learning for Bandit Neural Machine Translation with
  Simulated Human Feedback
Reinforcement Learning for Bandit Neural Machine Translation with Simulated Human Feedback
Khanh Nguyen
Hal Daumé
Jordan L. Boyd-Graber
62
138
0
24 Jul 2017
Proximal Policy Optimization Algorithms
Proximal Policy Optimization Algorithms
John Schulman
Filip Wolski
Prafulla Dhariwal
Alec Radford
Oleg Klimov
OffRL
481
19,019
0
20 Jul 2017
Attention Is All You Need
Attention Is All You Need
Ashish Vaswani
Noam M. Shazeer
Niki Parmar
Jakob Uszkoreit
Llion Jones
Aidan Gomez
Lukasz Kaiser
Illia Polosukhin
3DV
687
131,526
0
12 Jun 2017
A Deep Reinforced Model for Abstractive Summarization
A Deep Reinforced Model for Abstractive Summarization
Romain Paulus
Caiming Xiong
R. Socher
AI4TS
197
1,557
0
11 May 2017
Automated Hate Speech Detection and the Problem of Offensive Language
Automated Hate Speech Detection and the Problem of Offensive Language
Thomas Davidson
Dana Warmsley
M. Macy
Ingmar Weber
76
2,681
0
11 Mar 2017
Deep Reinforcement Learning for Dialogue Generation
Deep Reinforcement Learning for Dialogue Generation
Jiwei Li
Will Monroe
Alan Ritter
Michel Galley
Jianfeng Gao
Dan Jurafsky
278
1,333
0
05 Jun 2016
Sequence Level Training with Recurrent Neural Networks
Sequence Level Training with Recurrent Neural Networks
MarcÁurelio Ranzato
S. Chopra
Michael Auli
Wojciech Zaremba
100
1,615
0
20 Nov 2015
Neural Machine Translation of Rare Words with Subword Units
Neural Machine Translation of Rare Words with Subword Units
Rico Sennrich
Barry Haddow
Alexandra Birch
215
7,735
0
31 Aug 2015
1