ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2009.11462
  4. Cited By
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language
  Models
v1v2 (latest)

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

24 September 2020
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
ArXiv (abs)PDFHTML

Papers citing "RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models"

50 / 814 papers shown
Title
Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent
  Problems in AI Alignment using Large-Language Models
Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent Problems in AI Alignment using Large-Language Models
S. Phelps
Rebecca E. Ranson
LLMAG
69
1
0
20 Jul 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill
  Sets
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
138
108
0
20 Jul 2023
Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language
  Models
Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models
Somayeh Ghanbarzadeh
Yan-ping Huang
Hamid Palangi
R. C. Moreno
Hamed Khanpour
70
12
0
20 Jul 2023
How is ChatGPT's behavior changing over time?
How is ChatGPT's behavior changing over time?
Lingjiao Chen
Matei A. Zaharia
James Zou
ELMKELMAI4MH
143
433
0
18 Jul 2023
MasterKey: Automated Jailbreak Across Multiple Large Language Model
  Chatbots
MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
Gelei Deng
Yi Liu
Yuekang Li
Kailong Wang
Ying Zhang
Zefeng Li
Haoyu Wang
Tianwei Zhang
Yang Liu
SILM
101
138
0
16 Jul 2023
Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
Bocheng Chen
Guangjing Wang
Hanqing Guo
Yuanda Wang
Qiben Yan
95
17
0
14 Jul 2023
Effective Prompt Extraction from Language Models
Effective Prompt Extraction from Language Models
Yiming Zhang
Nicholas Carlini
Daphne Ippolito
MIACVSILM
109
43
0
13 Jul 2023
A Comprehensive Overview of Large Language Models
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Mian
OffRL
261
630
0
12 Jul 2023
BeaverTails: Towards Improved Safety Alignment of LLM via a
  Human-Preference Dataset
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Jiaming Ji
Mickel Liu
Juntao Dai
Xuehai Pan
Chi Zhang
Ce Bian
Chi Zhang
Ruiyang Sun
Yizhou Wang
Yaodong Yang
ALM
100
506
0
10 Jul 2023
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung
Joslyn Barnhart
Anton Korinek
Jade Leung
Cullen O'Keefe
...
Jonas Schuett
Yonadav Shavit
Divya Siddarth
Robert F. Trager
Kevin J. Wolf
SILM
154
125
0
06 Jul 2023
PREADD: Prefix-Adaptive Decoding for Controlled Text Generation
PREADD: Prefix-Adaptive Decoding for Controlled Text Generation
Jonathan Pei
Kevin Kaichuang Yang
Dan Klein
125
21
0
06 Jul 2023
A Survey on Evaluation of Large Language Models
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELMLM&MAALM
223
1,773
0
06 Jul 2023
Jailbroken: How Does LLM Safety Training Fail?
Jailbroken: How Does LLM Safety Training Fail?
Alexander Wei
Nika Haghtalab
Jacob Steinhardt
238
1,005
0
05 Jul 2023
Understanding Counterspeech for Online Harm Mitigation
Understanding Counterspeech for Online Harm Mitigation
Yi-Ling Chung
Gavin Abercrombie
Florence E. Enock
Jonathan Bright
Verena Rieser
67
18
0
01 Jul 2023
Queer People are People First: Deconstructing Sexual Identity
  Stereotypes in Large Language Models
Queer People are People First: Deconstructing Sexual Identity Stereotypes in Large Language Models
Harnoor Dhingra
Preetiha Jayashanker
Sayali S. Moghe
Emma Strubell
82
13
0
30 Jun 2023
Transformers in Healthcare: A Survey
Transformers in Healthcare: A Survey
Subhash Nerella
S. Bandyopadhyay
Jiaqing Zhang
Miguel Contreras
Scott Siegel
...
Jessica Sena
B. Shickel
A. Bihorac
Kia Khezeli
Parisa Rashidi
MedImAI4CE
98
35
0
30 Jun 2023
Stay on topic with Classifier-Free Guidance
Stay on topic with Classifier-Free Guidance
Guillaume Sanchez
Honglu Fan
Alexander Spangher
Elad Levi
Pawan Sasanka Ammanamanchi
Stella Biderman
3DV
107
55
0
30 Jun 2023
ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple
  Oracles
ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles
Haoqin Tu
Bowen Yang
Xianfeng Zhao
63
6
0
29 Jun 2023
Towards Measuring the Representation of Subjective Global Opinions in
  Language Models
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus
Karina Nyugen
Thomas I. Liao
Nicholas Schiefer
Amanda Askell
...
Alex Tamkin
Janel Thamkul
Jared Kaplan
Jack Clark
Deep Ganguli
149
246
0
28 Jun 2023
CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI
  Collaboration for Large Language Models
CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models
Yufei Huang
Deyi Xiong
ALM
134
19
0
28 Jun 2023
Bring Your Own Data! Self-Supervised Evaluation for Large Language
  Models
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
Neel Jain
Khalid Saifullah
Yuxin Wen
John Kirchenbauer
Manli Shu
Aniruddha Saha
Micah Goldblum
Jonas Geiping
Tom Goldstein
ALMELM
102
23
0
23 Jun 2023
Visual Adversarial Examples Jailbreak Aligned Large Language Models
Visual Adversarial Examples Jailbreak Aligned Large Language Models
Xiangyu Qi
Kaixuan Huang
Ashwinee Panda
Peter Henderson
Mengdi Wang
Prateek Mittal
AAML
127
173
0
22 Jun 2023
Mass-Producing Failures of Multimodal Systems with Language Models
Mass-Producing Failures of Multimodal Systems with Language Models
Shengbang Tong
Erik Jones
Jacob Steinhardt
108
36
0
21 Jun 2023
TrustGPT: A Benchmark for Trustworthy and Responsible Large Language
  Models
TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models
Yue Huang
Qihui Zhang
Philip S. Y
Lichao Sun
71
54
0
20 Jun 2023
Give Us the Facts: Enhancing Large Language Models with Knowledge Graphs
  for Fact-aware Language Modeling
Give Us the Facts: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling
Lin F. Yang
Hongyang Chen
Zhao Li
Xiao Ding
Xindong Wu
KELM
116
93
0
20 Jun 2023
Towards Theory-based Moral AI: Moral AI with Aggregating Models Based on
  Normative Ethical Theory
Towards Theory-based Moral AI: Moral AI with Aggregating Models Based on Normative Ethical Theory
Masashi Takeshita
Rafal Rzepka
K. Araki
59
9
0
20 Jun 2023
KEST: Kernel Distance Based Efficient Self-Training for Improving
  Controllable Text Generation
KEST: Kernel Distance Based Efficient Self-Training for Improving Controllable Text Generation
Yuxi Feng
Xiaoyuan Yi
L. Lakshmanan
Xing Xie
75
1
0
17 Jun 2023
Conformal Language Modeling
Conformal Language Modeling
Victor Quach
Adam Fisch
Tal Schuster
Adam Yala
J. Sohn
Tommi Jaakkola
Regina Barzilay
253
67
0
16 Jun 2023
CHORUS: Foundation Models for Unified Data Discovery and Exploration
CHORUS: Foundation Models for Unified Data Discovery and Exploration
Moe Kayali
A. Lykov
Ilias Fountalis
N. Vasiloglou
Dan Olteanu
Dan Suciu
99
25
0
16 Jun 2023
Evaluating the Social Impact of Generative AI Systems in Systems and
  Society
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Irene Solaiman
Zeerak Talat
William Agnew
Lama Ahmad
Dylan K. Baker
...
Marie-Therese Png
Shubham Singh
A. Strait
Lukas Struppek
Arjun Subramonian
ELMEGVM
152
117
0
09 Jun 2023
Prompt Injection attack against LLM-integrated Applications
Prompt Injection attack against LLM-integrated Applications
Yi Liu
Gelei Deng
Yuekang Li
Kailong Wang
Zihao Wang
...
Tianwei Zhang
Yepang Liu
Haoyu Wang
Yanhong Zheng
Yang Liu
SILM
128
365
0
08 Jun 2023
Long-form analogies generated by chatGPT lack human-like
  psycholinguistic properties
Long-form analogies generated by chatGPT lack human-like psycholinguistic properties
S. M. Seals
V. Shalin
50
12
0
07 Jun 2023
Click: Controllable Text Generation with Sequence Likelihood Contrastive
  Learning
Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning
Chujie Zheng
Pei Ke
Zheng Zhang
Minlie Huang
BDL
83
34
0
06 Jun 2023
AHA!: Facilitating AI Impact Assessment by Generating Examples of Harms
AHA!: Facilitating AI Impact Assessment by Generating Examples of Harms
Zana Buçinca
Chau Minh Pham
Maurice Jakesch
Marco Tulio Ribeiro
Alexandra Olteanu
Saleema Amershi
69
37
0
05 Jun 2023
Structured Voronoi Sampling
Structured Voronoi Sampling
Afra Amini
Li Du
Ryan Cotterell
DiffM
101
2
0
05 Jun 2023
On "Scientific Debt" in NLP: A Case for More Rigour in Language Model
  Pre-Training Research
On "Scientific Debt" in NLP: A Case for More Rigour in Language Model Pre-Training Research
Made Nindyatama Nityasya
Haryo Akbarianto Wibowo
Alham Fikri Aji
Genta Indra Winata
Radityo Eko Prasojo
Phil Blunsom
A. Kuncoro
65
8
0
05 Jun 2023
Exposing Bias in Online Communities through Large-Scale Language Models
Exposing Bias in Online Communities through Large-Scale Language Models
Celine Wald
Lukas Pfahler
67
6
0
04 Jun 2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model
  Training
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
168
336
0
02 Jun 2023
CFL: Causally Fair Language Models Through Token-level Attribute
  Controlled Generation
CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation
Rahul Madhavan
Rishabh Garg
Kahini Wadhawan
S. Mehta
87
5
0
01 Jun 2023
Thought Cloning: Learning to Think while Acting by Imitating Human
  Thinking
Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Shengran Hu
Jeff Clune
LM&RoOffRLLRMAI4CE
89
29
0
01 Jun 2023
An Invariant Learning Characterization of Controlled Text Generation
An Invariant Learning Characterization of Controlled Text Generation
Carolina Zheng
Claudia Shi
Keyon Vafa
Amir Feder
David M. Blei
OOD
103
8
0
31 May 2023
Controlled Text Generation with Hidden Representation Transformations
Controlled Text Generation with Hidden Representation Transformations
Vaibhav Kumar
H. Koorehdavoudi
Masud Moshtaghi
Amita Misra
Ankit Chadha
Emilio Ferrara
62
3
0
30 May 2023
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large
  Language Model Application
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
Gunhee Kim
Jung-Woo Ha
106
30
0
28 May 2023
SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable
  Responses Created Through Human-Machine Collaboration
SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
M. Cha
...
Eun-Ju Lee
Yong Lim
Alice Oh
San-hee Park
Jung-Woo Ha
98
18
0
28 May 2023
Query-Efficient Black-Box Red Teaming via Bayesian Optimization
Query-Efficient Black-Box Red Teaming via Bayesian Optimization
Deokjae Lee
JunYeong Lee
Jung-Woo Ha
Jin-Hwa Kim
Sang-Woo Lee
Hwaran Lee
Hyun Oh Song
AAML
91
25
0
27 May 2023
Generating Images with Multimodal Language Models
Generating Images with Multimodal Language Models
Jing Yu Koh
Daniel Fried
Ruslan Salakhutdinov
MLLM
164
259
0
26 May 2023
From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language
  Models
From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models
Julia Mendelsohn
Ronan Le Bras
Yejin Choi
Maarten Sap
61
29
0
26 May 2023
Training Socially Aligned Language Models on Simulated Social
  Interactions
Training Socially Aligned Language Models on Simulated Social Interactions
Ruibo Liu
Ruixin Yang
Chenyan Jia
Ge Zhang
Denny Zhou
Andrew M. Dai
Diyi Yang
Soroush Vosoughi
ALM
80
56
0
26 May 2023
The False Promise of Imitating Proprietary LLMs
The False Promise of Imitating Proprietary LLMs
Arnav Gudibande
Eric Wallace
Charles Burton Snell
Xinyang Geng
Hao Liu
Pieter Abbeel
Sergey Levine
Dawn Song
ALM
144
208
0
25 May 2023
Scaling Data-Constrained Language Models
Scaling Data-Constrained Language Models
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
196
226
0
25 May 2023
Previous
123...101112...151617
Next