ResearchTrend.AI
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

24 September 2020
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith

Papers citing "RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models"

50 / 772 papers shown
Diffusion Models for Non-autoregressive Text Generation: A Survey
Yifan Li
Kun Zhou
Wayne Xin Zhao
Ji-Rong Wen
MedIm
DiffM
50
33
0
12 Mar 2023
Who's Thinking? A Push for Human-Centered Evaluation of LLMs using the XAI Playbook
Teresa Datta
John P. Dickerson
36
10
0
10 Mar 2023
Personalisation within bounds: A risk taxonomy and policy framework for the alignment of large language models with personalised feedback
Hannah Rose Kirk
Bertie Vidgen
Paul Röttger
Scott A. Hale
46
100
0
09 Mar 2023
disco: a toolkit for Distributional Control of Generative Models
Germán Kruszewski
Jos Rozen
Marc Dymetman
32
4
0
08 Mar 2023
Extending the Pre-Training of BLOOM for Improved Support of Traditional Chinese: Models, Methods and Results
Philipp Ennen
Po-Chun Hsu
Chan-Jan Hsu
Chang-Le Liu
Yen-Chen Wu
Yin-Hsiang Liao
Chin-Tung Lin
Da-Shan Shiu
Wei-Yun Ma
OSLM
VLM
AI4CE
46
10
0
08 Mar 2023
Extrapolative Controlled Sequence Generation via Iterative Refinement
Vishakh Padmakumar
Richard Yuanzhe Pang
He He
Ankur P. Parikh
31
9
0
08 Mar 2023
Automatically Auditing Large Language Models via Discrete Optimization
Erik Jones
Anca Dragan
Aditi Raghunathan
Jacob Steinhardt
48
159
0
08 Mar 2023
Data Portraits: Recording Foundation Model Training Data
Marc Marone
Benjamin Van Durme
143
30
0
06 Mar 2023
Interactive Text Generation
Felix Faltings
Michel Galley
Baolin Peng
Kianté Brantley
Weixin Cai
Yizhe Zhang
Jianfeng Gao
Bill Dolan
33
0
0
02 Mar 2023
Systematic Rectification of Language Models via Dead-end Analysis
Mengyao Cao
Mehdi Fatemi
Jackie C.K. Cheung
Samira Shabanian
KELM
32
16
0
27 Feb 2023
LLaMA: Open and Efficient Foundation Language Models
Hugo Touvron
Thibaut Lavril
Gautier Izacard
Xavier Martinet
Marie-Anne Lachaux
...
Faisal Azhar
Aurelien Rodriguez
Armand Joulin
Edouard Grave
Guillaume Lample
ALM
PILM
100
12,418
0
27 Feb 2023
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake
Sahar Abdelnabi
Shailesh Mishra
C. Endres
Thorsten Holz
Mario Fritz
SILM
65
443
0
23 Feb 2023
Towards Safer Generative Language Models: A Survey on Safety Risks, Evaluations, and Improvements
Jiawen Deng
Jiale Cheng
Hao Sun
Zhexin Zhang
Minlie Huang
LM&MA
ELM
39
16
0
18 Feb 2023
Bounding the Capabilities of Large Language Models in Open Text Generation with Prompt Constraints
Albert Lu
Hongxin Zhang
Yanzhe Zhang
Xuezhi Wang
Diyi Yang
LRM
35
28
0
17 Feb 2023
Pretraining Language Models with Human Preferences
Tomasz Korbak
Kejian Shi
Angelica Chen
Rasika Bhalerao
C. L. Buckley
Jason Phang
Sam Bowman
Ethan Perez
ALM
SyDa
36
209
0
16 Feb 2023
Auditing large language models: a three-layered approach
Jakob Mökander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
AILaw
MLAU
55
196
0
16 Feb 2023
Aligning Language Models with Preferences through f-divergence Minimization
Dongyoung Go
Tomasz Korbak
Germán Kruszewski
Jos Rozen
Nahyeon Ryu
Marc Dymetman
35
70
0
16 Feb 2023
The Capacity for Moral Self-Correction in Large Language Models
Deep Ganguli
Amanda Askell
Nicholas Schiefer
Thomas I. Liao
Kamilė Lukošiūtė
...
Tom B. Brown
C. Olah
Jack Clark
Sam Bowman
Jared Kaplan
LRM
ReLM
45
159
0
15 Feb 2023
Adding Instructions during Pretraining: Effective Way of Controlling Toxicity in Language Models
Shrimai Prabhumoye
M. Patwary
Mohammad Shoeybi
Bryan Catanzaro
LM&MA
30
19
0
14 Feb 2023
BiasTestGPT: Using ChatGPT for Social Bias Testing of Language Models
Rafal Kocielnik
Shrimai Prabhumoye
Vivian Zhang
Roy Jiang
R. Alvarez
Anima Anandkumar
49
6
0
14 Feb 2023
AbLit: A Resource for Analyzing and Generating Abridged Versions of English Literature
Melissa Roemmele
Kyle Shaffer
Katrina Olsen
Yiyi Wang
Steve DeNeefe
21
1
0
13 Feb 2023
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
Daniel Kang
Xuechen Li
Ion Stoica
Carlos Guestrin
Matei A. Zaharia
Tatsunori Hashimoto
AAML
32
238
0
11 Feb 2023
Data Selection for Language Models via Importance Resampling
Sang Michael Xie
Shibani Santurkar
Tengyu Ma
Percy Liang
51
173
0
06 Feb 2023
Conversation Regression Testing: A Design Technique for Prototyping Generalizable Prompt Strategies for Pre-trained Language Models
J.D. Zamfirescu-Pereira
Bjoern Hartmann
Qian Yang
10
2
0
06 Feb 2023
Grounding Language Models to Images for Multimodal Inputs and Outputs
Jing Yu Koh
Ruslan Salakhutdinov
Daniel Fried
MLLM
38
117
0
31 Jan 2023
Dynamic Scheduled Sampling with Imitation Loss for Neural Text Generation
Xiang Lin
Prathyusha Jwalapuram
Shafiq Joty
DiffM
31
0
0
31 Jan 2023
Red teaming ChatGPT via Jailbreaking: Bias, Robustness, Reliability and Toxicity
Terry Yue Zhuo
Yujin Huang
Chunyang Chen
Zhenchang Xing
SILM
41
103
0
30 Jan 2023
Theme-driven Keyphrase Extraction to Analyze Social Media Discourse
William Romano
Omar Sharif
Madhusudan Basak
Joseph Gatto
S. Preum
32
6
0
27 Jan 2023
Language Model Detoxification in Dialogue with Contextualized Stance Control
Jingu Qian
Xifeng Yan
21
1
0
25 Jan 2023
Dr.Spider: A Diagnostic Evaluation Benchmark towards Text-to-SQL Robustness
Shuaichen Chang
Jun Wang
Mingwen Dong
Lin Pan
Henghui Zhu
...
William Yang Wang
Zhiguo Wang
Vittorio Castelli
Patrick Ng
Bing Xiang
OOD
49
34
0
21 Jan 2023
Leveraging Large Language Models to Power Chatbots for Collecting User Self-Reported Data
Jing Wei
Sungdong Kim
Hyunhoon Jung
Young-Ho Kim
34
82
0
14 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
104
35
0
01 Jan 2023
MAUVE Scores for Generative Models: Theory and Practice
Krishna Pillutla
Lang Liu
John Thickstun
Sean Welleck
Swabha Swayamdipta
Rowan Zellers
Sewoong Oh
Yejin Choi
Zaïd Harchaoui
EGVM
50
22
0
30 Dec 2022
OPT-IML: Scaling Language Model Instruction Meta Learning through the Lens of Generalization
Srinivasan Iyer
Xi Lin
Ramakanth Pasunuru
Todor Mihaylov
Daniel Simig
...
Jeff Wang
Christopher Dewan
Asli Celikyilmaz
Luke Zettlemoyer
Veselin Stoyanov
ALM
46
261
0
22 Dec 2022
Critic-Guided Decoding for Controlled Text Generation
Minbeom Kim
Hwanhee Lee
Kang Min Yoo
Joonsuk Park
Hwaran Lee
Kyomin Jung
41
35
0
21 Dec 2022
Detoxifying Text with MaRCo: Controllable Revision with Experts and Anti-Experts
Skyler Hallinan
Alisa Liu
Yejin Choi
Maarten Sap
22
36
0
20 Dec 2022
Trustworthy Social Bias Measurement
Rishi Bommasani
Percy Liang
34
10
0
20 Dec 2022
Evaluating Psychological Safety of Large Language Models
Xingxuan Li
Yutong Li
Linlin Liu
Shafiq Joty
Lidong Bing
LM&MA
31
22
0
20 Dec 2022
Controllable Text Generation with Language Constraints
Howard Chen
Huihan Li
Danqi Chen
Karthik Narasimhan
22
16
0
20 Dec 2022
Evaluating Human-Language Model Interaction
Mina Lee
Megha Srivastava
Amelia Hardy
John Thickstun
Esin Durmus
...
Hancheng Cao
Tony Lee
Rishi Bommasani
Michael S. Bernstein
Percy Liang
LM&MA
ALM
63
100
0
19 Dec 2022
Discovering Language Model Behaviors with Model-Written Evaluations
Ethan Perez
Sam Ringer
Kamilė Lukošiūtė
Karina Nguyen
Edwin Chen
...
Danny Hernandez
Deep Ganguli
Evan Hubinger
Nicholas Schiefer
Jared Kaplan
ALM
22
367
0
19 Dec 2022
I2D2: Inductive Knowledge Distillation with NeuroLogic and Self-Imitation
Chandra Bhagavatula
Jena D. Hwang
Doug Downey
Ronan Le Bras
Ximing Lu
Lianhui Qin
Keisuke Sakaguchi
Swabha Swayamdipta
Peter West
Yejin Choi
28
34
0
19 Dec 2022
DuNST: Dual Noisy Self Training for Semi-Supervised Controllable Text Generation
Yuxi Feng
Xiaoyuan Yi
Xiting Wang
L. Lakshmanan
Xing Xie
DiffM
35
5
0
16 Dec 2022
Teaching Small Language Models to Reason
Lucie Charlotte Magister
Jonathan Mallinson
Jakub Adamek
Eric Malmi
Aliaksei Severyn
LRM
AI4CE
ReLM
44
248
0
16 Dec 2022
On Second Thought, Let's Not Think Step by Step! Bias and Toxicity in Zero-Shot Reasoning
Omar Shaikh
Hongxin Zhang
William B. Held
Michael S. Bernstein
Diyi Yang
ReLM
LRM
40
186
0
15 Dec 2022
Editing Models with Task Arithmetic
Gabriel Ilharco
Marco Tulio Ribeiro
Mitchell Wortsman
Suchin Gururangan
Ludwig Schmidt
Hannaneh Hajishirzi
Ali Farhadi
KELM
MoMe
MU
77
443
0
08 Dec 2022
Constructing Highly Inductive Contexts for Dialogue Safety through Controllable Reverse Generation
Zhexin Zhang
Jiale Cheng
Hao Sun
Jiawen Deng
Fei Mi
Yasheng Wang
Lifeng Shang
Minlie Huang
SILM
37
8
0
04 Dec 2022
Improving Iterative Text Revision by Learning Where to Edit from Other Revision Tasks
Zae Myung Kim
Wanyu Du
Vipul Raheja
Dhruv Kumar
Dongyeop Kang
17
16
0
02 Dec 2022
KRLS: Improving End-to-End Response Generation in Task Oriented Dialog with Reinforced Keywords Learning
Xiao Yu
Qingyang Wu
Kun Qian
Zhou Yu
OffRL
21
11
0
30 Nov 2022
Understanding BLOOM: An empirical study on diverse NLP tasks
Parag Dakle
Sai Krishna Rallabandi
Preethi Raghavan
AI4CE
39
3
0
27 Nov 2022