RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models

24 September 2020
Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
ArXiv (abs) · PDF · HTML

Papers citing "RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models"

Showing 50 of 814 citing papers.

ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming
Simone Tedeschi, Felix Friedrich, P. Schramowski, Kristian Kersting, Roberto Navigli, Huu Nguyen, Bo Li
ELM
06 Apr 2024

Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers
Yuan Wang, Xuyang Wu, Hsin-Tai Wu, Zhiqiang Tao, Yi Fang
ALM
04 Apr 2024

Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
Alberto Blanco-Justicia, N. Jebreel, Benet Manzanares-Salor, David Sánchez, Josep Domingo-Ferrer, Guillem Collell, Kuan Eeik Tan
KELM, MU
02 Apr 2024

HyperCLOVA X Technical Report
Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, ..., Hyunkyung Noh, Se-Eun Choi, Sang-Woo Lee, Jung Hwa Lim, Nako Sung
VLM
02 Apr 2024

NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps
Kristina Gligorić, Myra Cheng, Lucia Zheng, Esin Durmus, Dan Jurafsky
02 Apr 2024

Source-Aware Training Enables Knowledge Attribution in Language Models
Muhammad Khalifa, David Wadden, Emma Strubell, Honglak Lee, Lu Wang, Iz Beltagy, Hao Peng
HILM
01 Apr 2024

Fairness in Large Language Models: A Taxonomic Survey
Zhibo Chu, Zichong Wang, Wenbin Zhang
AILaw
31 Mar 2024
"I'm categorizing LLM as a productivity tool": Examining ethics of LLM
  use in HCI research practices
"I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices
Shivani Kapania
Ruiyi Wang
Toby Jia-Jun Li
Tianshi Li
Hong Shen
91
11
0
28 Mar 2024

A Review of Multi-Modal Large Language and Vision Models
Kilian Carolan, Laura Fennelly, Alan F. Smeaton
VLM
28 Mar 2024

Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models
Zhiyuan Yu, Xiaogeng Liu, Shunning Liang, Zach Cameron, Chaowei Xiao, Ning Zhang
26 Mar 2024

Risk and Response in Large Language Models: Evaluating Key Threat Categories
Bahareh Harandizadeh, A. Salinas, Fred Morstatter
22 Mar 2024

Ink and Individuality: Crafting a Personalised Narrative in the Age of LLMs
Azmine Toushik Wasi, Raima Islam, Rafia Islam
20 Mar 2024

From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards
Khaoula Chehbouni, Megha Roshan, Emmanuel Ma, Futian Andrew Wei, Afaf Taik, Jackie CK Cheung, G. Farnadi
20 Mar 2024

Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices
Sara Abdali, Richard Anarfi, C. Barberan, Jia He, Erfan Shayegani
PILM
19 Mar 2024

Reinforcement Learning with Token-level Feedback for Controllable Text Generation
Wendi Li, Xiaoye Qu, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng
18 Mar 2024

Word Order's Impacts: Insights from Reordering and Generation Analysis
Qinghua Zhao, Jiaang Li, Lei Li, Zenghui Zhou, Junfeng Liu
18 Mar 2024

Detecting Bias in Large Language Models: Fine-tuned KcBERT
J. K. Lee, T. M. Chung
16 Mar 2024

Review of Generative AI Methods in Cybersecurity
Yagmur Yigit, William J. Buchanan, Madjid G Tehrani, Leandros A. Maglaras
AAML
13 Mar 2024

Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs
Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Haifeng Qian, Hantian Ding, ..., Liangfu Chen, Parminder Bhatia, Ramesh Nallapati, Sudipta Sengupta, Bing Xiang
13 Mar 2024

ORPO: Monolithic Preference Optimization without Reference Model
Jiwoo Hong, Noah Lee, James Thorne
OSLM
12 Mar 2024

Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations
Swapnaja Achintalwar, Adriana Alvarado Garcia, Ateret Anaby-Tavor, Ioana Baldini, Sara E. Berger, ..., Aashka Trivedi, Kush R. Varshney, Dennis L. Wei, Shalisha Witherspoon, Marcel Zalmanovici
09 Mar 2024

Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text
Sara Abdali, Richard Anarfi, C. Barberan, Jia He
DeLMO
09 Mar 2024

Aligners: Decoupling LLMs and Alignment
Lilian Ngweta, Mayank Agarwal, Subha Maity, Alex Gittens, Yuekai Sun, Mikhail Yurochkin
07 Mar 2024

On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, Jing Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie
OffRL
07 Mar 2024

From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models
Luiza Amador Pozzobon, Patrick Lewis, Sara Hooker, Beyza Ermis
06 Mar 2024

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, ..., Yan Shoshitaishvili, Jimmy Ba, K. Esvelt, Alexandr Wang, Dan Hendrycks
ELM
05 Mar 2024

Balancing Enhancement, Harmlessness, and General Capabilities: Enhancing Conversational LLMs with Direct RLHF
Chen Zheng, Ke Sun, Hang Wu, Chenguang Xi, Xun Zhou
04 Mar 2024

LocalRQA: From Generating Data to Locally Training, Testing, and Deploying Retrieval-Augmented QA Systems
Xiao Yu, Yunan Lu, Zhou Yu
RALM
01 Mar 2024

Authors' Values and Attitudes Towards AI-bridged Scalable Personalization of Creative Language Arts
Taewook Kim, Hyomin Han, Eytan Adar, Matthew Kay, John Joon Young Chung
AI4CE
01 Mar 2024

Exploring Multilingual Concepts of Human Value in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?
Shaoyang Xu, Weilong Dong, Zishan Guo, Xinwei Wu, Deyi Xiong
28 Feb 2024

Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction
Tong Liu, Yingjie Zhang, Zhe Zhao, Yinpeng Dong, Guozhu Meng, Kai Chen
AAML
28 Feb 2024

TroubleLLM: Align to Red Team Expert
Zhuoer Xu, Jianping Zhang, Shiwen Cui, Changhua Meng, Weiqiang Wang
28 Feb 2024

Exploring Advanced Methodologies in Security Evaluation for LLMs
Junming Huang, Jiawei Zhang, Qi Wang, Weihong Han, Yanchun Zhang
28 Feb 2024

FairBelief -- Assessing Harmful Beliefs in Language Models
Mattia Setzu, Marta Marchiori Manerba, Pasquale Minervini, Debora Nozza
27 Feb 2024

Deconstructing the Veneer of Simplicity: Co-Designing Introductory Generative AI Workshops with Local Entrepreneurs
Yasmine Kotturi, Angel Anderson, Glenn Ford, Michael Skirpan, Jeffrey P. Bigham
26 Feb 2024

A Comprehensive Evaluation of Quantization Strategies for Large Language Models
Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong
MQ
26 Feb 2024

Immunization against harmful fine-tuning attacks
Domenic Rosati, Jan Wehner, Kai Williams, Lukasz Bartoszcze, Jan Batzner, Hassan Sajjad, Frank Rudzicz
AAML
26 Feb 2024

Farsight: Fostering Responsible AI Awareness During AI Application Prototyping
Zijie J. Wang, Chinmay Kulkarni, Lauren Wilcox, Michael Terry, Michael A. Madaio
23 Feb 2024

Fine-Grained Detoxification via Instance-Level Prefixes for Large Language Models
Xin Yi, Linlin Wang, Xiaoling Wang, Liang He
MoMe
23 Feb 2024

CEV-LM: Controlled Edit Vector Language Model for Shaping Natural Language Generations
Samraj Moorjani, A. Krishnan, Hari Sundaram
KELM
22 Feb 2024

Eagle: Ethical Dataset Given from Real Interactions
Masahiro Kaneko, Danushka Bollegala, Timothy Baldwin
22 Feb 2024

Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content
Federico Bianchi, James Zou
21 Feb 2024

Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao
19 Feb 2024

Polarization of Autonomous Generative AI Agents Under Echo Chambers
Masaya Ohagi
LLMAG
19 Feb 2024

A Chinese Dataset for Evaluating the Safeguards in Large Language Models
Yuxia Wang, Zenan Zhai, Haonan Li, Xudong Han, Lizhi Lin, Zhenxuan Zhang, Jingru Zhao, Preslav Nakov, Timothy Baldwin
19 Feb 2024

ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Zhen Xiang, Bhaskar Ramasubramanian, Bo Li, Radha Poovendran
19 Feb 2024

Multi-Task Inference: Can Large Language Models Follow Multiple Instructions at Once?
Seunghyeok Hong, Sangwon Baek, Sangdae Nam, Guijin Son, Seungone Kim
ELM, LRM
18 Feb 2024

Controlled Text Generation for Large Language Model with Dynamic Attribute Graphs
Xun Liang, Hanyu Wang, Shichao Song, Mengting Hu, Xunzhi Wang, Zhiyu Li, Feiyu Xiong, Simin Niu
17 Feb 2024

Disclosure and Mitigation of Gender Bias in LLMs
Xiangjue Dong, Yibo Wang, Philip S. Yu, James Caverlee
17 Feb 2024

Direct Preference Optimization with an Offset
Afra Amini, Tim Vieira, Ryan Cotterell
16 Feb 2024