RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
24 September 2020 (arXiv:2009.11462)

Papers citing "RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models"

Showing 50 of 772 citing papers.
DESTEIN: Navigating Detoxification of Language Models via Universal Steering Pairs and Head-wise Activation Fusion
Yu Li, Zhihua Wei, Han Jiang, Chuanyang Gong (16 Apr 2024)

Benchmarking Llama2, Mistral, Gemma and GPT for Factuality, Toxicity, Bias and Propensity for Hallucinations
David Nadeau, Mike Kroutikov, Karen McNeil, Simon Baribeau (15 Apr 2024)

Online Safety Analysis for LLMs: a Benchmark, an Assessment, and a Path Forward
Xuan Xie, Jiayang Song, Zhehua Zhou, Yuheng Huang, Da Song, Lei Ma (12 Apr 2024)

FairPair: A Robust Evaluation of Biases in Language Models through Paired Perturbations
Jane Dwivedi-Yu, Raaz Dwivedi, Timo Schick (09 Apr 2024)

SafetyPrompts: a Systematic Review of Open Datasets for Evaluating and Improving Large Language Model Safety
Paul Röttger, Fabio Pernisi, Bertie Vidgen, Dirk Hovy (08 Apr 2024)

ALERT: A Comprehensive Benchmark for Assessing Large Language Models' Safety through Red Teaming
Simone Tedeschi, Felix Friedrich, P. Schramowski, Kristian Kersting, Roberto Navigli, Huu Nguyen, Bo Li (06 Apr 2024)

Do Large Language Models Rank Fairly? An Empirical Study on the Fairness of LLMs as Rankers
Yuan Wang, Xuyang Wu, Hsin-Tai Wu, Zhiqiang Tao, Yi Fang (04 Apr 2024)

Digital Forgetting in Large Language Models: A Survey of Unlearning Methods
Alberto Blanco-Justicia, N. Jebreel, Benet Manzanares-Salor, David Sánchez, Josep Domingo-Ferrer, Guillem Collell, Kuan Eeik Tan (02 Apr 2024)

HyperCLOVA X Technical Report
Kang Min Yoo, Jaegeun Han, Sookyo In, Heewon Jeon, Jisu Jeong, ..., Hyunkyung Noh, Se-Eun Choi, Sang-Woo Lee, Jung Hwa Lim, Nako Sung (02 Apr 2024)

NLP Systems That Can't Tell Use from Mention Censor Counterspeech, but Teaching the Distinction Helps
Kristina Gligorić, Myra Cheng, Lucia Zheng, Esin Durmus, Dan Jurafsky (02 Apr 2024)

Source-Aware Training Enables Knowledge Attribution in Language Models
Muhammad Khalifa, David Wadden, Emma Strubell, Honglak Lee, Lu Wang, Iz Beltagy, Hao Peng (01 Apr 2024)

Fairness in Large Language Models: A Taxonomic Survey
Zhibo Chu, Zichong Wang, Wenbin Zhang (31 Mar 2024)

"I'm categorizing LLM as a productivity tool": Examining ethics of LLM use in HCI research practices
Shivani Kapania, Ruiyi Wang, Toby Jia-Jun Li, Tianshi Li, Hong Shen (28 Mar 2024)

A Review of Multi-Modal Large Language and Vision Models
Kilian Carolan, Laura Fennelly, Alan F. Smeaton (28 Mar 2024)

Don't Listen To Me: Understanding and Exploring Jailbreak Prompts of Large Language Models
Zhiyuan Yu, Xiaogeng Liu, Shunning Liang, Zach Cameron, Chaowei Xiao, Ning Zhang (26 Mar 2024)

Risk and Response in Large Language Models: Evaluating Key Threat Categories
Bahareh Harandizadeh, A. Salinas, Fred Morstatter (22 Mar 2024)

Ink and Individuality: Crafting a Personalised Narrative in the Age of LLMs
Azmine Toushik Wasi, Raima Islam, Rafia Islam (20 Mar 2024)

From Representational Harms to Quality-of-Service Harms: A Case Study on Llama 2 Safety Safeguards
Khaoula Chehbouni, Megha Roshan, Emmanuel Ma, Futian Andrew Wei, Afaf Taik, Jackie CK Cheung, G. Farnadi (20 Mar 2024)

Securing Large Language Models: Threats, Vulnerabilities and Responsible Practices
Sara Abdali, Richard Anarfi, C. Barberan, Jia He (19 Mar 2024)

Reinforcement Learning with Token-level Feedback for Controllable Text Generation
Wendi Li, Wei Wei, Kaihe Xu, Wenfeng Xie, Dangyang Chen, Yu Cheng (18 Mar 2024)

Word Order's Impacts: Insights from Reordering and Generation Analysis
Qinghua Zhao, Jiaang Li, Lei Li, Zenghui Zhou, Junfeng Liu (18 Mar 2024)

Detecting Bias in Large Language Models: Fine-tuned KcBERT
J. K. Lee, T. M. Chung (16 Mar 2024)

Review of Generative AI Methods in Cybersecurity
Yagmur Yigit, William J. Buchanan, Madjid G Tehrani, Leandros A. Maglaras (13 Mar 2024)

Bifurcated Attention: Accelerating Massively Parallel Decoding with Shared Prefixes in LLMs
Ben Athiwaratkun, Sujan Kumar Gonugondla, Sanjay Krishna Gouda, Haifeng Qian, Hantian Ding, ..., Liangfu Chen, Parminder Bhatia, Ramesh Nallapati, Sudipta Sengupta, Bing Xiang (13 Mar 2024)

ORPO: Monolithic Preference Optimization without Reference Model
Jiwoo Hong, Noah Lee, James Thorne (12 Mar 2024)

Detectors for Safe and Reliable LLMs: Implementations, Uses, and Limitations
Swapnaja Achintalwar, Adriana Alvarado Garcia, Ateret Anaby-Tavor, Ioana Baldini, Sara E. Berger, ..., Aashka Trivedi, Kush R. Varshney, Dennis L. Wei, Shalisha Witherspoon, Marcel Zalmanovici (09 Mar 2024)

Decoding the AI Pen: Techniques and Challenges in Detecting AI-Generated Text
Sara Abdali, Richard Anarfi, C. Barberan, Jia He (09 Mar 2024)

Aligners: Decoupling LLMs and Alignment
Lilian Ngweta, Mayank Agarwal, Subha Maity, Alex Gittens, Yuekai Sun, Mikhail Yurochkin (07 Mar 2024)

On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang, Shitong Duan, Xiaoyuan Yi, Jing Yao, Shanlin Zhou, Zhihua Wei, Peng Zhang, Dongkuan Xu, Maosong Sun, Xing Xie (07 Mar 2024)

From One to Many: Expanding the Scope of Toxicity Mitigation in Language Models
Luiza Amador Pozzobon, Patrick Lewis, Sara Hooker, Beyza Ermis (06 Mar 2024)

The WMDP Benchmark: Measuring and Reducing Malicious Use With Unlearning
Nathaniel Li, Alexander Pan, Anjali Gopal, Summer Yue, Daniel Berrios, ..., Yan Shoshitaishvili, Jimmy Ba, K. Esvelt, Alexandr Wang, Dan Hendrycks (05 Mar 2024)

Balancing Enhancement, Harmlessness, and General Capabilities: Enhancing Conversational LLMs with Direct RLHF
Chen Zheng, Ke Sun, Hang Wu, Chenguang Xi, Xun Zhou (04 Mar 2024)

LocalRQA: From Generating Data to Locally Training, Testing, and Deploying Retrieval-Augmented QA Systems
Xiao Yu, Yunan Lu, Zhou Yu (01 Mar 2024)

Authors' Values and Attitudes Towards AI-bridged Scalable Personalization of Creative Language Arts
Taewook Kim, Hyomin Han, Eytan Adar, Matthew Kay, John Joon Young Chung (01 Mar 2024)

Exploring Multilingual Concepts of Human Value in Large Language Models: Is Value Alignment Consistent, Transferable and Controllable across Languages?
Shaoyang Xu, Weilong Dong, Zishan Guo, Xinwei Wu, Deyi Xiong (28 Feb 2024)

Making Them Ask and Answer: Jailbreaking Large Language Models in Few Queries via Disguise and Reconstruction
Tong Liu, Yingjie Zhang, Zhe Zhao, Yinpeng Dong, Guozhu Meng, Kai Chen (28 Feb 2024)

TroubleLLM: Align to Red Team Expert
Zhuoer Xu, Jianping Zhang, Shiwen Cui, Changhua Meng, Weiqiang Wang (28 Feb 2024)

Exploring Advanced Methodologies in Security Evaluation for LLMs
Junming Huang, Jiawei Zhang, Qi Wang, Weihong Han, Yanchun Zhang (28 Feb 2024)

FairBelief -- Assessing Harmful Beliefs in Language Models
Mattia Setzu, Marta Marchiori Manerba, Pasquale Minervini, Debora Nozza (27 Feb 2024)

Deconstructing the Veneer of Simplicity: Co-Designing Introductory Generative AI Workshops with Local Entrepreneurs
Yasmine Kotturi, Angel Anderson, Glenn Ford, Michael Skirpan, Jeffrey P. Bigham (26 Feb 2024)

A Comprehensive Evaluation of Quantization Strategies for Large Language Models
Renren Jin, Jiangcun Du, Wuwei Huang, Wei Liu, Jian Luan, Bin Wang, Deyi Xiong (26 Feb 2024)

Immunization against harmful fine-tuning attacks
Domenic Rosati, Jan Wehner, Kai Williams, Lukasz Bartoszcze, Jan Batzner, Hassan Sajjad, Frank Rudzicz (26 Feb 2024)

Farsight: Fostering Responsible AI Awareness During AI Application Prototyping
Zijie J. Wang, Chinmay Kulkarni, Lauren Wilcox, Michael Terry, Michael A. Madaio (23 Feb 2024)

Fine-Grained Detoxification via Instance-Level Prefixes for Large Language Models
Xin Yi, Linlin Wang, Xiaoling Wang, Liang He (23 Feb 2024)

CEV-LM: Controlled Edit Vector Language Model for Shaping Natural Language Generations
Samraj Moorjani, A. Krishnan, Hari Sundaram (22 Feb 2024)

Eagle: Ethical Dataset Given from Real Interactions
Masahiro Kaneko, Danushka Bollegala, Timothy Baldwin (22 Feb 2024)

Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content
Federico Bianchi, James Zou (21 Feb 2024)

Emulated Disalignment: Safety Alignment for Large Language Models May Backfire!
Zhanhui Zhou, Jie Liu, Zhichen Dong, Jiaheng Liu, Chao Yang, Wanli Ouyang, Yu Qiao (19 Feb 2024)

Polarization of Autonomous Generative AI Agents Under Echo Chambers
Masaya Ohagi (19 Feb 2024)

A Chinese Dataset for Evaluating the Safeguards in Large Language Models
Yuxia Wang, Zenan Zhai, Haonan Li, Xudong Han, Lizhi Lin, Zhenxuan Zhang, Jingru Zhao, Preslav Nakov, Timothy Baldwin (19 Feb 2024)