ResearchTrend.AI

RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
arXiv:2009.11462, 24 September 2020

Papers citing "RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models"

50 of 814 citing papers shown, most recent first:
Attribute Controlled Fine-tuning for Large Language Models: A Case Study on Detoxification
Tao Meng, Ninareh Mehrabi, Palash Goyal, Anil Ramakrishna, Aram Galstyan, Richard Zemel, Kai-Wei Chang, Rahul Gupta, Charith Peris
07 Oct 2024

OD-Stega: LLM-Based Near-Imperceptible Steganography via Optimized Distributions
Yu-Shin Huang, Peter Just, Krishna Narayanan, Chao Tian
06 Oct 2024

Large Language Models can be Strong Self-Detoxifiers
Ching-Yun Ko, Pin-Yu Chen, Payel Das, Youssef Mroueh, Soham Dan, Georgios Kollias, Subhajit Chaudhury, Tejaswini Pedapati, Luca Daniel
04 Oct 2024

SciSafeEval: A Comprehensive Benchmark for Safety Alignment of Large Language Models in Scientific Tasks
Tianhao Li, Jingyu Lu, Chuangxin Chu, Tianyu Zeng, Yujia Zheng, ..., Xuejing Yuan, Xingkai Wang, Keyan Ding, Huajun Chen, Qiang Zhang
02 Oct 2024

Stars, Stripes, and Silicon: Unravelling the ChatGPT's All-American, Monochrome, Cis-centric Bias
Federico Torrielli
02 Oct 2024

Decoding Hate: Exploring Language Models' Reactions to Hate Speech
Paloma Piot, Javier Parapar
01 Oct 2024
Ruler: A Model-Agnostic Method to Control Generated Length for Large Language Models
Jiaming Li, Lei Zhang, Yunshui Li, Ziqiang Liu, Yuelin Bai, Run Luo, Longze Chen, Min Yang
27 Sep 2024

BeanCounter: A low-toxicity, large-scale, and open dataset of business-oriented text
Siyan Wang, Bradford Levy
26 Sep 2024

MoJE: Mixture of Jailbreak Experts, Naive Tabular Classifiers as Guard for Prompt Attacks
Giandomenico Cornacchia, Giulio Zizzo, Kieran Fraser, Muhammad Zaid Hameed, Ambrish Rawat, Mark Purcell
26 Sep 2024

RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking
Yifan Jiang, Kriti Aggarwal, Tanmay Laud, Kashif Munir, Jay Pujara, Subhabrata Mukherjee
26 Sep 2024

Decoding Large-Language Models: A Systematic Overview of Socio-Technical Impacts, Constraints, and Emerging Questions
Zeyneb N. Kaya, Souvick Ghosh
25 Sep 2024

XTRUST: On the Multilingual Trustworthiness of Large Language Models
Yahan Li, Yi Wang, Yi-Ju Chang, Yuan Wu
24 Sep 2024
Creative Writers' Attitudes on Writing as Training Data for Large Language Models
Katy Ilonka Gero, Meera Desai, Carly Schnitzler, Nayun Eom, Jack Cushman, Elena L. Glassman
22 Sep 2024

STOP! Benchmarking Large Language Models with Sensitivity Testing on Offensive Progressions
Robert D Morabito, Sangmitra Madhusudan, Tyler McDonald, Ali Emami
20 Sep 2024

Edu-Values: Towards Evaluating the Chinese Education Values of Large Language Models
Peiyi Zhang, Yazhou Zhang, Bo Wang, Lu Rong, Jing Qin
19 Sep 2024

Multitask Mayhem: Unveiling and Mitigating Safety Gaps in LLMs Fine-tuning
Essa Jan, Nouar Aldahoul, Moiz Ali, Faizan Ahmad, Fareed Zaffar, Yasir Zaki
18 Sep 2024

LLM-Agent-UMF: LLM-based Agent Unified Modeling Framework for Seamless Integration of Multi Active/Passive Core-Agents
Amine B. Hassouna, Hana Chaari, Ines Belhaj
17 Sep 2024

Jailbreaking Large Language Models with Symbolic Mathematics
Emet Bethany, Mazal Bethany, Juan Arturo Nolazco Flores, S. Jha, Peyman Najafirad
17 Sep 2024
Alignment with Preference Optimization Is All You Need for LLM Safety
Réda Alami, Ali Khalifa Almansoori, Ahmed Alzubaidi, M. Seddik, Mugariya Farooq, Hakim Hacid
12 Sep 2024

Retro-li: Small-Scale Retrieval Augmented Generation Supporting Noisy Similarity Searches and Domain Shift Generalization
Gentiana Rashiti, G. Karunaratne, Mrinmaya Sachan, Abu Sebastian, Abbas Rahimi
12 Sep 2024

Securing Vision-Language Models with a Robust Encoder Against Jailbreak and Adversarial Attacks
Md Zarif Hossain, Ahmed Imteaj
11 Sep 2024

Identity-related Speech Suppression in Generative AI Content Moderation
Oghenefejiro Isaacs Anigboro, Charlie M. Crawford, Danaë Metaxa, Sorelle A. Friedler
09 Sep 2024

The Dark Side of Human Feedback: Poisoning Large Language Models via User Inputs
Bocheng Chen, Hanqing Guo, Guangjing Wang, Yuanda Wang, Qiben Yan
01 Sep 2024

MQM-Chat: Multidimensional Quality Metrics for Chat Translation
Yunmeng Li, Jun Suzuki, Makoto Morishita, Kaori Abe, Kentaro Inui
29 Aug 2024
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang, Philip Torr, Mohamed Elhoseiny, Adel Bibi
27 Aug 2024

IQA-EVAL: Automatic Evaluation of Human-Model Interactive Question Answering
Ruosen Li, Barry Wang, Ruochen Li, Xinya Du
24 Aug 2024

Jamba-1.5: Hybrid Transformer-Mamba Models at Scale
Jamba Team, Barak Lenz, Alan Arazi, Amir Bergman, Avshalom Manevich, ..., Yehoshua Cohen, Yonatan Belinkov, Y. Globerson, Yuval Peleg Levy, Y. Shoham
22 Aug 2024

Can Artificial Intelligence Embody Moral Values?
T. Swoboda, Lode Lauwaert
22 Aug 2024

Efficient Detection of Toxic Prompts in Large Language Models
Yi Liu, Junzhe Yu, Huijia Sun, Ling Shi, Gelei Deng, Yuqi Chen, Yang Liu
21 Aug 2024

Predicting Rewards Alongside Tokens: Non-disruptive Parameter Insertion for Efficient Inference Intervention in Large Language Model
Chenhan Yuan, Fei Huang, Ru Peng, Keming Lu, Bowen Yu, Chang Zhou, Jingren Zhou
20 Aug 2024
LeCov: Multi-level Testing Criteria for Large Language Models
Xuan Xie, Jiayang Song, Yuheng Huang, Da Song, Fuyuan Zhang, Felix Juefei-Xu, Lei Ma
20 Aug 2024

Does Liking Yellow Imply Driving a School Bus? Semantic Leakage in Language Models
Hila Gonen, Terra Blevins, Alisa Liu, Luke Zettlemoyer, Noah A. Smith
12 Aug 2024

Diffusion Guided Language Modeling
Justin Lovelace, Varsha Kishore, Yiwei Chen, Kilian Q. Weinberger
08 Aug 2024

Decoding Biases: Automated Methods and LLM Judges for Gender Bias Detection in Language Models
Shachi H. Kumar, Saurav Sahay, Sahisnu Mazumder, Eda Okur, R. Manuvinakurike, Nicole Beckage, Hsuan Su, Hung-yi Lee, L. Nachman
07 Aug 2024

Downstream bias mitigation is all you need
Arkadeep Baksi, Rahul Singh, Tarun Joshi
01 Aug 2024

Blockchain for Large Language Model Security and Safety: A Holistic Survey
Caleb Geren, Amanda Board, Gaby G. Dagher, Tim Andersen, Jun Zhuang
26 Jul 2024
Understanding the Interplay of Scale, Data, and Bias in Language Models: A Case Study with BERT
Muhammad Ali, Swetasudha Panda, Qinlan Shen, Michael Wick, Ari Kobren
25 Jul 2024

GermanPartiesQA: Benchmarking Commercial Large Language Models for Political Bias and Sycophancy
Jan Batzner, Volker Stocker, Stefan Schmid, Gjergji Kasneci
25 Jul 2024

Know Your Limits: A Survey of Abstention in Large Language Models
Bingbing Wen, Jihan Yao, Shangbin Feng, Chenjun Xu, Yulia Tsvetkov, Bill Howe, Lucy Lu Wang
25 Jul 2024

Towards Transfer Unlearning: Empirical Evidence of Cross-Domain Bias Mitigation
Huimin Lu, Masaru Isonuma, Junichiro Mori, Ichiro Sakata
24 Jul 2024

Towards Aligning Language Models with Textual Feedback
Sauc Abadal Lloret, Shehzaad Dhuliawala, K. Murugesan, Mrinmaya Sachan
24 Jul 2024

Intrinsic Self-correction for Enhanced Morality: An Analysis of Internal Mechanisms and the Superficial Hypothesis
Guang-Da Liu, Haitao Mao, Jiliang Tang, K. Johnson
21 Jul 2024
Consent in Crisis: The Rapid Decline of the AI Data Commons
Shayne Longpre, Robert Mahari, Ariel N. Lee, Campbell Lund, Hamidah Oderinwale, ..., Hanlin Li, Daphne Ippolito, Sara Hooker, Jad Kabbara, Sandy Pentland
20 Jul 2024

BiasDPO: Mitigating Bias in Language Models through Direct Preference Optimization
Ahmed Allam
18 Jul 2024

SpeciaLex: A Benchmark for In-Context Specialized Lexicon Learning
Joseph Marvin Imperial, Harish Tayyar Madabushi
18 Jul 2024

Vectoring Languages
Joseph Chen
16 Jul 2024

How Are LLMs Mitigating Stereotyping Harms? Learning from Search Engine Studies
Alina Leidinger, Richard Rogers
16 Jul 2024

The Oscars of AI Theater: A Survey on Role-Playing with Language Models
Nuo Chen, Yan Wang, Yang Deng, Jia Li
16 Jul 2024

CLAVE: An Adaptive Framework for Evaluating Values of LLM Generated Responses
Jing Yao, Xiaoyuan Yi, Xing Xie
15 Jul 2024

Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini, Giada Cosenza, A. Orsino, Domenico Talia
11 Jul 2024