RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
Samuel Gehman, Suchin Gururangan, Maarten Sap, Yejin Choi, Noah A. Smith
24 September 2020 (arXiv 2009.11462)

Papers citing "RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models" (50 of 772 papers shown)

FFT: Towards Harmlessness Evaluation and Analysis for LLMs with Factuality, Fairness, Toxicity
Shiyao Cui, Zhenyu Zhang, Yilong Chen, Wenyuan Zhang, Tianyun Liu, Siqi Wang, Tingwen Liu
30 Nov 2023

Fair Text-to-Image Diffusion via Fair Mapping
Jia Li, Lijie Hu, Jingfeng Zhang, Tianhang Zheng, Hua Zhang, Di Wang
29 Nov 2023

Unveiling the Implicit Toxicity in Large Language Models
Jiaxin Wen, Pei Ke, Hao Sun, Zhexin Zhang, Chengfei Li, Jinfeng Bai, Minlie Huang
29 Nov 2023

SoUnD Framework: Analyzing (So)cial Representation in (Un)structured (D)ata
Mark Díaz, Sunipa Dev, Emily Reif, Remi Denton, Vinodkumar Prabhakaran
28 Nov 2023

DUnE: Dataset for Unified Editing
Afra Feyza Akyürek, Eric Pan, Garry Kuwanto, Derry Wijaya
KELM
27 Nov 2023

Challenges of Large Language Models for Mental Health Counseling
N. C. Chung, George C. Dyer, L. Brocki
LM&MA, AI4MH
23 Nov 2023

Compositional Capabilities of Autoregressive Transformers: A Study on Synthetic, Interpretable Tasks
Rahul Ramesh, Ekdeep Singh Lubana, Mikail Khona, Robert P. Dick, Hidenori Tanaka
CoGe
21 Nov 2023

Causal ATE Mitigates Unintended Bias in Controlled Text Generation
Rahul Madhavan, Kahini Wadhawan
19 Nov 2023

Unmasking and Improving Data Credibility: A Study with Datasets for Training Harmless Language Models
Zhaowei Zhu, Jialu Wang, Hao Cheng, Yang Liu
19 Nov 2023

Cognitive Overload: Jailbreaking Large Language Models with Overloaded Logical Thinking
Nan Xu, Fei Wang, Ben Zhou, Bangzheng Li, Chaowei Xiao, Muhao Chen
16 Nov 2023

JAB: Joint Adversarial Prompting and Belief Augmentation
Ninareh Mehrabi, Palash Goyal, Anil Ramakrishna, Jwala Dhamala, Shalini Ghosh, Richard Zemel, Kai-Wei Chang, Aram Galstyan, Rahul Gupta
AAML
16 Nov 2023

How Trustworthy are Open-Source LLMs? An Assessment under Malicious Demonstrations Shows their Vulnerabilities
Lingbo Mo, Boshi Wang, Muhao Chen, Huan Sun
15 Nov 2023

Towards Publicly Accountable Frontier LLMs: Building an External Scrutiny Ecosystem under the ASPIRE Framework
Markus Anderljung, Everett Thornton Smith, Joe O'Brien, Lisa Soder, Ben Bucknall, Emma Bluemke, Jonas Schuett, Robert F. Trager, Lacey Strahm, Rumman Chowdhury
15 Nov 2023

AART: AI-Assisted Red-Teaming with Diverse Data Generation for New LLM-powered Applications
Bhaktipriya Radharapu, Kevin Robinson, Lora Aroyo, Preethi Lahoti
14 Nov 2023

SimpleSafetyTests: a Test Suite for Identifying Critical Safety Risks in Large Language Models
Bertie Vidgen, Nino Scherrer, Hannah Rose Kirk, Rebecca Qian, Anand Kannappan, Scott A. Hale, Paul Röttger
ALM, ELM
14 Nov 2023

Fair Abstractive Summarization of Diverse Perspectives
Yusen Zhang, Nan Zhang, Yixin Liu, Alexander R. Fabbri, Junru Liu, ..., Caiming Xiong, Jieyu Zhao, Dragomir R. Radev, Kathleen McKeown, Rui Zhang
14 Nov 2023

Understanding Users' Dissatisfaction with ChatGPT Responses: Types, Resolving Tactics, and the Effect of Knowledge Level
Yoonsu Kim, Jueon Lee, Seoyoung Kim, Jaehyuk Park, Juho Kim
13 Nov 2023

Controlled Text Generation for Black-box Language Models via Score-based Progressive Editor
Sangwon Yu, Changmin Lee, Hojin Lee, Sungroh Yoon
13 Nov 2023

Flames: Benchmarking Value Alignment of LLMs in Chinese
Kexin Huang, Xiangyang Liu, Qianyu Guo, Tianxiang Sun, Jiawei Sun, ..., Yixu Wang, Yan Teng, Xipeng Qiu, Yingchun Wang, Dahua Lin
ALM
12 Nov 2023

In-context Vectors: Making In Context Learning More Effective and Controllable Through Latent Space Steering
Sheng Liu, Haotian Ye, Lei Xing, James Y. Zou
11 Nov 2023

Leveraging LLMs for Synthesizing Training Data Across Many Languages in Multilingual Dense Retrieval
Nandan Thakur, Jianmo Ni, Gustavo Hernández Ábrego, John Wieting, Jimmy J. Lin, Daniel Cer
RALM
10 Nov 2023

Identifying and Mitigating Vulnerabilities in LLM-Integrated Applications
Fengqing Jiang, Zhangchen Xu, Luyao Niu, Wei Ping, Jinyuan Jia, Bo Li, Radha Poovendran
AAML
07 Nov 2023

Unveiling Safety Vulnerabilities of Large Language Models
George Kour, Marcel Zalmanovici, Naama Zwerdling, Esther Goldbraich, Ora Nova Fandina, Ateret Anaby-Tavor, Orna Raz, E. Farchi
AAML
07 Nov 2023

FinGPT: Large Generative Models for a Small Language
Risto Luukkonen, Ville Komulainen, Jouni Luoma, Anni Eskelinen, Jenna Kanerva, ..., Mikko Merioksa, Jyrki Heinonen, Aija Vahtola, Samuel Antao, S. Pyysalo
LM&MA
03 Nov 2023

Successor Features for Efficient Multisubject Controlled Text Generation
Mengyao Cao, Mehdi Fatemi, Jackie Chi Kit Cheung, Samira Shabanian
BDL
03 Nov 2023

Style Locality for Controllable Generation with kNN Language Models
Gilles Nawezi, Lucie Flek, Charles F Welch
RALM
01 Nov 2023

Transformation vs Tradition: Artificial General Intelligence (AGI) for Arts and Humanities
Zheng Liu, Yiwei Li, Qian Cao, Junwen Chen, Tianze Yang, ..., John Gibbs, Khaled Rasheed, Ninghao Liu, Gengchen Mai, Tianming Liu
AI4CE
30 Oct 2023

Reward Finetuning for Faster and More Accurate Unsupervised Object Discovery
Katie Z Luo, Zhenzhen Liu, Xiangyu Chen, Yurong You, Sagie Benaim, Cheng Perng Phoo, Mark E. Campbell, Wen Sun, B. Hariharan, Kilian Q. Weinberger
OffRL
29 Oct 2023

N-Critics: Self-Refinement of Large Language Models with Ensemble of Critics
Sajad Mousavi, Ricardo Luna Gutierrez, Desik Rengarajan, Vineet Gundecha, Ashwin Ramesh Babu, Avisek Naug, Antonio Guillen-Perez, Soumyendu Sarkar
LRM, HILM, KELM
28 Oct 2023

NLP Evaluation in trouble: On the Need to Measure LLM Data Contamination for each Benchmark
Oscar Sainz, Jon Ander Campos, Iker García-Ferrero, Julen Etxaniz, Oier López de Lacalle, Eneko Agirre
27 Oct 2023

Unpacking the Ethical Value Alignment in Big Models
Xiaoyuan Yi, Jing Yao, Xiting Wang, Xing Xie
26 Oct 2023

ToxicChat: Unveiling Hidden Challenges of Toxicity Detection in Real-World User-AI Conversation
Zi Lin, Zihan Wang, Yongqi Tong, Yangkun Wang, Yuxin Guo, Yujia Wang, Jingbo Shang
AI4MH
26 Oct 2023

Improving Diversity of Demographic Representation in Large Language Models via Collective-Critiques and Self-Voting
Preethi Lahoti, Nicholas Blumm, Xiao Ma, Raghavendra Kotikalapudi, Sahitya Potluri, ..., Hansa Srinivasan, Ben Packer, Ahmad Beirami, Alex Beutel, Jilin Chen
25 Oct 2023

Ignore This Title and HackAPrompt: Exposing Systemic Vulnerabilities of LLMs through a Global Scale Prompt Hacking Competition
Sander Schulhoff, Jeremy Pinto, Anaum Khan, Louis-François Bouchard, Chenglei Si, Svetlina Anati, Valen Tagliabue, Anson Liu Kost, Christopher Carnahan, Jordan L. Boyd-Graber
SILM
24 Oct 2023

Do Stochastic Parrots have Feelings Too? Improving Neural Detection of Synthetic Text via Emotion Recognition
Alan Cowap, Yvette Graham, Jennifer Foster
DeLMO
24 Oct 2023

Self-Guard: Empower the LLM to Safeguard Itself
Zezhong Wang, Fangkai Yang, Lu Wang, Pu Zhao, Hongru Wang, Liang Chen, Qingwei Lin, Kam-Fai Wong
24 Oct 2023

Air-Decoding: Attribute Distribution Reconstruction for Decoding-Time Controllable Text Generation
Tianqi Zhong, Quan Wang, Jingxuan Han, Yongdong Zhang, Zhendong Mao
23 Oct 2023

MoPe: Model Perturbation-based Privacy Attacks on Language Models
Marvin Li, Jason Wang, Jeffrey G. Wang, Seth Neel
AAML
22 Oct 2023

LUNA: A Model-Based Universal Analysis Framework for Large Language Models
Da Song, Xuan Xie, Jiayang Song, Derui Zhu, Yuheng Huang, Felix Juefei Xu, Lei Ma
ALM
22 Oct 2023

Values, Ethics, Morals? On the Use of Moral Concepts in NLP Research
Karina Vida, Judith Simon, Anne Lauscher
21 Oct 2023

Teaching Language Models to Self-Improve through Interactive Demonstrations
Xiao Yu, Baolin Peng, Michel Galley, Jianfeng Gao, Zhou Yu
LRM, ReLM
20 Oct 2023

Steering Large Language Models for Machine Translation with Finetuning and In-Context Learning
Duarte M. Alves, Nuno M. Guerreiro, João Alves, José P. Pombal, Ricardo Rei, José G. C. de Souza, Pierre Colombo, André F. T. Martins
20 Oct 2023

An LLM can Fool Itself: A Prompt-Based Adversarial Attack
Xilie Xu, Keyi Kong, Ning Liu, Li-zhen Cui, Di Wang, Jingfeng Zhang, Mohan Kankanhalli
AAML, SILM
20 Oct 2023

Model Merging by Uncertainty-Based Gradient Matching
Nico Daheim, Thomas Möllenhoff, Edoardo Ponti, Iryna Gurevych, Mohammad Emtiyaz Khan
MoMe, FedML
19 Oct 2023

Safe RLHF: Safe Reinforcement Learning from Human Feedback
Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang
19 Oct 2023

Privacy Preserving Large Language Models: ChatGPT Case Study Based Vision and Framework
Imdad Ullah, Najm Hassan, S. Gill, Basem Suleiman, T. Ahanger, Zawar Shah, Junaid Qadir, S. Kanhere
19 Oct 2023

Attack Prompt Generation for Red Teaming and Defending Large Language Models
Boyi Deng, Wenjie Wang, Fuli Feng, Yang Deng, Qifan Wang, Xiangnan He
AAML
19 Oct 2023

Denevil: Towards Deciphering and Navigating the Ethical Values of Large Language Models via Instruction Learning
Shitong Duan, Xiaoyuan Yi, Peng Zhang, Tun Lu, Xing Xie, Ning Gu
17 Oct 2023

Survey of Vulnerabilities in Large Language Models Revealed by Adversarial Attacks
Erfan Shayegani, Md Abdullah Al Mamun, Yu Fu, Pedram Zaree, Yue Dong, Nael B. Abu-Ghazaleh
AAML
16 Oct 2023

Privacy in Large Language Models: Attacks, Defenses and Future Directions
Haoran Li, Yulin Chen, Jinglong Luo, Yan Kang, Xiaojin Zhang, Qi Hu, Chunkit Chan, Yangqiu Song
PILM
16 Oct 2023