2009.11462
Cited By
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
24 September 2020
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
Papers citing
"RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models"
50 / 772 papers shown
Title
On the Trustworthiness Landscape of State-of-the-art Generative Models: A Survey and Outlook
Mingyuan Fan
Chengyu Wang
Cen Chen
Yang Liu
Jun Huang
HILM
41
3
0
31 Jul 2023
Trie-NLG: Trie Context Augmentation to Improve Personalized Query Auto-Completion for Short and Unseen Prefixes
Kaushal Kumar Maurya
M. Desarkar
Manish Gupta
Puneet Agrawal
21
2
0
28 Jul 2023
Open Problems and Fundamental Limitations of Reinforcement Learning from Human Feedback
Stephen Casper
Xander Davies
Claudia Shi
T. Gilbert
Jérémy Scheurer
...
Erdem Biyik
Anca Dragan
David M. Krueger
Dorsa Sadigh
Dylan Hadfield-Menell
ALM
OffRL
52
481
0
27 Jul 2023
Trustworthiness of Children Stories Generated by Large Language Models
Prabin Bhandari
H. M. Brennan
38
2
0
25 Jul 2023
Towards Automatic Boundary Detection for Human-AI Collaborative Hybrid Essay in Education
Zijie Zeng
Lele Sha
Yuheng Li
Kaixun Yang
D. Gašević
Guanliang Chen
DeLMO
35
13
0
23 Jul 2023
Of Models and Tin Men: A Behavioural Economics Study of Principal-Agent Problems in AI Alignment using Large-Language Models
S. Phelps
Rebecca E. Ranson
LLMAG
34
1
0
20 Jul 2023
FLASK: Fine-grained Language Model Evaluation based on Alignment Skill Sets
Seonghyeon Ye
Doyoung Kim
Sungdong Kim
Hyeonbin Hwang
Seungone Kim
Yongrae Jo
James Thorne
Juho Kim
Minjoon Seo
ALM
48
101
0
20 Jul 2023
Gender-tuning: Empowering Fine-tuning for Debiasing Pre-trained Language Models
Somayeh Ghanbarzadeh
Yan-ping Huang
Hamid Palangi
R. C. Moreno
Hamed Khanpour
47
12
0
20 Jul 2023
How is ChatGPT's behavior changing over time?
Lingjiao Chen
Matei A. Zaharia
James Zou
ELM
KELM
AI4MH
49
415
0
18 Jul 2023
MasterKey: Automated Jailbreak Across Multiple Large Language Model Chatbots
Gelei Deng
Yi Liu
Yuekang Li
Kailong Wang
Ying Zhang
Zefeng Li
Haoyu Wang
Tianwei Zhang
Yang Liu
SILM
42
118
0
16 Jul 2023
Understanding Multi-Turn Toxic Behaviors in Open-Domain Chatbots
Bocheng Chen
Guangjing Wang
Hanqing Guo
Yuanda Wang
Qiben Yan
46
15
0
14 Jul 2023
Effective Prompt Extraction from Language Models
Yiming Zhang
Nicholas Carlini
Daphne Ippolito
MIACV
SILM
38
37
0
13 Jul 2023
A Comprehensive Overview of Large Language Models
Humza Naveed
Asad Ullah Khan
Shi Qiu
Muhammad Saqib
Saeed Anwar
Muhammad Usman
Naveed Akhtar
Nick Barnes
Ajmal Mian
OffRL
72
544
0
12 Jul 2023
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset
Jiaming Ji
Mickel Liu
Juntao Dai
Xuehai Pan
Chi Zhang
Ce Bian
Chi Zhang
Ruiyang Sun
Yizhou Wang
Yaodong Yang
ALM
32
413
0
10 Jul 2023
Frontier AI Regulation: Managing Emerging Risks to Public Safety
Markus Anderljung
Joslyn Barnhart
Anton Korinek
Jade Leung
Cullen O'Keefe
...
Jonas Schuett
Yonadav Shavit
Divya Siddarth
Robert F. Trager
Kevin J. Wolf
SILM
51
118
0
06 Jul 2023
PREADD: Prefix-Adaptive Decoding for Controlled Text Generation
Jonathan Pei
Kevin Kaichuang Yang
Dan Klein
49
21
0
06 Jul 2023
A Survey on Evaluation of Large Language Models
Yu-Chu Chang
Xu Wang
Jindong Wang
Yuanyi Wu
Linyi Yang
...
Yue Zhang
Yi-Ju Chang
Philip S. Yu
Qian Yang
Xingxu Xie
ELM
LM&MA
ALM
85
1,538
0
06 Jul 2023
Jailbroken: How Does LLM Safety Training Fail?
Alexander Wei
Nika Haghtalab
Jacob Steinhardt
127
856
0
05 Jul 2023
Understanding Counterspeech for Online Harm Mitigation
Yi-Ling Chung
Gavin Abercrombie
Florence E. Enock
Jonathan Bright
Verena Rieser
25
16
0
01 Jul 2023
Queer People are People First: Deconstructing Sexual Identity Stereotypes in Large Language Models
Harnoor Dhingra
Preetiha Jayashanker
Sayali S. Moghe
Emma Strubell
40
13
0
30 Jun 2023
Transformers in Healthcare: A Survey
Subhash Nerella
S. Bandyopadhyay
Jiaqing Zhang
Miguel Contreras
Scott Siegel
...
Jessica Sena
B. Shickel
A. Bihorac
Kia Khezeli
Parisa Rashidi
MedIm
AI4CE
30
25
0
30 Jun 2023
Stay on topic with Classifier-Free Guidance
Guillaume Sanchez
Honglu Fan
Alexander Spangher
Elad Levi
Pawan Sasanka Ammanamanchi
Stella Biderman
3DV
38
48
0
30 Jun 2023
ZeroGen: Zero-shot Multimodal Controllable Text Generation with Multiple Oracles
Haoqin Tu
Bowen Yang
Xianfeng Zhao
32
6
0
29 Jun 2023
Towards Measuring the Representation of Subjective Global Opinions in Language Models
Esin Durmus
Karina Nguyen
Thomas I. Liao
Nicholas Schiefer
Amanda Askell
...
Alex Tamkin
Janel Thamkul
Jared Kaplan
Jack Clark
Deep Ganguli
46
213
0
28 Jun 2023
CBBQ: A Chinese Bias Benchmark Dataset Curated with Human-AI Collaboration for Large Language Models
Yufei Huang
Deyi Xiong
ALM
42
17
0
28 Jun 2023
Bring Your Own Data! Self-Supervised Evaluation for Large Language Models
Neel Jain
Khalid Saifullah
Yuxin Wen
John Kirchenbauer
Manli Shu
Aniruddha Saha
Micah Goldblum
Jonas Geiping
Tom Goldstein
ALM
ELM
38
23
0
23 Jun 2023
Visual Adversarial Examples Jailbreak Aligned Large Language Models
Xiangyu Qi
Kaixuan Huang
Ashwinee Panda
Peter Henderson
Mengdi Wang
Prateek Mittal
AAML
30
138
0
22 Jun 2023
Mass-Producing Failures of Multimodal Systems with Language Models
Shengbang Tong
Erik Jones
Jacob Steinhardt
49
34
0
21 Jun 2023
TrustGPT: A Benchmark for Trustworthy and Responsible Large Language Models
Yue Huang
Qihui Zhang
Philip S. Yu
Lichao Sun
26
46
0
20 Jun 2023
Give Us the Facts: Enhancing Large Language Models with Knowledge Graphs for Fact-aware Language Modeling
Lin F. Yang
Hongyang Chen
Zhao Li
Xiao Ding
Xindong Wu
KELM
40
87
0
20 Jun 2023
Towards Theory-based Moral AI: Moral AI with Aggregating Models Based on Normative Ethical Theory
Masashi Takeshita
Rafal Rzepka
K. Araki
37
8
0
20 Jun 2023
KEST: Kernel Distance Based Efficient Self-Training for Improving Controllable Text Generation
Yuxi Feng
Xiaoyuan Yi
L. Lakshmanan
Xing Xie
43
1
0
17 Jun 2023
Conformal Language Modeling
Victor Quach
Adam Fisch
Tal Schuster
Adam Yala
J. Sohn
Tommi Jaakkola
Regina Barzilay
85
56
0
16 Jun 2023
CHORUS: Foundation Models for Unified Data Discovery and Exploration
Moe Kayali
A. Lykov
Ilias Fountalis
N. Vasiloglou
Dan Olteanu
Dan Suciu
33
21
0
16 Jun 2023
Evaluating the Social Impact of Generative AI Systems in Systems and Society
Irene Solaiman
Zeerak Talat
William Agnew
Lama Ahmad
Dylan K. Baker
...
Marie-Therese Png
Shubham Singh
A. Strait
Lukas Struppek
Arjun Subramonian
ELM
EGVM
46
104
0
09 Jun 2023
Prompt Injection attack against LLM-integrated Applications
Yi Liu
Gelei Deng
Yuekang Li
Kailong Wang
Zihao Wang
...
Tianwei Zhang
Yepang Liu
Haoyu Wang
Yanhong Zheng
Yang Liu
SILM
47
320
0
08 Jun 2023
Long-form analogies generated by chatGPT lack human-like psycholinguistic properties
S. M. Seals
V. Shalin
24
11
0
07 Jun 2023
Click: Controllable Text Generation with Sequence Likelihood Contrastive Learning
Chujie Zheng
Pei Ke
Zheng Zhang
Minlie Huang
BDL
26
31
0
06 Jun 2023
AHA!: Facilitating AI Impact Assessment by Generating Examples of Harms
Zana Buçinca
Chau Minh Pham
Maurice Jakesch
Marco Tulio Ribeiro
Alexandra Olteanu
Saleema Amershi
33
35
0
05 Jun 2023
Structured Voronoi Sampling
Afra Amini
Li Du
Ryan Cotterell
DiffM
30
1
0
05 Jun 2023
On "Scientific Debt" in NLP: A Case for More Rigour in Language Model Pre-Training Research
Made Nindyatama Nityasya
Haryo Akbarianto Wibowo
Alham Fikri Aji
Genta Indra Winata
Radityo Eko Prasojo
Phil Blunsom
A. Kuncoro
27
8
0
05 Jun 2023
Exposing Bias in Online Communities through Large-Scale Language Models
Celine Wald
Lukas Pfahler
21
6
0
04 Jun 2023
Fine-Grained Human Feedback Gives Better Rewards for Language Model Training
Zeqiu Wu
Yushi Hu
Weijia Shi
Nouha Dziri
Alane Suhr
Prithviraj Ammanabrolu
Noah A. Smith
Mari Ostendorf
Hannaneh Hajishirzi
ALM
53
305
0
02 Jun 2023
CFL: Causally Fair Language Models Through Token-level Attribute Controlled Generation
Rahul Madhavan
Rishabh Garg
Kahini Wadhawan
S. Mehta
38
5
0
01 Jun 2023
Thought Cloning: Learning to Think while Acting by Imitating Human Thinking
Shengran Hu
Jeff Clune
LM&Ro
OffRL
LRM
AI4CE
45
27
0
01 Jun 2023
An Invariant Learning Characterization of Controlled Text Generation
Carolina Zheng
Claudia Shi
Keyon Vafa
Amir Feder
David M. Blei
OOD
38
8
0
31 May 2023
Controlled Text Generation with Hidden Representation Transformations
Vaibhav Kumar
H. Koorehdavoudi
Masud Moshtaghi
Amita Misra
Ankit Chadha
Emilio Ferrara
31
3
0
30 May 2023
KoSBi: A Dataset for Mitigating Social Bias Risks Towards Safer Large Language Model Application
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
Gunhee Kim
Jung-Woo Ha
40
29
0
28 May 2023
SQuARe: A Large-Scale Dataset of Sensitive Questions and Acceptable Responses Created Through Human-Machine Collaboration
Hwaran Lee
Seokhee Hong
Joonsuk Park
Takyoung Kim
M. Cha
...
Eun-Ju Lee
Yong Lim
Alice Oh
San-hee Park
Jung-Woo Ha
46
16
0
28 May 2023
Query-Efficient Black-Box Red Teaming via Bayesian Optimization
Deokjae Lee
JunYeong Lee
Jung-Woo Ha
Jin-Hwa Kim
Sang-Woo Lee
Hwaran Lee
Hyun Oh Song
AAML
29
23
0
27 May 2023