2009.11462
RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models
24 September 2020
Samuel Gehman
Suchin Gururangan
Maarten Sap
Yejin Choi
Noah A. Smith
Papers citing
"RealToxicityPrompts: Evaluating Neural Toxic Degeneration in Language Models"
50 / 772 papers shown
Generating Images with Multimodal Language Models
Jing Yu Koh
Daniel Fried
Ruslan Salakhutdinov
MLLM
38
243
0
26 May 2023
From Dogwhistles to Bullhorns: Unveiling Coded Rhetoric with Language Models
Julia Mendelsohn
Ronan Le Bras
Yejin Choi
Maarten Sap
34
25
0
26 May 2023
Training Socially Aligned Language Models on Simulated Social Interactions
Ruibo Liu
Ruixin Yang
Chenyan Jia
Ge Zhang
Denny Zhou
Andrew M. Dai
Diyi Yang
Soroush Vosoughi
ALM
37
46
0
26 May 2023
Scaling Data-Constrained Language Models
Niklas Muennighoff
Alexander M. Rush
Boaz Barak
Teven Le Scao
Aleksandra Piktus
Nouamane Tazi
S. Pyysalo
Thomas Wolf
Colin Raffel
ALM
40
201
0
25 May 2023
The False Promise of Imitating Proprietary LLMs
Arnav Gudibande
Eric Wallace
Charles Burton Snell
Xinyang Geng
Hao Liu
Pieter Abbeel
Sergey Levine
Dawn Song
ALM
44
199
0
25 May 2023
Inference-Time Policy Adapters (IPA): Tailoring Extreme-Scale LMs without Fine-tuning
Ximing Lu
Faeze Brahman
Peter West
Jaehun Jang
Khyathi Raghavi Chandu
...
Bill Yuchen Lin
Skyler Hallinan
Xiang Ren
Sean Welleck
Yejin Choi
30
26
0
24 May 2023
Tricking LLMs into Disobedience: Formalizing, Analyzing, and Detecting Jailbreaks
Abhinav Rao
S. Vashistha
Atharva Naik
Somak Aditya
Monojit Choudhury
40
17
0
24 May 2023
PURR: Efficiently Editing Language Model Hallucinations by Denoising Language Model Corruptions
Anthony Chen
Panupong Pasupat
Sameer Singh
Hongrae Lee
Kelvin Guu
32
40
0
24 May 2023
Debiasing should be Good and Bad: Measuring the Consistency of Debiasing Techniques in Language Models
Robert D Morabito
Jad Kabbara
Ali Emami
19
6
0
23 May 2023
ReSee: Responding through Seeing Fine-grained Visual Knowledge in Open-domain Dialogue
Haoqin Tu
Yitong Li
Fei Mi
Zhongliang Yang
46
4
0
23 May 2023
BiasX: "Thinking Slow" in Toxic Content Moderation with Explanations of Implied Social Biases
Yiming Zhang
Sravani Nanduri
Liwei Jiang
Tongshuang Wu
Maarten Sap
47
7
0
23 May 2023
A Pretrainer's Guide to Training Data: Measuring the Effects of Data Age, Domain Coverage, Quality, & Toxicity
Shayne Longpre
Gregory Yauney
Emily Reif
Katherine Lee
Adam Roberts
...
Denny Zhou
Jason W. Wei
Kevin Robinson
David M. Mimno
Daphne Ippolito
31
150
0
22 May 2023
Word Embeddings Are Steers for Language Models
Chi Han
Jialiang Xu
Manling Li
Yi R. Fung
Chenkai Sun
Nan Jiang
Tarek Abdelzaher
Heng Ji
LLMSV
37
29
0
22 May 2023
GraphCare: Enhancing Healthcare Predictions with Personalized Knowledge Graphs
Pengcheng Jiang
Cao Xiao
Adam Cross
Jimeng Sun
AI4MH
31
21
0
22 May 2023
This Prompt is Measuring <MASK>: Evaluating Bias Evaluation in Language Models
Seraphina Goldfarb-Tarrant
Eddie L. Ungless
Esma Balkir
Su Lin Blodgett
48
9
0
22 May 2023
Can We Edit Factual Knowledge by In-Context Learning?
Ce Zheng
Lei Li
Qingxiu Dong
Yuxuan Fan
Zhiyong Wu
Jingjing Xu
Baobao Chang
KELM
39
187
0
22 May 2023
BiasAsker: Measuring the Bias in Conversational AI System
Yuxuan Wan
Wenxuan Wang
Pinjia He
Jiazhen Gu
Haonan Bai
Michael Lyu
34
67
0
21 May 2023
BOLT: Fast Energy-based Controlled Text Generation with Tunable Biases
Xin Liu
Muhammad Khalifa
Lu Wang
41
18
0
19 May 2023
CRITIC: Large Language Models Can Self-Correct with Tool-Interactive Critiquing
Zhibin Gou
Zhihong Shao
Yeyun Gong
Yelong Shen
Yujiu Yang
Nan Duan
Weizhu Chen
KELM
LRM
36
360
0
19 May 2023
Data Redaction from Conditional Generative Models
Zhifeng Kong
Kamalika Chaudhuri
KELM
26
7
0
18 May 2023
ReGen: Zero-Shot Text Classification via Training Data Generation with Progressive Dense Retrieval
Yue Yu
Yuchen Zhuang
Rongzhi Zhang
Yu Meng
Jiaming Shen
Chao Zhang
VLM
43
33
0
18 May 2023
DoReMi: Optimizing Data Mixtures Speeds Up Language Model Pretraining
Sang Michael Xie
Hieu H. Pham
Xuanyi Dong
Nan Du
Hanxiao Liu
Yifeng Lu
Percy Liang
Quoc V. Le
Tengyu Ma
Adams Wei Yu
MoMe
MoE
61
180
0
17 May 2023
PaLM 2 Technical Report
Rohan Anil
Andrew M. Dai
Orhan Firat
Melvin Johnson
Dmitry Lepikhin
...
Ce Zheng
Wei Zhou
Denny Zhou
Slav Petrov
Yonghui Wu
ReLM
LRM
128
1,152
0
17 May 2023
Language Model Tokenizers Introduce Unfairness Between Languages
Aleksandar Petrov
Emanuele La Malfa
Philip Torr
Adel Bibi
52
98
0
17 May 2023
Is ChatGPT Fair for Recommendation? Evaluating Fairness in Large Language Model Recommendation
Jizhi Zhang
Keqin Bao
Yang Zhang
Wenjie Wang
Fuli Feng
Xiangnan He
LRM
ALM
35
158
0
12 May 2023
StarCoder: may the source be with you!
Raymond Li
Loubna Ben Allal
Yangtian Zi
Niklas Muennighoff
Denis Kocetkov
...
Sean M. Hughes
Thomas Wolf
Arjun Guha
Leandro von Werra
H. D. Vries
64
730
0
09 May 2023
Semantic Space Grounded Weighted Decoding for Multi-Attribute Controllable Dialogue Generation
Zhiling Zhang
Mengyue Wu
Ke Zhu
AI4CE
35
1
0
04 May 2023
"Oops, Did I Just Say That?" Testing and Repairing Unethical Suggestions of Large Language Models with Suggest-Critique-Reflect Process
Anna Glazkova
Zongjie Li
Michael Kadantsev
Maksim Glazkov
KELM
51
14
0
04 May 2023
LaMini-LM: A Diverse Herd of Distilled Models from Large-Scale Instructions
Minghao Wu
Abdul Waheed
Chiyu Zhang
Muhammad Abdul-Mageed
Alham Fikri Aji
ALM
137
120
0
27 Apr 2023
We're Afraid Language Models Aren't Modeling Ambiguity
Alisa Liu
Zhaofeng Wu
Julian Michael
Alane Suhr
Peter West
Alexander Koller
Swabha Swayamdipta
Noah A. Smith
Yejin Choi
73
92
0
27 Apr 2023
The Parrot Dilemma: Human-Labeled vs. LLM-augmented Data in Classification Tasks
Anders Giovanni Møller
Jacob Aarup Dalsgaard
Arianna Pera
L. Aiello
81
35
0
26 Apr 2023
On the Challenges of Using Black-Box APIs for Toxicity Evaluation in Research
Luiza Amador Pozzobon
Beyza Ermis
Patrick Lewis
Sara Hooker
48
45
0
24 Apr 2023
A Group-Specific Approach to NLP for Hate Speech Detection
Karina Halevy
28
1
0
21 Apr 2023
Fundamental Limitations of Alignment in Large Language Models
Yotam Wolf
Noam Wies
Oshri Avnery
Yoav Levine
Amnon Shashua
ALM
19
141
0
19 Apr 2023
Towards Designing a ChatGPT Conversational Companion for Elderly People
Abeer Alessa
Hend Suliman Al-Khalifa
AI4MH
25
51
0
18 Apr 2023
Stochastic Parrots Looking for Stochastic Parrots: LLMs are Easy to Fine-Tune and Hard to Detect with other LLMs
Da Silva Gameiro Henrique
Andrei Kucharavy
R. Guerraoui
DeLMO
22
7
0
18 Apr 2023
Safer Conversational AI as a Source of User Delight
Xiaoding Lu
Aleksey Korshuk
Z. Liu
W. Beauchamp
Chai Research
36
3
0
18 Apr 2023
An Evaluation on Large Language Model Outputs: Discourse and Memorization
Adrian de Wynter
Xun Wang
Alex Sokolov
Qilong Gu
Si-Qing Chen
ELM
90
32
0
17 Apr 2023
RAFT: Reward rAnked FineTuning for Generative Foundation Model Alignment
Hanze Dong
Wei Xiong
Deepanshu Goyal
Yihan Zhang
Winnie Chow
Rui Pan
Shizhe Diao
Jipeng Zhang
Kashun Shum
Tong Zhang
ALM
18
410
0
13 Apr 2023
Shall We Pretrain Autoregressive Language Models with Retrieval? A Comprehensive Study
Wei Ping
Ming-Yu Liu
Peng Xu
Lawrence C. McAfee
Zihan Liu
...
Oleksii Kuchaiev
Bo Li
Chaowei Xiao
Anima Anandkumar
Bryan Catanzaro
RALM
46
56
0
13 Apr 2023
Toxicity in ChatGPT: Analyzing Persona-assigned Language Models
Ameet Deshpande
Vishvak Murahari
Tanmay Rajpurohit
Ashwin Kalyan
Karthik Narasimhan
LM&MA
LLMAG
35
338
0
11 Apr 2023
Interpretable Unified Language Checking
Tianhua Zhang
Hongyin Luo
Yung-Sung Chuang
Wei Fang
Luc Gaitskell
Thomas Hartvigsen
Xixin Wu
D. Fox
Helen M. Meng
James R. Glass
27
22
0
07 Apr 2023
Do the Rewards Justify the Means? Measuring Trade-Offs Between Rewards and Ethical Behavior in the MACHIAVELLI Benchmark
Alexander Pan
Chan Jun Shern
Andy Zou
Nathaniel Li
Steven Basart
Thomas Woodside
Jonathan Ng
Hanlin Zhang
Scott Emmons
Dan Hendrycks
37
127
0
06 Apr 2023
Large language models effectively leverage document-level context for literary translation, but critical errors persist
Marzena Karpinska
Mohit Iyyer
41
82
0
06 Apr 2023
Assessing Language Model Deployment with Risk Cards
Leon Derczynski
Hannah Rose Kirk
Vidhisha Balachandran
Sachin Kumar
Yulia Tsvetkov
M. Leiser
Saif Mohammad
35
42
0
31 Mar 2023
Training Language Models with Language Feedback at Scale
Jérémy Scheurer
Jon Ander Campos
Tomasz Korbak
Jun Shern Chan
Angelica Chen
Kyunghyun Cho
Ethan Perez
ALM
53
103
0
28 Mar 2023
Koala: An Index for Quantifying Overlaps with Pre-training Corpora
Thuy-Trang Vu
Xuanli He
Gholamreza Haffari
Ehsan Shareghi
CLL
27
13
0
26 Mar 2023
Large AI Models in Health Informatics: Applications, Challenges, and the Future
Jianing Qiu
Lin Li
Jiankai Sun
Jiachuan Peng
Peilun Shi
...
Bo Xiao
Wu Yuan
Ningli Wang
Dong Xu
Benny Lo
AI4MH
LM&MA
42
128
0
21 Mar 2023
Language Model Behavior: A Comprehensive Survey
Tyler A. Chang
Benjamin Bergen
VLM
LRM
LM&MA
32
103
0
20 Mar 2023
Eliciting Latent Predictions from Transformers with the Tuned Lens
Nora Belrose
Zach Furman
Logan Smith
Danny Halawi
Igor V. Ostrovsky
Lev McKinney
Stella Biderman
Jacob Steinhardt
27
196
0
14 Mar 2023