Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
16 June 2022
Maribeth Rauh, John F. J. Mellor, J. Uesato, Po-Sen Huang, Johannes Welbl, Laura Weidinger, Sumanth Dathathri, Amelia Glaese, G. Irving, Iason Gabriel, William S. Isaac, Lisa Anne Hendricks
Papers citing "Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models" (40 of 40 papers shown):
SAGE: A Generic Framework for LLM Safety Evaluation. Madhur Jindal, Hari Shrawgi, Parag Agrawal, Sandipan Dandapat. 28 Apr 2025. [ELM]
Taxonomizing Representational Harms using Speech Act Theory. Emily Corvi, Hannah Washington, Stefanie Reed, Chad Atalla, Alexandra Chouldechova, ..., Nicholas Pangakis, Emily Sheng, Dan Vann, Matthew Vogel, Hanna M. Wallach. 01 Apr 2025.
Multilingual != Multicultural: Evaluating Gaps Between Multilingual Capabilities and Cultural Alignment in LLMs. Jonathan Rystrøm, Hannah Rose Kirk, Scott A. Hale. 23 Feb 2025.
Evaluating the Propensity of Generative AI for Producing Harmful Disinformation During an Election Cycle. Erik J Schlicht. 20 Jan 2025.
Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models. Eddie L. Ungless, Nikolas Vitsakis, Zeerak Talat, James Garforth, Bjorn Ross, Arno Onken, Atoosa Kasirzadeh, Alexandra Birch. 17 Oct 2024.
Reward-RAG: Enhancing RAG with Reward Driven Supervision. Thang Nguyen, Peter Chin, Yu-Wing Tai. 03 Oct 2024. [RALM]
CoSafe: Evaluating Large Language Model Safety in Multi-Turn Dialogue Coreference. Erxin Yu, Jing Li, Ming Liao, Siqi Wang, Zuchen Gao, Fei Mi, Lanqing Hong. 25 Jun 2024. [ELM, LRM]
STAR: SocioTechnical Approach to Red Teaming Language Models. Laura Weidinger, John F. J. Mellor, Bernat Guillen Pegueroles, Nahema Marchal, Ravin Kumar, ..., Mark Diaz, Stevie Bergman, Mikel Rodriguez, Verena Rieser, William S. Isaac. 17 Jun 2024. [VLM]
When Can LLMs Actually Correct Their Own Mistakes? A Critical Survey of Self-Correction of LLMs. Ryo Kamoi, Yusen Zhang, Nan Zhang, Jiawei Han, Rui Zhang. 03 Jun 2024. [LRM]
A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians. Piotr Wojciech Mirowski, Juliette Love, K. Mathewson, Shakir Mohamed. 31 May 2024.
RTP-LX: Can LLMs Evaluate Toxicity in Multilingual Scenarios? Adrian de Wynter, Ishaan Watts, Nektar Ege Altıntoprak, Tua Wongsangaroonsri, Minghui Zhang, ..., Anna Vickers, Stéphanie Visser, Herdyan Widarmanto, A. Zaikin, Si-Qing Chen. 22 Apr 2024. [LM&MA]
Token-level Direct Preference Optimization. Yongcheng Zeng, Guoqing Liu, Weiyu Ma, Ning Yang, Haifeng Zhang, Jun Wang. 18 Apr 2024.
GeniL: A Multilingual Dataset on Generalizing Language. Aida Mostafazadeh Davani, S. Gubbi, Sunipa Dev, Shachi Dave, Vinodkumar Prabhakaran. 08 Apr 2024.
GUARD-D-LLM: An LLM-Based Risk Assessment Engine for the Downstream uses of LLMs. Sundaraparipurnan Narayanan, Sandeep Vishwakarma. 02 Apr 2024.
Generative Echo Chamber? Effects of LLM-Powered Search Systems on Diverse Information Seeking. Nikhil Sharma, Q. V. Liao, Ziang Xiao. 08 Feb 2024.
Unveiling Safety Vulnerabilities of Large Language Models. George Kour, Marcel Zalmanovici, Naama Zwerdling, Esther Goldbraich, Ora Nova Fandina, Ateret Anaby-Tavor, Orna Raz, E. Farchi. 07 Nov 2023. [AAML]
Safe RLHF: Safe Reinforcement Learning from Human Feedback. Josef Dai, Xuehai Pan, Ruiyang Sun, Jiaming Ji, Xinbo Xu, Mickel Liu, Yizhou Wang, Yaodong Yang. 19 Oct 2023.
Can LLM-Generated Misinformation Be Detected? Canyu Chen, Kai Shu. 25 Sep 2023. [DeLMO]
BeaverTails: Towards Improved Safety Alignment of LLM via a Human-Preference Dataset. Jiaming Ji, Mickel Liu, Juntao Dai, Xuehai Pan, Chi Zhang, Ce Bian, Chi Zhang, Ruiyang Sun, Yizhou Wang, Yaodong Yang. 10 Jul 2023. [ALM]
Towards Measuring the Representation of Subjective Global Opinions in Language Models. Esin Durmus, Karina Nyugen, Thomas I. Liao, Nicholas Schiefer, Amanda Askell, ..., Alex Tamkin, Janel Thamkul, Jared Kaplan, Jack Clark, Deep Ganguli. 28 Jun 2023.
Apolitical Intelligence? Auditing Delphi's responses on controversial political issues in the US. J. H. Rystrøm. 22 Jun 2023.
Conformal Language Modeling. Victor Quach, Adam Fisch, Tal Schuster, Adam Yala, J. Sohn, Tommi Jaakkola, Regina Barzilay. 16 Jun 2023.
Evaluating the Social Impact of Generative AI Systems in Systems and Society. Irene Solaiman, Zeerak Talat, William Agnew, Lama Ahmad, Dylan K. Baker, ..., Marie-Therese Png, Shubham Singh, A. Strait, Lukas Struppek, Arjun Subramonian. 09 Jun 2023. [ELM, EGVM]
Operationalising the Definition of General Purpose AI Systems: Assessing Four Approaches. Risto Uuk, C. I. Gutierrez, Alex Tamkin. 05 Jun 2023.
Appraising the Potential Uses and Harms of LLMs for Medical Systematic Reviews. Hye Sun Yun, Iain J. Marshall, T. Trikalinos, Byron C. Wallace. 19 May 2023.
PaLM 2 Technical Report. Rohan Anil, Andrew M. Dai, Orhan Firat, Melvin Johnson, Dmitry Lepikhin, ..., Ce Zheng, Wei Zhou, Denny Zhou, Slav Petrov, Yonghui Wu. 17 May 2023. [ReLM, LRM]
An Evaluation on Large Language Model Outputs: Discourse and Memorization. Adrian de Wynter, Xun Wang, Alex Sokolov, Qilong Gu, Si-Qing Chen. 17 Apr 2023. [ELM]
Towards the Scalable Evaluation of Cooperativeness in Language Models. Alan Chan, Maxime Riché, Jesse Clifton. 16 Mar 2023. [LLMAG]
Who's Thinking? A Push for Human-Centered Evaluation of LLMs using the XAI Playbook. Teresa Datta, John P. Dickerson. 10 Mar 2023.
Auditing large language models: a three-layered approach. Jakob Mokander, Jonas Schuett, Hannah Rose Kirk, Luciano Floridi. 16 Feb 2023. [AILaw, MLAU]
The Capacity for Moral Self-Correction in Large Language Models. Deep Ganguli, Amanda Askell, Nicholas Schiefer, Thomas I. Liao, Kamilė Lukošiūtė, ..., Tom B. Brown, C. Olah, Jack Clark, Sam Bowman, Jared Kaplan. 15 Feb 2023. [LRM, ReLM]
Solving math word problems with process- and outcome-based feedback. J. Uesato, Nate Kushman, Ramana Kumar, Francis Song, Noah Y. Siegel, L. Wang, Antonia Creswell, G. Irving, I. Higgins. 25 Nov 2022. [FaML, ReLM, AIMat, LRM]
Language Generation Models Can Cause Harm: So What Can We Do About It? An Actionable Survey. Sachin Kumar, Vidhisha Balachandran, Lucille Njoo, Antonios Anastasopoulos, Yulia Tsvetkov. 14 Oct 2022. [ELM]
Improving alignment of dialogue agents via targeted human judgements. Amelia Glaese, Nat McAleese, Maja Trębacz, John Aslanides, Vlad Firoiu, ..., John F. J. Mellor, Demis Hassabis, Koray Kavukcuoglu, Lisa Anne Hendricks, G. Irving. 28 Sep 2022. [ALM, AAML]
Handling and Presenting Harmful Text in NLP Research. Hannah Rose Kirk, Abeba Birhane, Bertie Vidgen, Leon Derczynski. 29 Apr 2022.
BBQ: A Hand-Built Bias Benchmark for Question Answering. Alicia Parrish, Angelica Chen, Nikita Nangia, Vishakh Padmakumar, Jason Phang, Jana Thompson, Phu Mon Htut, Sam Bowman. 15 Oct 2021.
Challenges in Detoxifying Language Models. Johannes Welbl, Amelia Glaese, J. Uesato, Sumanth Dathathri, John F. J. Mellor, Lisa Anne Hendricks, Kirsty Anderson, Pushmeet Kohli, Ben Coppin, Po-Sen Huang. 15 Sep 2021. [LM&MA]
Self-Diagnosis and Self-Debiasing: A Proposal for Reducing Corpus-Based Bias in NLP. Timo Schick, Sahana Udupa, Hinrich Schütze. 28 Feb 2021.
The Woman Worked as a Babysitter: On Biases in Language Generation. Emily Sheng, Kai-Wei Chang, Premkumar Natarajan, Nanyun Peng. 03 Sep 2019.
A Survey on Bias and Fairness in Machine Learning. Ninareh Mehrabi, Fred Morstatter, N. Saxena, Kristina Lerman, Aram Galstyan. 23 Aug 2019. [SyDa, FaML]