2112.04359
Cited By
Ethical and social risks of harm from Language Models
8 December 2021
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
Po-Sen Huang
Myra Cheng
Mia Glaese
Borja Balle
Atoosa Kasirzadeh
Zachary Kenton
S. Brown
Will Hawkins
T. Stepleton
Courtney Biles
Abeba Birhane
Julia Haas
Laura Rimell
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
PILM
Papers citing
"Ethical and social risks of harm from Language Models"
50 / 634 papers shown
Title
MoralBERT: A Fine-Tuned Language Model for Capturing Moral Values in Social Discussions
Vjosa Preniqi
Iacopo Ghinassi
Julia Ive
C. Saitis
Kyriaki Kalimeri
72
7
0
12 Mar 2024
Aligners: Decoupling LLMs and Alignment
Lilian Ngweta
Mayank Agarwal
Subha Maity
Alex Gittens
Yuekai Sun
Mikhail Yurochkin
63
2
0
07 Mar 2024
On the Essence and Prospect: An Investigation of Alignment Approaches for Big Models
Xinpeng Wang
Shitong Duan
Xiaoyuan Yi
Jing Yao
Shanlin Zhou
Zhihua Wei
Peng Zhang
Dongkuan Xu
Maosong Sun
Xing Xie
OffRL
106
17
0
07 Mar 2024
Enhancing Instructional Quality: Leveraging Computer-Assisted Textual Analysis to Generate In-Depth Insights from Educational Artifacts
Zewei Tian
Min Sun
Alex Liu
Shawon Sarkar
Jing Liu
71
5
0
06 Mar 2024
Negating Negatives: Alignment without Human Positive Samples via Distributional Dispreference Optimization
Shitong Duan
Xiaoyuan Yi
Peng Zhang
Tun Lu
Xing Xie
Ning Gu
71
6
0
06 Mar 2024
Best of Both Worlds: A Pliable and Generalizable Neuro-Symbolic Approach for Relation Classification
Robert Vacareanu
F. Alam
M. Islam
Haris Riaz
Mihai Surdeanu
NAI
76
2
0
05 Mar 2024
Theoretical Insights for Diffusion Guidance: A Case Study for Gaussian Mixture Models
Yuchen Wu
Minshuo Chen
Zihao Li
Mengdi Wang
Yuting Wei
110
29
0
03 Mar 2024
Gender Bias in Large Language Models across Multiple Languages
Jinman Zhao
Yitian Ding
Chen Jia
Yining Wang
Zifan Qian
72
32
0
01 Mar 2024
Counterspeakers' Perspectives: Unveiling Barriers and AI Needs in the Fight against Online Hate
Jimin Mun
Cathy Buerger
Jenny T Liang
Joshua Garland
Maarten Sap
77
11
0
29 Feb 2024
Is Crowdsourcing Breaking Your Bank? Cost-Effective Fine-Tuning of Pre-trained Language Models with Proximal Policy Optimization
Shuo Yang
Gjergji Kasneci
ALM
57
3
0
28 Feb 2024
On the Challenges and Opportunities in Generative AI
Laura Manduchi
Kushagra Pandey
Robert Bamler
Ryan Cotterell
Sina Daubener
...
F. Wenzel
Frank Wood
Stephan Mandt
Vincent Fortuin
284
22
0
28 Feb 2024
Determinants of LLM-assisted Decision-Making
Eva Eigner
Thorsten Händler
109
49
0
27 Feb 2024
Speak Out of Turn: Safety Vulnerability of Large Language Models in Multi-turn Dialogue
Zhenhong Zhou
Jiuyang Xiang
Haopeng Chen
Quan Liu
Zherui Li
Sen Su
102
25
0
27 Feb 2024
From COBIT to ISO 42001: Evaluating Cybersecurity Frameworks for Opportunities, Risks, and Regulatory Compliance in Commercializing Large Language Models
Timothy R. McIntosh
Teo Susnjak
Tong Liu
Paul Watters
Raza Nowrozy
Malka N. Halgamuge
ELM
65
32
0
24 Feb 2024
Farsight: Fostering Responsible AI Awareness During AI Application Prototyping
Zijie J. Wang
Chinmay Kulkarni
Lauren Wilcox
Michael Terry
Michael A. Madaio
77
50
0
23 Feb 2024
CloChat: Understanding How People Customize, Interact, and Experience Personas in Large Language Models
Juhye Ha
Hyeon Jeon
DaEun Han
Jinwook Seo
Changhoon Oh
75
31
0
23 Feb 2024
Large Language Models are Vulnerable to Bait-and-Switch Attacks for Generating Harmful Content
Federico Bianchi
James Zou
74
5
0
21 Feb 2024
Exploring ChatGPT and its Impact on Society
Md. Asraful Haque
Shuai Li
SILM
112
29
0
21 Feb 2024
SaGE: Evaluating Moral Consistency in Large Language Models
Vamshi Krishna Bonagiri
Sreeram Vennam
Priyanshul Govil
Ponnurangam Kumaraguru
Manas Gaur
ELM
85
0
0
21 Feb 2024
Smaug: Fixing Failure Modes of Preference Optimisation with DPO-Positive
Arka Pal
Deep Karkhanis
Samuel Dooley
Manley Roberts
Siddartha Naidu
Colin White
OSLM
106
155
0
20 Feb 2024
Soft Self-Consistency Improves Language Model Agents
Han Wang
Archiki Prasad
Elias Stengel-Eskin
Mohit Bansal
LLMAG
137
11
0
20 Feb 2024
A Survey on Knowledge Distillation of Large Language Models
Xiaohan Xu
Ming Li
Chongyang Tao
Tao Shen
Reynold Cheng
Jinyang Li
Can Xu
Dacheng Tao
Dinesh Manocha
KELM
VLM
173
133
0
20 Feb 2024
Investigating the Impact of Model Instability on Explanations and Uncertainty
Sara Vera Marjanović
Isabelle Augenstein
Christina Lioma
AAML
80
0
0
20 Feb 2024
PromptKD: Distilling Student-Friendly Knowledge for Generative Language Models via Prompt Tuning
Gyeongman Kim
Doohyuk Jang
Eunho Yang
VLM
108
13
0
20 Feb 2024
Simulacra as Conscious Exotica
Murray Shanahan
LM&Ro
78
10
0
19 Feb 2024
ArtPrompt: ASCII Art-based Jailbreak Attacks against Aligned LLMs
Fengqing Jiang
Zhangchen Xu
Luyao Niu
Zhen Xiang
Bhaskar Ramasubramanian
Bo Li
Radha Poovendran
148
109
0
19 Feb 2024
A Note on Bias to Complete
Jia Xu
Mona Diab
123
2
0
18 Feb 2024
Stumbling Blocks: Stress Testing the Robustness of Machine-Generated Text Detectors Under Attacks
Yichen Wang
Shangbin Feng
Abe Bohan Hou
Xiao Pu
Chao Shen
Xiaoming Liu
Yulia Tsvetkov
Tianxing He
DeLMO
113
20
0
18 Feb 2024
Materiality and Risk in the Age of Pervasive AI Sensors
Matthew P. Stewart
Emanuel Moss
Pete Warden
Brian Plancher
Susan Kennedy
Mona Sloane
Vijay Janapa Reddi
52
4
0
17 Feb 2024
Can LLMs Speak For Diverse People? Tuning LLMs via Debate to Generate Controllable Controversial Statements
Ming Li
Jiuhai Chen
Lichang Chen
Dinesh Manocha
143
21
0
16 Feb 2024
Aligning Crowd Feedback via Distributional Preference Reward Modeling
Dexun Li
Cong Zhang
Kuicai Dong
Derrick-Goh-Xin Deik
Ruiming Tang
Yong Liu
97
17
0
15 Feb 2024
PAL: Proxy-Guided Black-Box Attack on Large Language Models
Chawin Sitawarin
Norman Mu
David Wagner
Alexandre Araujo
ELM
81
35
0
15 Feb 2024
Attacks, Defenses and Evaluations for LLM Conversation Safety: A Survey
Zhichen Dong
Zhanhui Zhou
Chao Yang
Jing Shao
Yu Qiao
ELM
132
68
0
14 Feb 2024
SafeDecoding: Defending against Jailbreak Attacks via Safety-Aware Decoding
Zhangchen Xu
Fengqing Jiang
Luyao Niu
Jinyuan Jia
Bill Yuchen Lin
Radha Poovendran
AAML
190
111
0
14 Feb 2024
Whispers in the Machine: Confidentiality in LLM-integrated Systems
Jonathan Evertz
Merlin Chlosta
Lea Schonherr
Thorsten Eisenhofer
119
21
0
10 Feb 2024
Feedback Loops With Language Models Drive In-Context Reward Hacking
Alexander Pan
Erik Jones
Meena Jagadeesan
Jacob Steinhardt
KELM
98
33
0
09 Feb 2024
An Examination on the Effectiveness of Divide-and-Conquer Prompting in Large Language Models
Yizhou Zhang
Lun Du
Defu Cao
Qiang Fu
Yan Liu
LRM
121
8
0
08 Feb 2024
Permute-and-Flip: An optimally stable and watermarkable decoder for LLMs
Xuandong Zhao
Lei Li
Yu-Xiang Wang
91
9
0
08 Feb 2024
On the Emergence of Cross-Task Linearity in the Pretraining-Finetuning Paradigm
Zhanpeng Zhou
Zijun Chen
Yilan Chen
Bo Zhang
Junchi Yan
MoMe
100
11
0
06 Feb 2024
Progress and Opportunities of Foundation Models in Bioinformatics
Qing Li
Zhihang Hu
Yixuan Wang
Lei Li
Yimin Fan
Irwin King
Le Song
Yu Li
AI4CE
85
18
0
06 Feb 2024
Toward Human-AI Alignment in Large-Scale Multi-Player Games
Sugandha Sharma
Guy Davidson
Khimya Khetarpal
Anssi Kanervisto
Udit Arora
Katja Hofmann
Ida Momennejad
76
0
0
05 Feb 2024
MAGDi: Structured Distillation of Multi-Agent Interaction Graphs Improves Reasoning in Smaller Language Models
Justin Chih-Yao Chen
Swarnadeep Saha
Elias Stengel-Eskin
Mohit Bansal
LRM
LLMAG
76
22
0
02 Feb 2024
Trustworthy Distributed AI Systems: Robustness, Privacy, and Governance
Wenqi Wei
Ling Liu
121
20
0
02 Feb 2024
A Personalized Framework for Consumer and Producer Group Fairness Optimization in Recommender Systems
Hossein A. Rahmani
Mohammadmehdi Naghiaei
Yashar Deldjoo
FaML
70
6
0
01 Feb 2024
On Prompt-Driven Safeguarding for Large Language Models
Chujie Zheng
Fan Yin
Hao Zhou
Fandong Meng
Jie Zhou
Kai-Wei Chang
Minlie Huang
Nanyun Peng
AAML
124
63
0
31 Jan 2024
Security and Privacy Challenges of Large Language Models: A Survey
B. Das
M. H. Amini
Yanzhao Wu
PILM
ELM
130
145
0
30 Jan 2024
ReGAL: Refactoring Programs to Discover Generalizable Abstractions
Elias Stengel-Eskin
Archiki Prasad
Mohit Bansal
67
15
0
29 Jan 2024
Fortifying Ethical Boundaries in AI: Advanced Strategies for Enhancing Security in Large Language Models
Yunhong He
Jianling Qiu
Wei Zhang
Zhe Yuan
61
3
0
27 Jan 2024
Design Principles for Generative AI Applications
Justin D. Weisz
Jessica He
Michael J. Muller
Gabriela Hoefer
Rachel Miles
Werner Geyer
AI4CE
92
138
0
25 Jan 2024
Black-Box Access is Insufficient for Rigorous AI Audits
Stephen Casper
Carson Ezell
Charlotte Siegmann
Noam Kolt
Taylor Lynn Curtis
...
Michael Gerovitch
David Bau
Max Tegmark
David M. Krueger
Dylan Hadfield-Menell
AAML
145
94
0
25 Jan 2024