Papers
Communities
Events
Blog
Pricing
Search
Open menu
Home
Papers
2112.04359
Cited By
Ethical and social risks of harm from Language Models
8 December 2021
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
Po-Sen Huang
Myra Cheng
Mia Glaese
Borja Balle
Atoosa Kasirzadeh
Zachary Kenton
S. Brown
Will Hawkins
T. Stepleton
Courtney Biles
Abeba Birhane
Julia Haas
Laura Rimell
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
PILM
Re-assign community
ArXiv (abs)
PDF
HTML
Papers citing
"Ethical and social risks of harm from Language Models"
50 / 634 papers shown
Title
RuCoLA: Russian Corpus of Linguistic Acceptability
Vladislav Mikhailov
T. Shamardina
Max Ryabinin
A. Pestova
I. Smurov
Ekaterina Artemova
80
29
0
23 Oct 2022
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling
Dongsheng Chen
Chaofan Tao
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
VLM
95
19
0
21 Oct 2022
SafeText: A Benchmark for Exploring Physical Safety in Language Models
Sharon Levy
Emily Allaway
Melanie Subbiah
Lydia B. Chilton
D. Patton
Kathleen McKeown
William Yang Wang
96
45
0
18 Oct 2022
Deep Bidirectional Language-Knowledge Graph Pretraining
Michihiro Yasunaga
Antoine Bosselut
Hongyu Ren
Xikun Zhang
Christopher D. Manning
Percy Liang
J. Leskovec
101
204
0
17 Oct 2022
Prompting GPT-3 To Be Reliable
Chenglei Si
Zhe Gan
Zhengyuan Yang
Shuohang Wang
Jianfeng Wang
Jordan L. Boyd-Graber
Lijuan Wang
KELM
LRM
113
303
0
17 Oct 2022
Enabling Classifiers to Make Judgements Explicitly Aligned with Human Values
Yejin Bang
Tiezheng Yu
Andrea Madotto
Zhaojiang Lin
Mona T. Diab
Pascale Fung
74
13
0
14 Oct 2022
Unified Detoxifying and Debiasing in Language Generation via Inference-time Adaptive Optimization
Zonghan Yang
Xiaoyuan Yi
Peng Li
Yang Liu
Xing Xie
108
34
0
10 Oct 2022
Analogy Generation by Prompting Large Language Models: A Case Study of InstructGPT
B. Bhavya
Jinjun Xiong
Chengxiang Zhai
LRM
84
44
0
09 Oct 2022
AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
S. Kwon
Jeonghoon Kim
Jeongin Bae
Kang Min Yoo
Jin-Hwa Kim
Baeseong Park
Byeongwook Kim
Jung-Woo Ha
Nako Sung
Dongsoo Lee
MQ
114
31
0
08 Oct 2022
Ask Me Anything: A simple strategy for prompting language models
Simran Arora
A. Narayan
Mayee F. Chen
Laurel J. Orr
Neel Guha
Kush S. Bhatia
Ines Chami
Frederic Sala
Christopher Ré
ReLM
LRM
289
219
0
05 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
370
1,100
0
05 Oct 2022
When to Make Exceptions: Exploring Language Models as Accounts of Human Moral Judgment
Zhijing Jin
Sydney Levine
Fernando Gonzalez
Ojasv Kamal
Maarten Sap
Mrinmaya Sachan
Rada Mihalcea
J. Tenenbaum
Bernhard Schölkopf
ELM
LRM
96
103
0
04 Oct 2022
Co-Writing Screenplays and Theatre Scripts with Language Models: An Evaluation by Industry Professionals
Piotr Wojciech Mirowski
Kory W. Mathewson
Jaylen Pittman
Richard Evans
HAI
115
266
0
29 Sep 2022
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
324
537
0
28 Sep 2022
Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees
Swarnadeep Saha
Shiyue Zhang
Peter Hase
Joey Tianyi Zhou
106
20
0
21 Sep 2022
Exploiting Cultural Biases via Homoglyphs in Text-to-Image Synthesis
Lukas Struppek
Dominik Hintersdorf
Felix Friedrich
Manuel Brack
P. Schramowski
Kristian Kersting
121
32
0
19 Sep 2022
A Review of Challenges in Machine Learning based Automated Hate Speech Detection
Abhishek Velankar
H. Patil
Raviraj Joshi
77
9
0
12 Sep 2022
Harnessing Abstractive Summarization for Fact-Checked Claim Detection
Varad Bhatnagar
Diptesh Kanojia
Kameswari Chebrolu
HILM
73
8
0
10 Sep 2022
The Ethical Need for Watermarks in Machine-Generated Language
A. Grinbaum
Laurynas Adomaitis
WaLM
47
34
0
07 Sep 2022
In conversation with Artificial Intelligence: aligning language models with human values
Atoosa Kasirzadeh
Iason Gabriel
126
105
0
01 Sep 2022
Faithful Reasoning Using Large Language Models
Antonia Creswell
Murray Shanahan
ReLM
LRM
73
125
0
30 Aug 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
312
489
0
23 Aug 2022
Integrating Diverse Knowledge Sources for Online One-shot Learning of Novel Tasks
James R. Kirk
R. Wray
Peter Lindes
John E. Laird
66
8
0
19 Aug 2022
Is Your Model Sensitive? SPeDaC: A New Benchmark for Detecting and Classifying Sensitive Personal Data
Gaia Gambarelli
Aldo Gangemi
Rocco Tripodi
77
9
0
12 Aug 2022
Social Simulacra: Creating Populated Prototypes for Social Computing Systems
J. Park
Lindsay Popowski
Carrie J. Cai
Meredith Ringel Morris
Percy Liang
Michael S. Bernstein
85
298
0
08 Aug 2022
A Holistic Approach to Undesired Content Detection in the Real World
Todor Markov
Chong Zhang
Sandhini Agarwal
Tyna Eloundou
Teddy Lee
Steven Adler
Angela Jiang
L. Weng
125
237
0
05 Aug 2022
BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage
Kurt Shuster
Jing Xu
M. Komeili
Da Ju
Eric Michael Smith
...
Naman Goyal
Arthur Szlam
Y-Lan Boureau
Melanie Kambadur
Jason Weston
LM&Ro
KELM
126
242
0
05 Aug 2022
A Hazard Analysis Framework for Code Synthesis Large Language Models
Heidy Khlaaf
Pamela Mishkin
Joshua Achiam
Gretchen Krueger
Miles Brundage
ELM
74
29
0
25 Jul 2022
The Birth of Bias: A case study on the evolution of gender bias in an English language model
Oskar van der Wal
Jaap Jumelet
K. Schulz
Willem H. Zuidema
121
16
0
21 Jul 2022
Democratizing Ethical Assessment of Natural Language Generation Models
A. Rasekh
Ian W. Eisenberg
ELM
49
1
0
30 Jun 2022
VisFIS: Visual Feature Importance Supervision with Right-for-the-Right-Reason Objectives
Zhuofan Ying
Peter Hase
Joey Tianyi Zhou
LRM
87
13
0
22 Jun 2022
The Fallacy of AI Functionality
Inioluwa Deborah Raji
Indra Elizabeth Kumar
Aaron Horowitz
Andrew D. Selbst
82
197
0
20 Jun 2022
Know your audience: specializing grounded language models with listener subtraction
Aaditya K. Singh
David Ding
Andrew M. Saxe
Felix Hill
Andrew Kyle Lampinen
65
2
0
16 Jun 2022
Characteristics of Harmful Text: Towards Rigorous Benchmarking of Language Models
Maribeth Rauh
John F. J. Mellor
J. Uesato
Po-Sen Huang
Johannes Welbl
...
Amelia Glaese
G. Irving
Iason Gabriel
William S. Isaac
Lisa Anne Hendricks
124
52
0
16 Jun 2022
Emergent Abilities of Large Language Models
Jason W. Wei
Yi Tay
Rishi Bommasani
Colin Raffel
Barret Zoph
...
Tatsunori Hashimoto
Oriol Vinyals
Percy Liang
J. Dean
W. Fedus
ELM
ReLM
LRM
320
2,524
0
15 Jun 2022
The Case for a Single Model that can Both Generate Continuations and Fill in the Blank
Daphne Ippolito
Liam Dugan
Emily Reif
Ann Yuan
Andy Coenen
Chris Callison-Burch
57
2
0
09 Jun 2022
Researching Alignment Research: Unsupervised Analysis
Jan H. Kirchner
Logan Smith
Jacques Thibodeau
Kyle McDonell
Laria Reynolds
54
7
0
06 Jun 2022
Findings of the The RuATD Shared Task 2022 on Artificial Text Detection in Russian
T. Shamardina
Vladislav Mikhailov
Daniil Chernianskii
Alena Fenogenova
Marat Saidov
A. Valeeva
Tatiana Shavrina
I. Smurov
E. Tutubalina
Ekaterina Artemova
DeLMO
62
30
0
03 Jun 2022
Language and Culture Internalisation for Human-Like Autotelic AI
Cédric Colas
Tristan Karch
Clément Moulin-Frier
Pierre-Yves Oudeyer
LM&Ro
98
27
0
02 Jun 2022
Algorithmic Fairness and Structural Injustice: Insights from Feminist Political Philosophy
Atoosa Kasirzadeh
FaML
82
41
0
02 Jun 2022
On Reinforcement Learning and Distribution Matching for Fine-Tuning Language Models with no Catastrophic Forgetting
Tomasz Korbak
Hady ElSahar
Germán Kruszewski
Marc Dymetman
CLL
102
57
0
01 Jun 2022
Chefs' Random Tables: Non-Trigonometric Random Features
Valerii Likhosherstov
K. Choromanski
Kumar Avinava Dubey
Frederick Liu
Tamás Sarlós
Adrian Weller
90
18
0
30 May 2022
Learning to Automate Follow-up Question Generation using Process Knowledge for Depression Triage on Reddit Posts
Shrey Gupta
Anmol Agarwal
Manas Gaur
Kaushik Roy
Vignesh Narayanan
Ponnurangam Kumaraguru
Amit P. Sheth
AI4MH
67
34
0
27 May 2022
ProsocialDialog: A Prosocial Backbone for Conversational Agents
Hyunwoo J. Kim
Youngjae Yu
Liwei Jiang
Ximing Lu
Daniel Khashabi
Gunhee Kim
Yejin Choi
Maarten Sap
112
128
0
25 May 2022
Conditional Supervised Contrastive Learning for Fair Text Classification
Jianfeng Chi
Will Shand
Yaodong Yu
Kai-Wei Chang
Han Zhao
Yuan Tian
FaML
77
14
0
23 May 2022
Looking for a Handsome Carpenter! Debiasing GPT-3 Job Advertisements
Conrad Borchers
Dalia Sara Gala
Ben Gilburt
Eduard Oravkin
Wilfried Bounsi
Yuki M. Asano
Hannah Rose Kirk
AI4CE
72
29
0
23 May 2022
Acceptability Judgements via Examining the Topology of Attention Maps
D. Cherniavskii
Eduard Tulchinskii
Vladislav Mikhailov
Irina Proskurina
Laida Kushnareva
Ekaterina Artemova
S. Barannikov
Irina Piontkovskaya
D. Piontkovski
Evgeny Burnaev
826
20
0
19 May 2022
Deconstructing NLG Evaluation: Evaluation Practices, Assumptions, and Their Implications
Kaitlyn Zhou
Su Lin Blodgett
Adam Trischler
Hal Daumé
Kaheer Suleman
Alexandra Olteanu
ELM
148
30
0
13 May 2022
A Generalist Agent
Scott E. Reed
Konrad Zolna
Emilio Parisotto
Sergio Gomez Colmenarejo
Alexander Novikov
...
Yutian Chen
R. Hadsell
Oriol Vinyals
Mahyar Bordbar
Nando de Freitas
LM&Ro
LLMAG
AI4CE
215
827
0
12 May 2022
Context-Aware Abbreviation Expansion Using Large Language Models
Shanqing Cai
Subhashini Venugopalan
Katrin Tomanek
Ajit Narayanan
Meredith Ringel Morris
Michael P. Brenner
66
28
0
08 May 2022
Previous
1
2
3
...
11
12
13
Next