ResearchTrend.AI
  • Papers
  • Communities
  • Events
  • Blog
  • Pricing
Papers
Communities
Social Events
Terms and Conditions
Pricing
Parameter LabParameter LabTwitterGitHubLinkedInBlueskyYoutube

© 2025 ResearchTrend.AI, All rights reserved.

  1. Home
  2. Papers
  3. 2112.04359
  4. Cited By
Ethical and social risks of harm from Language Models

Ethical and social risks of harm from Language Models

8 December 2021
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
Po-Sen Huang
Myra Cheng
Mia Glaese
Borja Balle
Atoosa Kasirzadeh
Zachary Kenton
S. Brown
Will Hawkins
T. Stepleton
Courtney Biles
Abeba Birhane
Julia Haas
Laura Rimell
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
    PILM
ArXivPDFHTML

Papers citing "Ethical and social risks of harm from Language Models"

50 / 193 papers shown
Title
CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to
  Guardrail Models for Virtual Assistants
CONSCENDI: A Contrastive and Scenario-Guided Distillation Approach to Guardrail Models for Virtual Assistants
A. Sun
Varun Nair
Elliot Schumacher
Anitha Kannan
29
3
0
27 Apr 2023
The MiniPile Challenge for Data-Efficient Language Models
The MiniPile Challenge for Data-Efficient Language Models
Jean Kaddour
MoE
ALM
24
40
0
17 Apr 2023
chatClimate: Grounding Conversational AI in Climate Science
chatClimate: Grounding Conversational AI in Climate Science
S. Vaghefi
Qian Wang
V. Muccione
Jingwei Ni
Mathias Kraus
...
Tobias Schimanski
Chiara Colesanti-Senni
Nicolas Webersinke
Christrian Huggel
Markus Leippold
KELM
AI4MH
HILM
25
67
0
11 Apr 2023
Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language
  Models
Should ChatGPT be Biased? Challenges and Risks of Bias in Large Language Models
Emilio Ferrara
SILM
36
247
0
07 Apr 2023
REFINER: Reasoning Feedback on Intermediate Representations
REFINER: Reasoning Feedback on Intermediate Representations
Debjit Paul
Mete Ismayilzada
Maxime Peyrard
Beatriz Borges
Antoine Bosselut
Robert West
Boi Faltings
ReLM
LRM
26
171
0
04 Apr 2023
Scientists' Perspectives on the Potential for Generative AI in their
  Fields
Scientists' Perspectives on the Potential for Generative AI in their Fields
Meredith Ringel Morris
AI4CE
30
38
0
04 Apr 2023
Assessing Language Model Deployment with Risk Cards
Assessing Language Model Deployment with Risk Cards
Leon Derczynski
Hannah Rose Kirk
Vidhisha Balachandran
Sachin Kumar
Yulia Tsvetkov
M. Leiser
Saif Mohammad
28
42
0
31 Mar 2023
GPT-4 can pass the Korean National Licensing Examination for Korean
  Medicine Doctors
GPT-4 can pass the Korean National Licensing Examination for Korean Medicine Doctors
Dongyeop Jang
Tae-Rim Yun
Choong-Yeol Lee
Young-Kyu Kwon
Chang-Eop Kim
ELM
LM&MA
32
26
0
31 Mar 2023
BloombergGPT: A Large Language Model for Finance
BloombergGPT: A Large Language Model for Finance
Shijie Wu
Ozan Irsoy
Steven Lu
Vadim Dabravolski
Mark Dredze
Sebastian Gehrmann
P. Kambadur
David S. Rosenberg
Gideon Mann
AIFin
76
786
0
30 Mar 2023
GPTs are GPTs: An Early Look at the Labor Market Impact Potential of
  Large Language Models
GPTs are GPTs: An Early Look at the Labor Market Impact Potential of Large Language Models
Tyna Eloundou
Sam Manning
Pamela Mishkin
Daniel Rock
ELM
41
380
0
17 Mar 2023
Computational-level Analysis of Constraint Compliance for General
  Intelligence
Computational-level Analysis of Constraint Compliance for General Intelligence
R. Wray
Steven J. Jones
John E. Laird
AI4CE
11
2
0
08 Mar 2023
Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face
Spacerini: Plug-and-play Search Engines with Pyserini and Hugging Face
Christopher Akiki
Odunayo Ogundepo
Aleksandra Piktus
Xinyu Crystina Zhang
Akintunde Oladipo
Jimmy J. Lin
Martin Potthast
25
5
0
28 Feb 2023
Not what you've signed up for: Compromising Real-World LLM-Integrated
  Applications with Indirect Prompt Injection
Not what you've signed up for: Compromising Real-World LLM-Integrated Applications with Indirect Prompt Injection
Kai Greshake
Sahar Abdelnabi
Shailesh Mishra
C. Endres
Thorsten Holz
Mario Fritz
SILM
49
436
0
23 Feb 2023
Auditing large language models: a three-layered approach
Auditing large language models: a three-layered approach
Jakob Mokander
Jonas Schuett
Hannah Rose Kirk
Luciano Floridi
AILaw
MLAU
48
194
0
16 Feb 2023
Retrieval-augmented Image Captioning
Retrieval-augmented Image Captioning
R. Ramos
Desmond Elliott
Bruno Martins
VLM
32
29
0
16 Feb 2023
The Capacity for Moral Self-Correction in Large Language Models
The Capacity for Moral Self-Correction in Large Language Models
Deep Ganguli
Amanda Askell
Nicholas Schiefer
Thomas I. Liao
Kamil.e Lukovsiut.e
...
Tom B. Brown
C. Olah
Jack Clark
Sam Bowman
Jared Kaplan
LRM
ReLM
45
158
0
15 Feb 2023
Machine Learning Model Attribution Challenge
Machine Learning Model Attribution Challenge
Elizabeth Merkhofe
Deepesh Chaudhari
Hyrum S. Anderson
Keith Manville
Lily Wong
João Gante
13
4
0
13 Feb 2023
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard
  Security Attacks
Exploiting Programmatic Behavior of LLMs: Dual-Use Through Standard Security Attacks
Daniel Kang
Xuechen Li
Ion Stoica
Carlos Guestrin
Matei A. Zaharia
Tatsunori Hashimoto
AAML
24
234
0
11 Feb 2023
The Gradient of Generative AI Release: Methods and Considerations
The Gradient of Generative AI Release: Methods and Considerations
Irene Solaiman
33
98
0
05 Feb 2023
Mnemosyne: Learning to Train Transformers with Transformers
Mnemosyne: Learning to Train Transformers with Transformers
Deepali Jain
K. Choromanski
Kumar Avinava Dubey
Sumeet Singh
Vikas Sindhwani
Tingnan Zhang
Jie Tan
OffRL
39
9
0
02 Feb 2023
Debiasing Vision-Language Models via Biased Prompts
Debiasing Vision-Language Models via Biased Prompts
Ching-Yao Chuang
Varun Jampani
Yuanzhen Li
Antonio Torralba
Stefanie Jegelka
VLM
30
96
0
31 Jan 2023
Second Thoughts are Best: Learning to Re-Align With Human Values from
  Text Edits
Second Thoughts are Best: Learning to Re-Align With Human Values from Text Edits
Ruibo Liu
Chenyan Jia
Ge Zhang
Ziyu Zhuang
Tony X. Liu
Soroush Vosoughi
99
35
0
01 Jan 2023
Inclusive Artificial Intelligence
Inclusive Artificial Intelligence
Dilip Arumugam
Shi Dong
Benjamin Van Roy
52
1
0
24 Dec 2022
Methodological reflections for AI alignment research using human
  feedback
Methodological reflections for AI alignment research using human feedback
Thilo Hagendorff
Sarah Fabi
19
6
0
22 Dec 2022
Improving Cross-task Generalization of Unified Table-to-text Models with
  Compositional Task Configurations
Improving Cross-task Generalization of Unified Table-to-text Models with Compositional Task Configurations
Jifan Chen
Yuhao Zhang
Lan Liu
Rui Dong
Xinchi Chen
Patrick K. L. Ng
William Yang Wang
Zhiheng Huang
AI4CE
30
4
0
17 Dec 2022
MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text
  Generation
MURMUR: Modular Multi-Step Reasoning for Semi-Structured Data-to-Text Generation
Swarnadeep Saha
Xinyan Velocity Yu
Joey Tianyi Zhou
Ramakanth Pasunuru
Asli Celikyilmaz
ReLM
LRM
25
10
0
16 Dec 2022
Manifestations of Xenophobia in AI Systems
Manifestations of Xenophobia in AI Systems
Nenad Tomašev
J. L. Maynard
Iason Gabriel
24
9
0
15 Dec 2022
Implicit causality in GPT-2: a case study
Implicit causality in GPT-2: a case study
H. Huynh
T. Lentz
Emiel van Miltenburg
LRM
27
3
0
08 Dec 2022
Discovering Latent Knowledge in Language Models Without Supervision
Discovering Latent Knowledge in Language Models Without Supervision
Collin Burns
Haotian Ye
Dan Klein
Jacob Steinhardt
70
327
0
07 Dec 2022
Talking About Large Language Models
Talking About Large Language Models
Murray Shanahan
AI4CE
23
242
0
07 Dec 2022
Fine-tuning language models to find agreement among humans with diverse
  preferences
Fine-tuning language models to find agreement among humans with diverse preferences
Michiel A. Bakker
Martin Chadwick
Hannah R. Sheahan
Michael Henry Tessler
Lucy Campbell-Gillingham
...
Nat McAleese
Amelia Glaese
John Aslanides
M. Botvinick
Christopher Summerfield
ALM
46
215
0
28 Nov 2022
Melting Pot 2.0
Melting Pot 2.0
J. Agapiou
A. Vezhnevets
Edgar A. Duénez-Guzmán
Jayd Matyas
Yiran Mao
...
Sukhdeep Singh
Julia Haas
Igor Mordatch
D. Mobbs
Joel Z Leibo
33
31
0
24 Nov 2022
AutoReply: Detecting Nonsense in Dialogue Introspectively with
  Discriminative Replies
AutoReply: Detecting Nonsense in Dialogue Introspectively with Discriminative Replies
Weiyan Shi
Emily Dinan
Adithya Renduchintala
Daniel Fried
Athul Paul Jacob
Zhou Yu
M. Lewis
AAML
28
2
0
22 Nov 2022
Ignore Previous Prompt: Attack Techniques For Language Models
Ignore Previous Prompt: Attack Techniques For Language Models
Fábio Perez
Ian Ribeiro
SILM
51
397
0
17 Nov 2022
Easily Accessible Text-to-Image Generation Amplifies Demographic
  Stereotypes at Large Scale
Easily Accessible Text-to-Image Generation Amplifies Demographic Stereotypes at Large Scale
Federico Bianchi
Pratyusha Kalluri
Esin Durmus
Faisal Ladhak
Myra Cheng
Debora Nozza
Tatsunori Hashimoto
Dan Jurafsky
James Zou
Aylin Caliskan
DiffM
VLM
36
288
0
07 Nov 2022
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for
  Text Generation and Modular Control
SSD-LM: Semi-autoregressive Simplex-based Diffusion Language Model for Text Generation and Modular Control
Xiaochuang Han
Sachin Kumar
Yulia Tsvetkov
35
79
0
31 Oct 2022
RuCoLA: Russian Corpus of Linguistic Acceptability
RuCoLA: Russian Corpus of Linguistic Acceptability
Vladislav Mikhailov
T. Shamardina
Max Ryabinin
A. Pestova
I. Smurov
Ekaterina Artemova
30
28
0
23 Oct 2022
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal
  Modeling
LiteVL: Efficient Video-Language Learning with Enhanced Spatial-Temporal Modeling
Dongsheng Chen
Chaofan Tao
Lu Hou
Lifeng Shang
Xin Jiang
Qun Liu
VLM
29
18
0
21 Oct 2022
SafeText: A Benchmark for Exploring Physical Safety in Language Models
SafeText: A Benchmark for Exploring Physical Safety in Language Models
Sharon Levy
Emily Allaway
Melanie Subbiah
Lydia B. Chilton
D. Patton
Kathleen McKeown
William Yang Wang
59
40
0
18 Oct 2022
Deep Bidirectional Language-Knowledge Graph Pretraining
Deep Bidirectional Language-Knowledge Graph Pretraining
Michihiro Yasunaga
Antoine Bosselut
Hongyu Ren
Xikun Zhang
Christopher D. Manning
Percy Liang
J. Leskovec
36
193
0
17 Oct 2022
AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of
  Large-Scale Pre-Trained Language Models
AlphaTuning: Quantization-Aware Parameter-Efficient Adaptation of Large-Scale Pre-Trained Language Models
S. Kwon
Jeonghoon Kim
Jeongin Bae
Kang Min Yoo
Jin-Hwa Kim
Baeseong Park
Byeongwook Kim
Jung-Woo Ha
Nako Sung
Dongsoo Lee
MQ
29
30
0
08 Oct 2022
Ask Me Anything: A simple strategy for prompting language models
Ask Me Anything: A simple strategy for prompting language models
Simran Arora
A. Narayan
Mayee F. Chen
Laurel J. Orr
Neel Guha
Kush S. Bhatia
Ines Chami
Frederic Sala
Christopher Ré
ReLM
LRM
223
208
0
05 Oct 2022
GLM-130B: An Open Bilingual Pre-trained Model
GLM-130B: An Open Bilingual Pre-trained Model
Aohan Zeng
Xiao Liu
Zhengxiao Du
Zihan Wang
Hanyu Lai
...
Jidong Zhai
Wenguang Chen
Peng-Zhen Zhang
Yuxiao Dong
Jie Tang
BDL
LRM
253
1,073
0
05 Oct 2022
Improving alignment of dialogue agents via targeted human judgements
Improving alignment of dialogue agents via targeted human judgements
Amelia Glaese
Nat McAleese
Maja Trkebacz
John Aslanides
Vlad Firoiu
...
John F. J. Mellor
Demis Hassabis
Koray Kavukcuoglu
Lisa Anne Hendricks
G. Irving
ALM
AAML
227
502
0
28 Sep 2022
Summarization Programs: Interpretable Abstractive Summarization with
  Neural Modular Trees
Summarization Programs: Interpretable Abstractive Summarization with Neural Modular Trees
Swarnadeep Saha
Shiyue Zhang
Peter Hase
Joey Tianyi Zhou
26
19
0
21 Sep 2022
A Review of Challenges in Machine Learning based Automated Hate Speech
  Detection
A Review of Challenges in Machine Learning based Automated Hate Speech Detection
Abhishek Velankar
H. Patil
Raviraj Joshi
32
8
0
12 Sep 2022
Harnessing Abstractive Summarization for Fact-Checked Claim Detection
Harnessing Abstractive Summarization for Fact-Checked Claim Detection
Varad Bhatnagar
Diptesh Kanojia
Kameswari Chebrolu
HILM
27
8
0
10 Sep 2022
In conversation with Artificial Intelligence: aligning language models
  with human values
In conversation with Artificial Intelligence: aligning language models with human values
Atoosa Kasirzadeh
Iason Gabriel
21
98
0
01 Sep 2022
Faithful Reasoning Using Large Language Models
Faithful Reasoning Using Large Language Models
Antonia Creswell
Murray Shanahan
ReLM
LRM
24
121
0
30 Aug 2022
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors,
  and Lessons Learned
Red Teaming Language Models to Reduce Harms: Methods, Scaling Behaviors, and Lessons Learned
Deep Ganguli
Liane Lovitt
John Kernion
Amanda Askell
Yuntao Bai
...
Nicholas Joseph
Sam McCandlish
C. Olah
Jared Kaplan
Jack Clark
231
446
0
23 Aug 2022
Previous
1234
Next