ResearchTrend.AI
Ethical and social risks of harm from Language Models
arXiv:2112.04359 · 8 December 2021
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
Po-Sen Huang
Myra Cheng
Mia Glaese
Borja Balle
Atoosa Kasirzadeh
Zachary Kenton
S. Brown
Will Hawkins
T. Stepleton
Courtney Biles
Abeba Birhane
Julia Haas
Laura Rimell
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
    PILM

Papers citing "Ethical and social risks of harm from Language Models"

Showing 50 of 634 citing papers.
XAI meets LLMs: A Survey of the Relation between Explainable AI and Large Language Models
Min Zhang
Lorenzo Malandri
Fabio Mercorio
Navid Nobani
Andrea Seveso
92
15
0
21 Jul 2024
Operationalizing a Threat Model for Red-Teaming Large Language Models (LLMs)
Apurv Verma
Satyapriya Krishna
Sebastian Gehrmann
Madhavan Seshadri
Anu Pradhan
Tom Ault
Leslie Barrett
David Rabinowitz
John Doucette
Nhathai Phan
129
15
0
20 Jul 2024
Consent in Crisis: The Rapid Decline of the AI Data Commons
Shayne Longpre
Robert Mahari
Ariel N. Lee
Campbell Lund
Hamidah Oderinwale
...
Hanlin Li
Daphne Ippolito
Sara Hooker
Jad Kabbara
Sandy Pentland
125
42
0
20 Jul 2024
Thorns and Algorithms: Navigating Generative AI Challenges Inspired by Giraffes and Acacias
Waqar Hussain
100
1
0
16 Jul 2024
Social and Ethical Risks Posed by General-Purpose LLMs for Settling Newcomers in Canada
I. Nejadgholi
Maryam Molamohammadi
Samir Bakhtawar
102
0
0
15 Jul 2024
Evaluating AI Evaluation: Perils and Prospects
John Burden
ELM
98
9
0
12 Jul 2024
Are Large Language Models Really Bias-Free? Jailbreak Prompts for Assessing Adversarial Robustness to Bias Elicitation
Riccardo Cantini
Giada Cosenza
A. Orsino
Domenico Talia
AAML
126
7
0
11 Jul 2024
Limits to Predicting Online Speech Using Large Language Models
Mina Remeli
Moritz Hardt
Robert C. Williamson
51
0
0
08 Jul 2024
AI Safety in Generative AI Large Language Models: A Survey
Jaymari Chua
Yun Yvonna Li
Shiyi Yang
Chen Wang
Lina Yao
LM&MA
100
19
0
06 Jul 2024
When LLMs Play the Telephone Game: Cultural Attractors as Conceptual Tools to Evaluate LLMs in Multi-turn Settings
Jérémy Perez
Corentin Léger
Grgur Kovač
Cédric Colas
Gaia Molinaro
Maxime Derex
Pierre-Yves Oudeyer
Clément Moulin-Frier
109
6
0
05 Jul 2024
The Price of Prompting: Profiling Energy Use in Large Language Models Inference
E. J. Husom
Arda Goknil
Lwin Khin Shar
Sagar Sen
121
8
0
04 Jul 2024
Social Bias in Large Language Models For Bangla: An Empirical Study on Gender and Religious Bias
Jayanta Sadhu
Maneesha Rani Saha
Rifat Shahriyar
83
4
0
03 Jul 2024
Self-Cognition in Large Language Models: An Exploratory Study
Dongping Chen
Jiawen Shi
Yao Wan
Pan Zhou
Neil Zhenqiang Gong
Lichao Sun
LRM, LLMAG
86
4
0
01 Jul 2024
ProgressGym: Alignment with a Millennium of Moral Progress
Tianyi Qiu
Yang Zhang
Xuchuan Huang
Jasmine Xinze Li
Yalan Qin
Yaodong Yang
AI4TS
106
7
0
28 Jun 2024
Suri: Multi-constraint Instruction Following for Long-form Text Generation
Chau Minh Pham
Simeng Sun
Mohit Iyyer
ALM, LRM
124
23
0
27 Jun 2024
Fairness and Bias in Multimodal AI: A Survey
Tosin Adewumi
Lama Alkhaled
Namrata Gurung
G. V. Boven
Irene Pagliai
117
10
0
27 Jun 2024
WildGuard: Open One-Stop Moderation Tools for Safety Risks, Jailbreaks, and Refusals of LLMs
Seungju Han
Kavel Rao
Allyson Ettinger
Liwei Jiang
Bill Yuchen Lin
Nathan Lambert
Yejin Choi
Nouha Dziri
126
101
0
26 Jun 2024
AI Alignment through Reinforcement Learning from Human Feedback? Contradictions and Limitations
Adam Dahlgren Lindstrom
Leila Methnani
Lea Krause
Petter Ericson
Íñigo Martínez de Rituerto de Troya
Dimitri Coelho Mollo
Roel Dobbe
ALM
81
2
0
26 Jun 2024
Natural Language but Omitted? On the Ineffectiveness of Large Language Models' privacy policy from End-users' Perspective
Shuning Zhang
Haobin Xing
Xin Yi
Hewu Li
PILM
108
0
0
26 Jun 2024
AI Risk Categorization Decoded (AIR 2024): From Government Regulations to Corporate Policies
Yi Zeng
Kevin Klyman
Andy Zhou
Yu Yang
Minzhou Pan
Ruoxi Jia
Dawn Song
Percy Liang
Bo Li
98
27
0
25 Jun 2024
Adversaries Can Misuse Combinations of Safe Models
Erik Jones
Anca Dragan
Jacob Steinhardt
69
12
0
20 Jun 2024
PostMark: A Robust Blackbox Watermark for Large Language Models
Yapei Chang
Kalpesh Krishna
Amir Houmansadr
John Wieting
Mohit Iyyer
84
9
0
20 Jun 2024
GenderAlign: An Alignment Dataset for Mitigating Gender Bias in Large Language Models
Tao Zhang
Huiping Zhuang
Yuxiang Xiao
Cen Chen
James R. Foulds
Shimei Pan
CVBM
80
5
0
20 Jun 2024
Raising the Bar: Investigating the Values of Large Language Models via Generative Evolving Testing
Han Jiang
Xiaoyuan Yi
Zhihua Wei
Ziang Xiao
Shu Wang
Xing Xie
ELM, ALM
160
8
0
20 Jun 2024
Factual Confidence of LLMs: on Reliability and Robustness of Current Estimators
Matéo Mahaut
Laura Aina
Paula Czarnowska
Momchil Hardalov
Thomas Müller
Lluís Marquez
HILM
99
24
0
19 Jun 2024
A Survey on Human Preference Learning for Large Language Models
Ruili Jiang
Kehai Chen
Xuefeng Bai
Zhixuan He
Juntao Li
Muyun Yang
Tiejun Zhao
Liqiang Nie
Min Zhang
132
9
0
17 Jun 2024
SPA-VL: A Comprehensive Safety Preference Alignment Dataset for Vision Language Model
Yongting Zhang
Lu Chen
Guodong Zheng
Yifeng Gao
Rui Zheng
...
Yu Qiao
Xuanjing Huang
Feng Zhao
Tao Gui
Jing Shao
VLM
199
33
0
17 Jun 2024
Evaluation of Large Language Models: STEM education and Gender Stereotypes
Smilla Due
Sneha Das
Marianne Andersen
Berta Plandolit López
Sniff Andersen Nexø
Line Clemmensen
79
1
0
14 Jun 2024
CTIBench: A Benchmark for Evaluating LLMs in Cyber Threat Intelligence
Md Tanvirul Alam
Dipkamal Bhusal
Le Nguyen
Nidhi Rastogi
ELM
61
21
0
11 Jun 2024
A Synthetic Dataset for Personal Attribute Inference
Hanna Yukhymenko
Robin Staab
Mark Vero
Martin Vechev
SyDa
98
12
0
11 Jun 2024
Teaching Language Models to Self-Improve by Learning from Language Feedback
Chi Hu
Yimin Hu
Hang Cao
Tong Xiao
Jingbo Zhu
LRM, VLM
79
5
0
11 Jun 2024
Aligning Large Language Models with Representation Editing: A Control Perspective
Lingkai Kong
Haorui Wang
Wenhao Mu
Yuanqi Du
Yuchen Zhuang
Yifei Zhou
Yue Song
Rongzhi Zhang
Kai Wang
Chao Zhang
94
26
0
10 Jun 2024
Deconstructing The Ethics of Large Language Models from Long-standing Issues to New-emerging Dilemmas
Chengyuan Deng
Yiqun Duan
Xin Jin
Heng Chang
Yijun Tian
...
Kuofeng Gao
Sihong He
Jun Zhuang
Lu Cheng
Haohan Wang
AILaw
90
24
0
08 Jun 2024
LlavaGuard: An Open VLM-based Framework for Safeguarding Vision Datasets and Models
Lukas Helff
Felix Friedrich
Manuel Brack
Kristian Kersting
P. Schramowski
VLM
108
1
0
07 Jun 2024
MoralBench: Moral Evaluation of LLMs
Jianchao Ji
Yutong Chen
Mingyu Jin
Wujiang Xu
Wenyue Hua
Yongfeng Zhang
ELM
75
13
0
06 Jun 2024
The Life Cycle of Large Language Models: A Review of Biases in Education
Jinsook Lee
Yann Hicke
Renzhe Yu
Christopher A. Brooks
René F. Kizilcec
AI4Ed
99
2
0
03 Jun 2024
Towards Trustworthy AI: A Review of Ethical and Robust Large Language Models
Meftahul Ferdaus
Mahdi Abdelguerfi
Elias Ioup
Kendall N. Niles
Ken Pathak
Steve Sloan
128
14
0
01 Jun 2024
A Robot Walks into a Bar: Can Language Models Serve as Creativity Support Tools for Comedy? An Evaluation of LLMs' Humour Alignment with Comedians
Piotr Wojciech Mirowski
Juliette Love
K. Mathewson
Shakir Mohamed
80
21
0
31 May 2024
Toxicity Detection for Free
Zhanhao Hu
Julien Piet
Geng Zhao
Jiantao Jiao
David Wagner
69
7
0
29 May 2024
Tool Learning with Large Language Models: A Survey
Changle Qu
Sunhao Dai
Xiaochi Wei
Hengyi Cai
Shuaiqiang Wang
Dawei Yin
Jun Xu
Jirong Wen
LLMAG
101
107
0
28 May 2024
Learning diverse attacks on large language models for robust red-teaming and safety tuning
Seanie Lee
Minsu Kim
Lynn Cherif
David Dobre
Juho Lee
...
Kenji Kawaguchi
Gauthier Gidel
Yoshua Bengio
Nikolay Malkin
Moksh Jain
AAML
158
20
0
28 May 2024
ReMoDetect: Reward Models Recognize Aligned LLM's Generations
Hyunseok Lee
Jihoon Tack
Jinwoo Shin
DeLMO
61
1
0
27 May 2024
ChatGPT Code Detection: Techniques for Uncovering the Source of Code
Marc Oedingen
Raphael C. Engelhardt
Robin Denz
Maximilian Hammer
Wolfgang Konen
DeLMO
94
9
0
24 May 2024
Spectraformer: A Unified Random Feature Framework for Transformer
Duke Nguyen
Du Yin
Aditya Joshi
Flora D. Salim
67
1
0
24 May 2024
Synthetic Data Generation for Intersectional Fairness by Leveraging Hierarchical Group Structure
Gaurav Maheshwari
A. Bellet
Pascal Denis
Mikaela Keller
88
1
0
23 May 2024
Model Editing as a Robust and Denoised variant of DPO: A Case Study on Toxicity
Rheeya Uppaal
Apratim De
Yiting He
Yiqiao Zhong
Junjie Hu
153
7
0
22 May 2024
Skin-in-the-Game: Decision Making via Multi-Stakeholder Alignment in LLMs
Bilgehan Sel
Priya Shanmugasundaram
Mohammad Kachuee
Kun Zhou
Ruoxi Jia
Ming Jin
LRM
53
3
0
21 May 2024
Adversarial DPO: Harnessing Harmful Data for Reducing Toxicity with Minimal Impact on Coherence and Evasiveness in Dialogue Agents
San Kim
Gary Geunbae Lee
AAML
124
3
0
21 May 2024
Can AI Relate: Testing Large Language Model Response for Mental Health Support
Saadia Gabriel
Isha Puri
Xuhai Xu
Matteo Malgaroli
Marzyeh Ghassemi
LM&MA, AI4MH
117
15
0
20 May 2024
Sociotechnical Implications of Generative Artificial Intelligence for Information Access
Bhaskar Mitra
Henriette Cramer
Olya Gurevich
113
2
0
19 May 2024