ResearchTrend.AI

Ethical and social risks of harm from Language Models
arXiv: 2112.04359 · 8 December 2021
Laura Weidinger, John F. J. Mellor, Maribeth Rauh, Conor Griffin, J. Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zachary Kenton, S. Brown, Will Hawkins, T. Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William S. Isaac, Sean Legassick, G. Irving, Iason Gabriel
    PILM
ArXiv (abs) · PDF · HTML

Papers citing "Ethical and social risks of harm from Language Models"

50 / 634 papers shown
In-House Evaluation Is Not Enough: Towards Robust Third-Party Flaw Disclosure for General-Purpose AI
Shayne Longpre, Kevin Klyman, Ruth E. Appel, Sayash Kapoor, Rishi Bommasani, ..., Victoria Westerhoff, Yacine Jernite, Rumman Chowdhury, Percy Liang, Arvind Narayanan
ELM
95 · 1 · 0 · 21 Mar 2025

A Review on Large Language Models for Visual Analytics
Navya Sonal Agarwal, Sanjay Kumar Sonbhadra
110 · 0 · 0 · 19 Mar 2025

Prompt Sentiment: The Catalyst for LLM Change
Vishal Gandhi, Sagar Gandhi
66 · 1 · 0 · 14 Mar 2025

Reasoning-Grounded Natural Language Explanations for Language Models
Vojtech Cahlik, Rodrigo Alves, Pavel Kordík
LRM
96 · 2 · 0 · 14 Mar 2025

Social Bias Benchmark for Generation: A Comparison of Generation and QA-Based Evaluations
Jiho Jin, Woosung Kang, Junho Myung, Alice Oh
72 · 0 · 0 · 10 Mar 2025

Uncovering Gaps in How Humans and LLMs Interpret Subjective Language
Erik Jones, Arjun Patrawala, Jacob Steinhardt
72 · 1 · 0 · 06 Mar 2025
Forecasting Rare Language Model Behaviors
Erik Jones, Meg Tong, Jesse Mu, Mohammed Mahfoud, Jan Leike, Roger C. Grosse, Jared Kaplan, William Fithian, Ethan Perez, Mrinank Sharma
97 · 1 · 0 · 24 Feb 2025

A Systematic Review of Open Datasets Used in Text-to-Image (T2I) Gen AI Model Safety
Rakeen Rouf, Trupti Bavalatti, Osama Ahmed, Dhaval Potdar, Faraz Jawed
EGVM
128 · 2 · 0 · 23 Feb 2025

Glimpse: Enabling White-Box Methods to Use Proprietary Models for Zero-Shot LLM-Generated Text Detection
Guangsheng Bao, Yanbin Zhao, Juncai He, Yue Zhang
VLM
165 · 3 · 0 · 20 Feb 2025

Equilibrate RLHF: Towards Balancing Helpfulness-Safety Trade-off in Large Language Models
Yingshui Tan, Yilei Jiang, Yongbin Li, Qingbin Liu, Xingyuan Bu, Wenbo Su, Xiangyu Yue, Xiaoyong Zhu, Bo Zheng
ALM
153 · 6 · 0 · 17 Feb 2025

TinyEmo: Scaling down Emotional Reasoning via Metric Projection
Cristian Gutierrez
LRM
265 · 0 · 0 · 17 Feb 2025
AI Mimicry and Human Dignity: Chatbot Use as a Violation of Self-Respect
Jan-Willem van der Rijt, Dimitri Coelho Mollo, Bram Vaassen
SILM
85 · 0 · 0 · 17 Feb 2025

From Hazard Identification to Controller Design: Proactive and LLM-Supported Safety Engineering for ML-Powered Systems
Yining Hong, Christopher S. Timperley, Christian Kastner
194 · 2 · 0 · 11 Feb 2025

"I am bad": Interpreting Stealthy, Universal and Robust Audio Jailbreaks in Audio-Language Models
Isha Gupta, David Khachaturov, Robert D. Mullins
AAML, AuLLM
115 · 4 · 0 · 02 Feb 2025

The Pitfalls of "Security by Obscurity" And What They Mean for Transparent AI
Peter Hall, Olivia Mundahl, Sunoo Park
144 · 0 · 0 · 30 Jan 2025

Continually Evolved Multimodal Foundation Models for Cancer Prognosis
Jie Peng, Shuang Zhou, Longwei Yang, Yiran Song, Mohan Zhang, Kaixiong Zhou, Feng Xie, Mingquan Lin, Rui Zhang, Tianlong Chen
209 · 0 · 0 · 30 Jan 2025

Advanced Real-Time Fraud Detection Using RAG-Based LLMs
Gurjot Singh, Prabhjot Singh, Maninder Singh
73 · 1 · 0 · 28 Jan 2025
BLoB: Bayesian Low-Rank Adaptation by Backpropagation for Large Language Models
Yibin Wang, Haizhou Shi, Ligong Han, Dimitris N. Metaxas, Hao Wang
BDL, UQ, LM
229 · 13 · 0 · 28 Jan 2025

Episodic memory in AI agents poses risks that should be studied and mitigated
Chad DeChant
141 · 4 · 0 · 20 Jan 2025

Two Types of AI Existential Risk: Decisive and Accumulative
Atoosa Kasirzadeh
146 · 18 · 0 · 20 Jan 2025

Tapping the Potential of Large Language Models as Recommender Systems: A Comprehensive Framework and Empirical Analysis
Lanling Xu, Junjie Zhang, Bingqian Li, Jinpeng Wang, Sheng Chen, Wayne Xin Zhao, Ji-Rong Wen
183 · 18 · 0 · 17 Jan 2025

Lessons From Red Teaming 100 Generative AI Products
Blake Bullwinkel, Amanda Minnich, Shiven Chawla, Gary Lopez, Martin Pouliot, ..., Pete Bryan, Ram Shankar Siva Kumar, Yonatan Zunger, Chang Kawaguchi, Mark Russinovich
AAML, VLM
84 · 7 · 0 · 13 Jan 2025

Utility-inspired Reward Transformations Improve Reinforcement Learning Training of Language Models
Roberto-Rafael Maura-Rivero, Chirag Nagpal, Roma Patel, Francesco Visin
129 · 1 · 0 · 08 Jan 2025
Toward Inclusive Educational AI: Auditing Frontier LLMs through a Multiplexity Lens
Abdullah Mushtaq, Muhammad Rafay Naeem, Muhammad Imran Taj, Ibrahim Ghaznavi, Junaid Qadir
95 · 3 · 0 · 08 Jan 2025

LLM Content Moderation and User Satisfaction: Evidence from Response Refusals in Chatbot Arena
Stefan Pasch
167 · 0 · 0 · 04 Jan 2025

An Overview and Discussion on Using Large Language Models for Implementation Generation of Solutions to Open-Ended Problems
Hashmath Shaik, Alex Doboli
OffRL, ELM
467 · 0 · 0 · 31 Dec 2024

From Hallucinations to Facts: Enhancing Language Models with Curated Knowledge Graphs
Ratnesh Kumar Joshi, Sagnik Sengupta, Asif Ekbal
HILM, KELM
79 · 0 · 0 · 24 Dec 2024

Libra-Leaderboard: Towards Responsible AI through a Balanced Leaderboard of Safety and Capability
Haoyang Li, Xudong Han, Zenan Zhai, Honglin Mu, Hao Wang, ..., Eduard H. Hovy, Iryna Gurevych, Preslav Nakov, Monojit Choudhury, Timothy Baldwin
ALM
71 · 3 · 0 · 24 Dec 2024
Chained Tuning Leads to Biased Forgetting
Megan Ung, Alicia Sun, Samuel J. Bell, Bhaktipriya Radharapu, Levent Sagun, Adina Williams
CLL, KELM
171 · 0 · 0 · 21 Dec 2024

Social Science Is Necessary for Operationalizing Socially Responsible Foundation Models
Adam Davies, Elisa Nguyen, Michael Simeone, Erik Johnston, Martin Gubri
208 · 0 · 0 · 20 Dec 2024

Clio: Privacy-Preserving Insights into Real-World AI Use
Alex Tamkin, Miles McCain, Kunal Handa, Esin Durmus, Liane Lovitt, ..., Wes Mitchell, Shan Carter, Jack Clark, Jared Kaplan, Deep Ganguli
161 · 21 · 0 · 18 Dec 2024

PrisonBreak: Jailbreaking Large Language Models with Fewer Than Twenty-Five Targeted Bit-flips
Zachary Coalson, Jeonghyun Woo, Shiyang Chen, Yu Sun, Lishan Yang, Prashant J. Nair, Bo Fang, Sanghyun Hong
AAML
136 · 3 · 0 · 10 Dec 2024

SafeWorld: Geo-Diverse Safety Alignment
Da Yin, Haoyi Qiu, Kung-Hsiang Huang, Kai-Wei Chang, Nanyun Peng
121 · 8 · 0 · 09 Dec 2024
How far can bias go? -- Tracing bias from pretraining data to alignment
Marion Thaler, Abdullatif Köksal, Alina Leidinger, Anna Korhonen, Hinrich Schutze
153 · 1 · 0 · 28 Nov 2024

Ex Uno Pluria: Insights on Ensembling in Low Precision Number Systems
G. Nam, Juho Lee
133 · 0 · 0 · 22 Nov 2024

Exploring Accuracy-Fairness Trade-off in Large Language Models
Qingquan Zhang, Qiqi Duan, Bo Yuan, Yuhui Shi, Qingbin Liu
115 · 0 · 0 · 21 Nov 2024

The Dark Side of Trust: Authority Citation-Driven Jailbreak Attacks on Large Language Models
Xikang Yang, Xuehai Tang, Jizhong Han, Songlin Hu
116 · 0 · 0 · 18 Nov 2024

SPICA: Retrieving Scenarios for Pluralistic In-Context Alignment
Quan Ze Chen, K. J. Kevin Feng, Chan Young Park, Amy X. Zhang
63 · 0 · 0 · 16 Nov 2024
Approximated Variational Bayesian Inverse Reinforcement Learning for Large Language Model Alignment
Yuang Cai, Yuyu Yuan, Jinsheng Shi, Qinhong Lin
73 · 0 · 0 · 14 Nov 2024

Ethical Concern Identification in NLP: A Corpus of ACL Anthology Ethics Statements
Antonia Karamolegkou, Sandrine Schiller Hansen, Ariadni Christopoulou, Filippos Stamatiou, Anne Lauscher, Anders Søgaard
54 · 0 · 0 · 12 Nov 2024

SCAR: Sparse Conditioned Autoencoders for Concept Detection and Steering in LLMs
Ruben Härle, Felix Friedrich, Manuel Brack, Bjorn Deiseroth, P. Schramowski, Kristian Kersting
71 · 2 · 0 · 11 Nov 2024

CoPrompter: User-Centric Evaluation of LLM Instruction Alignment for Improved Prompt Engineering
Ishika Joshi, Simra Shahid, Shreeya Venneti, Manushree Vasu, Yantao Zheng, Yunyao Li, Balaji Krishnamurthy, Gromit Yeuk-Yin Chan
93 · 4 · 0 · 09 Nov 2024

The Dark Patterns of Personalized Persuasion in Large Language Models: Exposing Persuasive Linguistic Features for Big Five Personality Traits in LLMs Responses
Wiktoria Mieleszczenko-Kowszewicz, Dawid Płudowski, Filip Kołodziejczyk, Jakub Świstak, Julian Sienkiewicz, P. Biecek
109 · 3 · 0 · 08 Nov 2024

"I Always Felt that Something Was Wrong.": Understanding Compliance Risks and Mitigation Strategies when Professionals Use Large Language Models
Siying Hu, Piaohong Wang, Yaxing Yao, Zhicong Lu
AILaw, PILM
83 · 1 · 0 · 07 Nov 2024
Evaluating Moral Beliefs across LLMs through a Pluralistic Framework
Xuelin Liu, Yanfei Zhu, Shucheng Zhu, Pengyuan Liu, Ying Liu, Dong Yu
65 · 4 · 0 · 06 Nov 2024

SPES: Spectrogram Perturbation for Explainable Speech-to-Text Generation
Dennis Fucci, Marco Gaido, Beatrice Savoldi, Matteo Negri, Mauro Cettolo, L. Bentivogli
270 · 3 · 0 · 03 Nov 2024

Identifying Implicit Social Biases in Vision-Language Models
Kimia Hamidieh, Haoran Zhang, Walter Gerych, Thomas Hartvigsen, Marzyeh Ghassemi
VLM
94 · 16 · 0 · 01 Nov 2024

Can LLMs make trade-offs involving stipulated pain and pleasure states?
Geoff Keeling, Winnie Street, Martyna Stachaczyk, Daria Zakharova, Iulia M. Comsa, Anastasiya Sakovych, Isabella Logothesis, Zejia Zhang, Blaise Agüera y Arcas, Jonathan Birch
87 · 5 · 0 · 01 Nov 2024

Generative AI for Accessible and Inclusive Extended Reality
Jens Grubert, Junlong Chen, Per Ola Kristensson
74 · 2 · 0 · 31 Oct 2024

Smaller Large Language Models Can Do Moral Self-Correction
Guangliang Liu, Zhiyu Xue, Rongrong Wang, K. Johnson, Kristen Marie Johnson
LRM
98 · 0 · 0 · 30 Oct 2024