Ethical and social risks of harm from Language Models

8 December 2021
Laura Weidinger, John F. J. Mellor, Maribeth Rauh, Conor Griffin, J. Uesato, Po-Sen Huang, Myra Cheng, Mia Glaese, Borja Balle, Atoosa Kasirzadeh, Zachary Kenton, S. Brown, Will Hawkins, T. Stepleton, Courtney Biles, Abeba Birhane, Julia Haas, Laura Rimell, Lisa Anne Hendricks, William S. Isaac, Sean Legassick, G. Irving, Iason Gabriel
PILM
arXiv: 2112.04359 (abs · PDF · HTML)

Papers citing "Ethical and social risks of harm from Language Models"

Showing 50 of 634 citing papers.
RE-IMAGINE: Symbolic Benchmark Synthesis for Reasoning Evaluation
Xinnuo Xu, Rachel Lawrence, Kshitij Dubey, Atharva Pandey, Risa Ueno, Fabian Falck, A. Nori, Rahul Sharma, Amit Sharma, Javier González
LRM · 17 · 0 · 0 · 18 Jun 2025

Deflating Deflationism: A Critical Perspective on Debunking Arguments Against LLM Mentality
Alex Grzankowski, Geoff Keeling, Henry Shevlin, Winnie Street
11 · 0 · 0 · 16 Jun 2025

Reinforcement Learning from Human Feedback with High-Confidence Safety Constraints
Yaswanth Chittepu, Blossom Metevier, Will Schwarzer, Austin Hoag, S. Niekum, Philip S. Thomas
20 · 0 · 0 · 09 Jun 2025

Joint-GCG: Unified Gradient-Based Poisoning Attacks on Retrieval-Augmented Generation Systems
Haowei Wang, Rupeng Zhang, Junjie Wang, Mingyang Li, Yuekai Huang, Dandan Wang, Qing Wang
SILM, AAML · 46 · 0 · 0 · 06 Jun 2025

Scenarios in Computing Research: A Systematic Review of the Use of Scenario Methods for Exploring the Future of Computing Technologies in Society
Julia Barnett, Kimon Kieslich, J. Sinchai, Nicholas Diakopoulos
AI4TS · 30 · 0 · 0 · 05 Jun 2025

Do Language Models Think Consistently? A Study of Value Preferences Across Varying Response Lengths
Inderjeet Nair, Lu Wang
47 · 0 · 0 · 03 Jun 2025

Deontological Keyword Bias: The Impact of Modal Expressions on Normative Judgments of Language Models
Bumjin Park, Jinsil Lee, Jaesik Choi
13 · 0 · 0 · 01 Jun 2025

HADA: Human-AI Agent Decision Alignment Architecture
Tapio Pitkäranta, Leena Pitkäranta
17 · 0 · 0 · 01 Jun 2025

COSMIC: Generalized Refusal Direction Identification in LLM Activations
Vincent Siu, Nicholas Crispino, Zihao Yu, Sam Pan, Zhun Wang, Yang Liu, Dawn Song, Chenguang Wang
LLMSV · 25 · 0 · 0 · 30 May 2025

Whose Name Comes Up? Auditing LLM-Based Scholar Recommendations
Daniele Barolo, Chiara Valentin, Fariba Karimi, Luis Galárraga, Gonzalo G. Méndez, Lisette Espín-Noboa
20 · 0 · 0 · 29 May 2025

Probing Politico-Economic Bias in Multilingual Large Language Models: A Cultural Analysis of Low-Resource Pakistani Languages
Afrozah Nadeem, Mark Dras, Usman Naseem
17 · 0 · 0 · 29 May 2025

LLM Agents for Bargaining with Utility-based Feedback
Jihwan Oh, Murad Aghazada, Se-Young Yun, Taehyeon Kim
LLMAG · 21 · 0 · 0 · 29 May 2025

The Multilingual Divide and Its Impact on Global AI Safety
Aidan Peppin, Julia Kreutzer, Alice Schoenauer Sebag, Kelly Marchisio, Beyza Ermis, ..., Wei-Yin Ko, Ahmet Üstün, Matthias Gallé, Marzieh Fadaee, Sara Hooker
ELM · 75 · 1 · 0 · 27 May 2025

Language Models Surface the Unwritten Code of Science and Society
Honglin Bao, Siyang Wu, Jiwoong Choi, Yingrong Mao, James A. Evans
50 · 0 · 0 · 25 May 2025

Does Chain-of-Thought Reasoning Really Reduce Harmfulness from Jailbreaking?
Chengda Lu, Xiaoyu Fan, Yu Huang, Rongwu Xu, Jijie Li, Wei Xu
LRM · 66 · 0 · 0 · 23 May 2025

Accidental Misalignment: Fine-Tuning Language Models Induces Unexpected Vulnerability
Punya Syon Pandey, Samuel Simko, Kellin Pelrine, Zhijing Jin
AAML · 52 · 0 · 0 · 22 May 2025

Breaking mBad! Supervised Fine-tuning for Cross-Lingual Detoxification
Himanshu Beniwal, Y. Kim, Maarten Sap, Soham Dan, Thomas Hartvigsen
CLL · 85 · 0 · 0 · 22 May 2025

After Retrieval, Before Generation: Enhancing the Trustworthiness of Large Language Models in RAG
Xinbang Dai, Huikang Hu, Yuncheng Hua, Jiaqi Li, Yongrui Chen, Rihui Jin, Nan Hu, Guilin Qi
RALM, 3DV · 67 · 0 · 0 · 21 May 2025

A Risk Taxonomy for Evaluating AI-Powered Psychotherapy Agents
Ian Steenstra, Timothy W. Bickmore
58 · 0 · 0 · 21 May 2025

Unraveling Interwoven Roles of Large Language Models in Authorship Privacy: Obfuscation, Mimicking, and Verification
Tuc Nguyen, Yifan Hu, Thai Le
DeLMO · 83 · 0 · 0 · 20 May 2025

Attributional Safety Failures in Large Language Models under Code-Mixed Perturbations
Somnath Banerjee, Pratyush Chatterjee, Shanu Kumar, Sayan Layek, Parag Agrawal, Rima Hazra, Animesh Mukherjee
AAML · 195 · 0 · 0 · 20 May 2025

Trust Me, I Can Handle It: Self-Generated Adversarial Scenario Extrapolation for Robust Language Models
Md Rafi Ur Rashid, Vishnu Asutosh Dasu, Ye Wang, Gang Tan, Shagufta Mehnaz
AAML, ELM · 109 · 0 · 0 · 20 May 2025

Pairwise Calibrated Rewards for Pluralistic Alignment
Daniel Halpern, Evi Micha, Ariel D. Procaccia, Itai Shapira
18 · 0 · 0 · 17 May 2025

Creating General User Models from Computer Use
Omar Shaikh, Shardul Sapkota, Shan Rizvi, Eric Horvitz, Joon Sung Park, Diyi Yang, Michael S. Bernstein
HAI · 134 · 0 · 0 · 16 May 2025

Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"
Pedro M. P. Curvo, Mara Dragomir, Salvador Torpes, Mohammadmahdi Rahimi
LLMAG · 107 · 0 · 0 · 14 May 2025

SecReEvalBench: A Multi-turned Security Resilience Evaluation Benchmark for Large Language Models
Huining Cui, Wei Liu
AAML, ELM · 116 · 0 · 0 · 12 May 2025

Real-World Gaps in AI Governance Research
Ilan Strauss, Isobel Moure, Tim O'Reilly, Sruly Rosenblat
156 · 1 · 0 · 30 Apr 2025

SAGE: A Generic Framework for LLM Safety Evaluation
Madhur Jindal, Hari Shrawgi, Parag Agrawal, Sandipan Dandapat
ELM · 86 · 0 · 0 · 28 Apr 2025

Adaptive Helpfulness-Harmlessness Alignment with Preference Vectors
Ren-Wei Liang, Chin-Ting Hsu, Chan-Hung Yu, Saransh Agrawal, Shih-Cheng Huang, Shang-Tse Chen, Kuan-Hao Huang, Shao-Hua Sun
167 · 0 · 0 · 27 Apr 2025

AI Ethics and Social Norms: Exploring ChatGPT's Capabilities From What to How
Omid Veisi, Sasan Bahrami, Roman Englert, Claudia Müller
330 · 0 · 0 · 25 Apr 2025

Evaluating and Mitigating Bias in AI-Based Medical Text Generation
Xiuying Chen, Tairan Wang, Juexiao Zhou, Zirui Song, Xin Gao, Wei Wei
MedIm · 79 · 3 · 0 · 24 Apr 2025

(Im)possibility of Automated Hallucination Detection in Large Language Models
Amin Karbasi, Omar Montasser, John Sous, Grigoris Velegkas
HILM · 101 · 0 · 0 · 23 Apr 2025

aiXamine: Simplified LLM Safety and Security
Fatih Deniz, Dorde Popovic, Yazan Boshmaf, Euisuh Jeong, M. Ahmad, Sanjay Chawla, Issa M. Khalil
ELM · 337 · 0 · 0 · 21 Apr 2025

Exploring Collaborative GenAI Agents in Synchronous Group Settings: Eliciting Team Perceptions and Design Considerations for the Future of Work
Janet G. Johnson, Macarena Peralta, Mansanjam Kaur, Ruijie Sophia Huang, Sheng Zhao, Ruijia Guan, Shwetha Rajaram, Michael Nebeling
LLMAG · 94 · 0 · 0 · 21 Apr 2025

Jailbreak Detection in Clinical Training LLMs Using Feature-Based Predictive Models
Tri Nguyen, Lohith Srikanth Pentapalli, Magnus Sieverding, Laurah Turner, Seth Overla, ..., Michael Gharib, Matt Kelleher, Michael Shukis, Cameron Pawlik, Kelly Cohen
101 · 0 · 0 · 21 Apr 2025

Mind the Language Gap: Automated and Augmented Evaluation of Bias in LLMs for High- and Low-Resource Languages
Alessio Buscemi, Cedric Lothritz, Sergio Morales, Marcos Gomez-Vazquez, Robert Clarisó, Jordi Cabot, German Castignani
57 · 0 · 0 · 19 Apr 2025

Demo: ViolentUTF as An Accessible Platform for Generative AI Red Teaming
Tam N. Nguyen
103 · 0 · 0 · 14 Apr 2025

Feature-Aware Malicious Output Detection and Mitigation
Weilong Dong, Peiguang Li, Yu Tian, Xinyi Zeng, Fengdi Li, Sirui Wang
AAML · 47 · 0 · 0 · 12 Apr 2025

Open Problems and a Hypothetical Path Forward in LLM Knowledge Paradigms
Xiaotian Ye, Hao Fei, Shu Wu
KELM, ELM · 133 · 0 · 0 · 09 Apr 2025

NLP Security and Ethics, in the Wild
Heather Lent, Erick Galinkin, Yiyi Chen, Jens Myrup Pedersen, Leon Derczynski, Johannes Bjerva
SILM · 135 · 0 · 0 · 09 Apr 2025

Navigating the Rabbit Hole: Emergent Biases in LLM-Generated Attack Narratives Targeting Mental Health Groups
Rijul Magu, Arka Dutta, Sean Kim, Ashiqur R. KhudaBukhsh, Munmun De Choudhury
110 · 0 · 0 · 08 Apr 2025

A Survey of Social Cybersecurity: Techniques for Attack Detection, Evaluations, Challenges, and Future Prospects
Aos Mulahuwaish, Basheer Qolomany, Kevin Gyorick, Jacques Bou Abdo, Mohammed Aledhari, Junaid Qadir, Kathleen Carley, Ala I. Al-Fuqaha
56 · 1 · 0 · 06 Apr 2025

Locations of Characters in Narratives: Andersen and Persuasion Datasets
Batuhan Ozyurt, Roya Arkhmammadova, Deniz Yuret
75 · 2 · 0 · 04 Apr 2025

Increasing happiness through conversations with artificial intelligence
Joseph Heffner, Chongyu Qin, Martin Chadwick, Chris Knutsen, Christopher Summerfield, Zeb Kurth-Nelson, Robb B. Rutledge
AI4MH · 86 · 0 · 0 · 02 Apr 2025

Encrypted Prompt: Securing LLM Applications Against Unauthorized Actions
Shih-Han Chan
AAML · 83 · 1 · 0 · 29 Mar 2025

FLEX: A Benchmark for Evaluating Robustness of Fairness in Large Language Models
Dahyun Jung, Seungyoon Lee, Hyeonseok Moon, Chanjun Park, Heuiseok Lim
AAML, ALM, ELM · 106 · 3 · 0 · 25 Mar 2025

Gemma 3 Technical Report
Gemma Team, Aishwarya B Kamath, Johan Ferret, Shreya Pathak, Nino Vieillard, ..., Harshal Tushar Lehri, Hussein Hazimeh, Ian Ballantyne, Idan Szpektor, Ivan Nardini
VLM · 193 · 136 · 0 · 25 Mar 2025

Understanding the Effects of RLHF on the Quality and Detectability of LLM-Generated Texts
Beining Xu, Arkaitz Zubiaga
DeLMO · 119 · 0 · 0 · 23 Mar 2025

AgentRxiv: Towards Collaborative Autonomous Research
Samuel Schmidgall, Michael Moor
181 · 8 · 0 · 23 Mar 2025

Improving Preference Extraction In LLMs By Identifying Latent Knowledge Through Classifying Probes
Sharan Maiya, Yinhong Liu, Ramit Debnath, Anna Korhonen
74 · 0 · 0 · 22 Mar 2025