ResearchTrend.AI

© 2025 ResearchTrend.AI, All rights reserved.

Ethical and social risks of harm from Language Models

arXiv: 2112.04359

8 December 2021
Laura Weidinger
John F. J. Mellor
Maribeth Rauh
Conor Griffin
J. Uesato
Po-Sen Huang
Myra Cheng
Mia Glaese
Borja Balle
Atoosa Kasirzadeh
Zachary Kenton
S. Brown
Will Hawkins
T. Stepleton
Courtney Biles
Abeba Birhane
Julia Haas
Laura Rimell
Lisa Anne Hendricks
William S. Isaac
Sean Legassick
G. Irving
Iason Gabriel
    PILM

Papers citing "Ethical and social risks of harm from Language Models"

50 / 634 papers shown
CFSafety: Comprehensive Fine-grained Safety Assessment for LLMs
Zhihao Liu
Chenhui Hu
ALM, ELM
73
1
0
29 Oct 2024
The Dark Side of AI Companionship: A Taxonomy of Harmful Algorithmic Behaviors in Human-AI Relationships
Renwen Zhang
Han Li
Han Meng
Jinyuan Zhan
Hongyuan Gan
Yi-Chieh Lee
68
0
0
26 Oct 2024
SWITCH: Studying with Teacher for Knowledge Distillation of Large Language Models
Jahyun Koo
Yerin Hwang
Yongil Kim
Taegwan Kang
Hyunkyung Bae
Kyomin Jung
126
0
0
25 Oct 2024
Enhancing Safety in Reinforcement Learning with Human Feedback via Rectified Policy Optimization
Xiyue Peng
Hengquan Guo
Jiawei Zhang
Dongqing Zou
Ziyu Shao
Honghao Wei
Xin Liu
122
3
0
25 Oct 2024
Adversarial Attacks on Large Language Models Using Regularized Relaxation
Samuel Jacob Chacko
Sajib Biswas
Chashi Mahiul Islam
Fatema Tabassum Liza
Xiuwen Liu
AAML
82
3
0
24 Oct 2024
Insights on Disagreement Patterns in Multimodal Safety Perception across Diverse Rater Groups
Charvi Rastogi
Tian Huey Teh
Pushkar Mishra
Roma Patel
Zoe C. Ashwood
...
Alicia Parrish
Ding Wang
Vinodkumar Prabhakaran
Lora Aroyo
Verena Rieser
EGVM
57
2
0
22 Oct 2024
Voice-Enabled AI Agents can Perform Common Scams
Richard Fang
Dylan Bowman
Daniel Kang
68
2
0
21 Oct 2024
Boardwalk Empire: How Generative AI is Revolutionizing Economic Paradigms
Subramanyam Sahoo
Kamlesh Dutta
101
2
0
19 Oct 2024
Adanonymizer: Interactively Navigating and Balancing the Duality of Privacy and Output Performance in Human-LLM Interaction
Shuning Zhang
Xin Yi
Haobin Xing
Lyumanshan Ye
Yongquan Hu
Hewu Li
75
2
0
19 Oct 2024
"Ghost of the past": identifying and resolving privacy leakage from LLM's memory through proactive user interaction
Shuning Zhang
Lyumanshan Ye
Xin Yi
Jingyu Tang
Bo Shui
Haobin Xing
Pengfei Liu
Hewu Li
106
5
0
19 Oct 2024
Ethics Whitepaper: Whitepaper on Ethical Research into Large Language Models
Eddie L. Ungless
Nikolas Vitsakis
Zeerak Talat
James Garforth
Bjorn Ross
Arno Onken
Atoosa Kasirzadeh
Alexandra Birch
71
1
0
17 Oct 2024
Sound Check: Auditing Audio Datasets
William Agnew
Julia Barnett
Annie Chu
Rachel Hong
Michael Feffer
Robin Netzorg
Harry H. Jiang
Ezra Awumey
Sauvik Das
120
1
0
17 Oct 2024
WorldMedQA-V: a multilingual, multimodal medical examination dataset for multimodal language models evaluation
João Matos
Shan Chen
Siena Placino
Yingya Li
Juan Carlos Climent Pardo
...
Hugo J. W. L. Aerts
Leo Anthony Celi
A. I. Wong
Danielle S. Bitterman
Jack Gallifant
61
2
0
16 Oct 2024
BenchmarkCards: Standardized Documentation for Large Language Model Benchmarks
Anna Sokol
Elizabeth M. Daly
Michael Hind
David Piorkowski
Xiangliang Zhang
Nuno Moniz
Nitesh Chawla
78
0
0
16 Oct 2024
Can LLMs be Scammed? A Baseline Measurement Study
Udari Madhushani Sehwag
Kelly Patel
Francesca Mosca
Vineeth Ravi
Jessica Staddon
36
0
0
14 Oct 2024
Generation with Dynamic Vocabulary
Yanting Liu
Tao Ji
Changzhi Sun
Yuanbin Wu
Xiaoling Wang
77
1
0
11 Oct 2024
COMPL-AI Framework: A Technical Interpretation and LLM Benchmarking Suite for the EU Artificial Intelligence Act
Philipp Guldimann
Alexander Spiridonov
Robin Staab
Nikola Jovanović
Mark Vero
...
Mislav Balunović
Nikola Konstantinov
Pavol Bielik
Petar Tsankov
Martin Vechev
ELM
101
8
0
10 Oct 2024
Root Defence Strategies: Ensuring Safety of LLM at the Decoding Level
Xinyi Zeng
Yuying Shang
Yutao Zhu
Jingyuan Zhang
Yu Tian
AAML
493
4
0
09 Oct 2024
The Role of Governments in Increasing Interconnected Post-Deployment Monitoring of AI
Merlin Stein
Jamie Bernardi
Connor Dunlop
81
6
0
07 Oct 2024
Taylor Unswift: Secured Weight Release for Large Language Models via Taylor Expansion
Guanchu Wang
Yu-Neng Chuang
Ruixiang Tang
Shaochen Zhong
Jiayi Yuan
...
Zirui Liu
Vipin Chaudhary
Shuai Xu
James Caverlee
Helen Zhou
PILM
165
2
0
06 Oct 2024
From Pixels to Personas: Investigating and Modeling Self-Anthropomorphism in Human-Robot Dialogues
Yu Li
Devamanyu Hazarika
Di Jin
Julia Hirschberg
Yang Liu
65
1
0
04 Oct 2024
Examining the Role of Relationship Alignment in Large Language Models
Kristen M. Altenburger
Hongda Jiang
Robert E. Kraut
Yi-Chia Wang
Jane Dwivedi-Yu
75
0
0
02 Oct 2024
Moral Alignment for LLM Agents
Elizaveta Tennant
Stephen Hailes
Mirco Musolesi
126
8
0
02 Oct 2024
Conformal Generative Modeling with Improved Sample Efficiency through Sequential Greedy Filtering
Kemal Kurniawan
Bernhard Schölkopf
Michael Muehlebach
197
1
0
02 Oct 2024
RED QUEEN: Safeguarding Large Language Models against Concealed Multi-Turn Jailbreaking
Yifan Jiang
Kriti Aggarwal
Tanmay Laud
Kashif Munir
Jay Pujara
Subhabrata Mukherjee
AAML
116
13
0
26 Sep 2024
Textoshop: Interactions Inspired by Drawing Software to Facilitate Text Editing
Damien Masson
Young-Ho Kim
Fanny Chevalier
74
5
0
25 Sep 2024
LLM Echo Chamber: personalized and automated disinformation
Tony Ma
46
1
0
24 Sep 2024
HelloBench: Evaluating Long Text Generation Capabilities of Large Language Models
Haoran Que
Feiyu Duan
Liqun He
Yutao Mou
Wangchunshu Zhou
...
Ge Zhang
Junran Peng
Zhaoxiang Zhang
Songyang Zhang
Kai Chen
LM&MA, ELM, VLM
106
16
0
24 Sep 2024
'Since Lawyers are Males..': Examining Implicit Gender Bias in Hindi Language Generation by LLMs
Ishika Joshi
Ishita Gupta
Adrita Dey
Tapan Parikh
AI4CE
67
2
0
20 Sep 2024
Local Explanations and Self-Explanations for Assessing Faithfulness in black-box LLMs
Christos Fragkathoulas
Odysseas S. Chlapanis
LRM
52
1
0
18 Sep 2024
What Is Wrong with My Model? Identifying Systematic Problems with Semantic Data Slicing
Chenyang Yang
Yining Hong
Grace A. Lewis
Tongshuang Wu
Christian Kastner
75
1
0
14 Sep 2024
Content Moderation by LLM: From Accuracy to Legitimacy
Tao Huang
AILaw
106
5
0
05 Sep 2024
The Role of Large Language Models in Musicology: Are We Ready to Trust the Machines?
Pedro Ramoneda
Emilia Parada-Cabaleiro
Benno Weck
Xavier Serra
31
1
0
03 Sep 2024
Differentially Private Kernel Density Estimation
Erzhi Liu
Jerry Yao-Chieh Hu
Alex Reneau
Zhao Song
Han Liu
143
3
0
03 Sep 2024
Pairing Analogy-Augmented Generation with Procedural Memory for Procedural Q&A
K Roth
Rushil Gupta
Simon Halle
Bang Liu
RALM
64
0
0
02 Sep 2024
Conversational Complexity for Assessing Risk in Large Language Models
John Burden
Manuel Cebrian
José Hernández-Orallo
101
2
0
02 Sep 2024
Legilimens: Practical and Unified Content Moderation for Large Language Model Services
Jialin Wu
Jiangyi Deng
Shengyuan Pang
Yanjiao Chen
Jiayang Xu
Xinfeng Li
Wenyuan Xu
127
8
0
28 Aug 2024
AAVENUE: Detecting LLM Biases on NLU Tasks in AAVE via a Novel Benchmark
Abhay Gupta
Philip Meng
Ece Yurtseven
Sean O'Brien
Kevin Zhu
48
8
0
27 Aug 2024
Bi-Factorial Preference Optimization: Balancing Safety-Helpfulness in Language Models
Wenxuan Zhang
Philip Torr
Mohamed Elhoseiny
Adel Bibi
198
15
0
27 Aug 2024
EEG-Defender: Defending against Jailbreak through Early Exit Generation of Large Language Models
Chongwen Zhao
Zhihao Dou
Kaizhu Huang
AAML
67
3
0
21 Aug 2024
Beyond Labels: Aligning Large Language Models with Human-like Reasoning
Muhammad Rafsan Kabir
Rafeed Mohammad Sultan
Ihsanul Haque Asif
Jawad Ibn Ahad
Fuad Rahman
Mohammad Ruhul Amin
Nabeel Mohammed
Shafin Rahman
LRM
89
2
0
20 Aug 2024
REFINE-LM: Mitigating Language Model Stereotypes via Reinforcement Learning
Rameez Qureshi
Naim Es-Sebbani
Luis Galárraga
Yvette Graham
Miguel Couceiro
Zied Bouraoui
50
1
0
18 Aug 2024
Leveraging Language Models for Emotion and Behavior Analysis in Education
Kaito Tanaka
Benjamin Tan
Brian Wong
56
2
0
13 Aug 2024
Speculations on Uncertainty and Humane Algorithms
Nicholas Gray
88
0
0
13 Aug 2024
Dynamic Fog Computing for Enhanced LLM Execution in Medical Applications
Philipp Zagar
Vishnu Ravi
Lauren Aalami
Stephan Krusche
Oliver Aalami
Paul Schmiedmayer
61
4
0
08 Aug 2024
EXAONE 3.0 7.8B Instruction Tuned Language Model
LG AI Research:
Soyoung An
Kyunghoon Bae
Eunbi Choi
...
Boseong Seo
Sihoon Yang
Heuiyeen Yeen
Kyungjae Yoo
Hyeongu Yun
ELM, ALM
95
12
0
07 Aug 2024
Gemma 2: Improving Open Language Models at a Practical Size
Gemma Team
Gemma Team Morgane Riviere
Shreya Pathak
Pier Giuseppe Sessa
Cassidy Hardin
...
Noah Fiedel
Armand Joulin
Kathleen Kenealy
Robert Dadashi
Alek Andreev
VLM, MoE, OSLM
147
922
0
31 Jul 2024
Safetywashing: Do AI Safety Benchmarks Actually Measure Safety Progress?
Richard Ren
Steven Basart
Adam Khoja
Alice Gatti
Long Phan
...
Alexander Pan
Gabriel Mukobi
Ryan H. Kim
Stephen Fitz
Dan Hendrycks
ELM
74
25
0
31 Jul 2024
Cluster-norm for Unsupervised Probing of Knowledge
Walter Laurito
Sharan Maiya
Grégoire Dhimoïla
Owen
Owen Yeung
Kaarel Hänni
52
3
0
26 Jul 2024
Course-Correction: Safety Alignment Using Synthetic Preferences
Rongwu Xu
Yishuo Cai
Zhenhong Zhou
Renjie Gu
Haiqin Weng
Yan Liu
Tianwei Zhang
Wei Xu
Han Qiu
76
7
0
23 Jul 2024